brenhinkeller / StaticTools.jl

Enabling StaticCompiler.jl-based compilation of (some) Julia code to standalone native binaries by avoiding GC allocations and llvmcall-ing all the things!
MIT License

StaticStrings: I'm trying to add an iterate method; how can I fix an undefined reference to `ijl_throw`? #35

Closed. Klafyvel closed this issue 1 year ago

Klafyvel commented 1 year ago

Hi there, I'm starting to play with StaticTools.jl, and it's great. However, I'd like to use methods like startswith (I'm trying to statically compile my package Norg.jl). I decided to try to implement the required iterate method by basically translating the standard library implementation, as follows:

using StaticTools, StaticCompiler
function Base.iterate(s::StaticString, i::Int=firstindex(s))
    ((i % UInt) - 1 < ncodeunits(s) && s[i] ≠ 0x00) || return nothing
    b = @inbounds codeunit(s, i)
    u = UInt32(b) << 24
    between(b, 0x80, 0xf7) || return reinterpret(Char, u), i+1
    return iterate_continued(s, i, u)
end
@inline between(b::T, lo::T, hi::T) where {T<:Integer} = (lo ≤ b) & (b ≤ hi)
function iterate_continued(s::StaticString, i::Int, u::UInt32)
    u < 0xc0000000 && (i += 1; @goto ret)
    n = ncodeunits(s)
    # first continuation byte
    (i += 1) > n && @goto ret
    @inbounds b = codeunit(s, i)
    b & 0xc0 == 0x80 || @goto ret
    u |= UInt32(b) << 16
    # second continuation byte
    ((i += 1) > n) | (u < 0xe0000000) && @goto ret
    @inbounds b = codeunit(s, i)
    b & 0xc0 == 0x80 || @goto ret
    u |= UInt32(b) << 8
    # third continuation byte
    ((i += 1) > n) | (u < 0xf0000000) && @goto ret
    @inbounds b = codeunit(s, i)
    b & 0xc0 == 0x80 || @goto ret
    u |= UInt32(b); i += 1
    @label ret
    return reinterpret(Char, u), i
end

And this works fine in the REPL:

julia> startswith(c"foobar", c"foo")
true

But this won't compile:

julia> function test()
           if startswith(c"foobar", c"foo")
               println(c"marche")
           else
               println(c"marche pas")
           end
       end
test (generic function with 1 method)

julia> compile_executable(test, (), "./", filename="test_compile")
/usr/bin/x86_64-linux-gnu-ld: ./test_compile.o: in function `gpu_gc_pool_alloc':
text:(.text+0x420): undefined reference to `ijl_throw'
clang-13: error: linker command failed with exit code 1 (use -v to see invocation)

I guess an exception can be thrown somewhere in the code, but I have no clue how to find where. Do you have any insight on that?

brenhinkeller commented 1 year ago

Oh wow, you've properly dived into this! An iterate method for StaticStrings would be great!

So yes, the ijl_throw certainly means that there is something somewhere in the code which can conceivably throw an error, and thus can't be compiled with StaticCompiler. It's not immediately obvious to me from looking at the code what that is, but it must be something!

One thing that might be helpful is diving into your test function with Cthulhu.jl and trying to find where any error-handling code is coming from, and what it is.

I may be able to give it a try later, but if you can find where the error gets thrown then you may be set.

Klafyvel commented 1 year ago

From what I can see in Cthulhu, there is a small type instability because of the return nothing statement, but removing it doesn't fix the issue. I could not locate the ijl_throw statement, even using @code_llvm.

brenhinkeller commented 1 year ago

Hmm, so for iteration in general we may actually be able to get by with something simpler, e.g.

julia> Base.iterate(s::StaticString) = iterate(s, firstindex(s))

julia> Base.iterate(s::StaticString, i::Int) = i < lastindex(s) ? (s[i], i+1) : nothing

julia> startswith(c"Foobar", c"Foo")
true

julia> contains(c"Foobar", c"Foo")
true

julia> function test()
           if startswith(c"foobar", c"foo")
               println(c"marche")
           else
               println(c"marche pas")
           end
       end
test (generic function with 1 method)

julia> test()
marche
0

There is one problem though -- this is type unstable (iterator returns nothing once the index gets too big), and the Base method

function startswith(a::AbstractString, b::AbstractString)
    i, j = iterate(a), iterate(b)
    while true
        j === nothing && return true # ran out of prefix: success!
        i === nothing && return false # ran out of source: failure
        i[1] == j[1] || return false # mismatch: failure
        i, j = iterate(a, i[2]), iterate(b, j[2])
    end
end

explicitly depends on this type instability to work...
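
To make the instability concrete, here is a minimal sketch in plain Julia. ByteString is a hypothetical stand-in type, not part of StaticTools, and its iterate method just mirrors the shape of the simple one above. Asking inference for the return type should show a Union of Nothing and a tuple rather than a single concrete type:

```julia
# Hypothetical stand-in for a string type: a plain wrapper around bytes.
struct ByteString
    data::Vector{UInt8}
end

# Same shape as the simple iterate above: a tuple while in range, `nothing` afterwards.
Base.iterate(s::ByteString, i::Int=1) = i <= length(s.data) ? (s.data[i], i + 1) : nothing

# Ask inference what iterate(::ByteString, ::Int) can return.
rt = only(Base.return_types(iterate, (ByteString, Int)))
println(rt)                  # the inferred return type
println(isconcretetype(rt))  # false when the result is a Union
```

This only illustrates the inferred return type; it does not by itself show which part of the compiled code ends up allocating.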

brenhinkeller commented 1 year ago

So this will still not compile (though ultimately because of the _gpu_gc_pool_alloc, i.e. a GC allocation, rather than the _ijl_throw it references):

julia> compile_executable(test, (), "./", filename="test_compile")
Undefined symbols for architecture arm64:
  "_ijl_throw", referenced from:
      _gpu_gc_pool_alloc in test_compile.o
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ERROR: failed process: Process(`cc -e _julia_test ./test_compile.o -o ./test_compile`, ProcessExited(1)) [1]

Stacktrace:
 [1] pipeline_error
   @ ./process.jl:565 [inlined]
 [2] run(::Cmd; wait::Bool)
   @ Base ./process.jl:480
 [3] run
   @ ./process.jl:477 [inlined]
 [4] generate_executable(f::Function, tt::Type, path::String, name::String, filename::String; cflags::Cmd, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ StaticCompiler ~/.julia/packages/StaticCompiler/clvAO/src/StaticCompiler.jl:375
 [5] compile_executable(f::Function, types::Tuple{}, path::String, name::String; filename::String, cflags::Cmd, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ StaticCompiler ~/.julia/packages/StaticCompiler/clvAO/src/StaticCompiler.jl:244
 [6] top-level scope
   @ REPL[31]:1

brenhinkeller commented 1 year ago

However, if we define a fully type-stable and inlined version

julia> using StaticCompiler, StaticTools

julia> Base.iterate(s::StaticString) = iterate(s, firstindex(s))

julia> Base.iterate(s::StaticString, i::Int) = s[i], i+1

julia> function Base.startswith(a::StaticString, b::StaticString)
           i, j = iterate(a), iterate(b)
           while true
               j[2] === lastindex(b) && return true # ran out of prefix: success!
               i[2] === lastindex(a) && return false # ran out of source: failure
               i[1] == j[1] || return false # mismatch: failure
               i, j = iterate(a, i[2]), iterate(b, j[2])
           end
       end

julia> function test()
           if startswith(c"foobar", c"foo")
               println(c"marche")
           else
               println(c"marche pas")
           end
       end
test (generic function with 1 method)

julia> test()
marche
0

julia> @inline function Base.startswith(a::StaticString, b::StaticString)
           i, j = iterate(a), iterate(b)
           while true
               j[2] === lastindex(b) && return true # ran out of prefix: success!
               i[2] === lastindex(a) && return false # ran out of source: failure
               i[1] == j[1] || return false # mismatch: failure
               i, j = iterate(a, i[2]), iterate(b, j[2])
           end
       end

julia> compile_executable(test, (), "./", filename="test_compile")
"/Users/cbkeller/test_compile"

then it looks like we might be in business!

shell> ./test_compile
marche
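
The general idea of ending the loop on a sentinel instead of on `nothing` can be sketched in plain Julia. This is not the StaticTools implementation: cbytes and prefix_match are hypothetical names, and the sketch uses ordinary heap-allocated vectors, so it is meant to show the control flow only, not to be statically compilable as written.

```julia
# Hypothetical helper: a NUL-terminated byte vector, mimicking a C-style string.
cbytes(s::String) = UInt8[codeunits(s); 0x00]

# Prefix test that always returns a Bool and never relies on `nothing`.
# Assumes both inputs end with a 0x00 terminator.
function prefix_match(a::Vector{UInt8}, b::Vector{UInt8})
    i = 1
    while true
        b[i] == 0x00 && return true    # reached the prefix's terminator: success
        a[i] == 0x00 && return false   # source ended first: failure
        a[i] == b[i] || return false   # mismatch: failure
        i += 1
    end
end

println(prefix_match(cbytes("foobar"), cbytes("foo")))
println(prefix_match(cbytes("foobar"), cbytes("fox")))
```

Because every exit path returns a Bool and the index is a plain Int, there is no Union for the compiler to track. The cost is that the function is tied to the terminator convention rather than to the generic iteration protocol.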

brenhinkeller commented 1 year ago

~Perhaps interestingly, it seems that Base.contains may work without modification (though may need to check that it's still safe!)~:

julia> function test2()
           if contains(c"foobar", c"foo")
               println(c"marche")
           else
               println(c"marche pas")
           end
       end
test2 (generic function with 1 method)

julia> compile_executable(test2, (), "./", filename="test2_compile")
"/Users/cbkeller/test2_compile"

shell> ./test2_compile
marche

Edit: Oh haha, looks like that's because I already defined it:

    @inline function Base.contains(haystack::AbstractStaticString, needle::AbstractStaticString)
        lₕ, lₙ = length(haystack), length(needle)
        lₕ < lₙ && return false
        for i ∈ 0:(lₕ-lₙ)
            (haystack[1+i:lₙ+i] == needle) && return true
        end
        return false
    end

Anyways, if you want to make a PR to add and test these methods (and/or other related ones) I'd be happy to merge it!

Klafyvel commented 1 year ago

Oh! I didn't think of replacing Base.startswith after making iterate type stable...

There's a funny thing happening there: your code compiles only if Base.startswith is inlined.

I think the complicated iterate method is required to deal with UTF-8, because as is, your iterate method returns individual bytes rather than characters:

julia> s = c"α"
c"α"

julia> iterate(s)
(0xce, 2)

julia> c = iterate(s)
(0xce, 2)

julia> Char(c[1])
'Î': Unicode U+00CE (category Lu: Letter, uppercase)

So I ended up with something that works fine by mixing my original proposal with yours! I'll open a PR for that. (I may also open another later for endswith and other utilities.)
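
For reference, the byte-versus-character mismatch can be reproduced with an ordinary Julia String. The sketch below decodes the two-byte UTF-8 sequence for α by hand; it handles only the two-byte case and is meant as an illustration of why 0xce alone is not the character, not as a replacement for the full iterate method.

```julia
# "α" is encoded in UTF-8 as two bytes: a lead byte and one continuation byte.
bytes = codeunits("α")
lead, cont = bytes[1], bytes[2]

# Two-byte form: 110xxxxx 10yyyyyy, giving the code point xxxxxyyyyyy.
cp = (UInt32(lead & 0x1f) << 6) | UInt32(cont & 0x3f)

println(Char(cp))     # the character decoded from both bytes
println(Char(lead))   # the lead byte on its own, misread as a code point
```

Here the lead byte is 0xce and the continuation byte is 0xb1, which combine to the code point U+03B1. Treating 0xce as a code point by itself gives U+00CE, which is the 'Î' seen above.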

tshort commented 1 year ago

The following works with StaticStrings.jl, and I don't think that package defines startswith. That suggests that some version of iterate should work.

using StaticCompiler
using StaticStrings
using Libdl

function testfun()
    if startswith(static"foobar", static"foo")
        return 1
    else
        return 0
    end
end

name = repr(testfun)
filepath = compile_shlib(testfun, (), "./", name)

# Open dylib
ptr = Libdl.dlopen(filepath, Libdl.RTLD_LOCAL)
fptr = Libdl.dlsym(ptr, "julia_$name")
ccall(fptr, Int, ())

brenhinkeller commented 1 year ago

If you can make it work and want to PR, I’m open

Klafyvel commented 1 year ago

Getting rid of the special iterate method would be awesome! And I'm very curious to see how they deal with the expected return nothing.