brenhinkeller / StaticTools.jl

Enabling StaticCompiler.jl-based compilation of (some) Julia code to standalone native binaries by avoiding GC allocations and llvmcall-ing all the things!

Linker error for one of the examples, not others #49

Open PallHaraldsson opened 1 year ago

PallHaraldsson commented 1 year ago
function times_table(argc::Int, argv::Ptr{Ptr{UInt8}})
[..]
           free(M)  # Failing line, without it works, EDIT: see my comment below, on actual more likely cause
       end
times_table (generic function with 1 method)

julia> filepath = compile_executable(times_table, (Int64, Ptr{Ptr{UInt8}}), "./")

/usr/bin/x86_64-linux-gnu-ld: ./times_table.o: in function `julia_times_table':
start:(.text+0x566): undefined reference to `ijl_apply_generic'
/usr/bin/x86_64-linux-gnu-ld: ./times_table.o: in function `gpu_gc_pool_alloc':
start:(.text+0x5b0): undefined reference to `ijl_throw'
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)
ERROR: failed process: [..]

I get the same error for other code I'm working on (which likely did work at some point), so I checked the examples here to see whether this is an install problem. All the examples in the other repo work, and all of the ones here up to this one.

brenhinkeller commented 1 year ago

Hmm, that’s odd — it seems to be working in the unit tests? https://github.com/brenhinkeller/StaticTools.jl/actions/runs/4973322569/jobs/8899099933

PallHaraldsson commented 1 year ago

All the tests pass, also for StaticCompiler, and thus I believe this output is intentional:

5-byte StringView: "Hello"ERROR: This is a test
[..]
Attempting to represent input as UInt64: Inexact conversion
[ Info: maybe_throw: task failed sucessfully!
Test Summary:  | Pass  Total     Time
StaticCompiler |   61     61  3m07.2s

I'm on Linux Mint, if it matters. Did I need to install any dependencies (other than those for Ubuntu/Debian)?

PallHaraldsson commented 1 year ago

I debugged this and the cause is complex.

A. If I cut out lines to end at cols = argparse(Int64, argv, 3) then I need to add:

Int32(1) # Why, i.e. why isn't Int64 returned to col?

to silence return type error.

B. If I leave out only the M = reinterpret(Int32, M) line, then things work, so that line seems to be the cause. If I keep it, I get away with it only if I leave out the last free(M) line.

brenhinkeller commented 1 year ago

Oh, that’s because the return type of the function has to be stable to be static-compilable, and all the functions that call the C stdlib (e.g., printf, free, etc.) return Int32 (i.e., C int) by convention.
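To illustrate the return-type point, here is a minimal sketch in plain Julia (no StaticTools needed). The function names unstable_main and stable_main are hypothetical, and the early-return pattern is only an assumption about what a truncated example might look like: one path returns an Int32 (as a printf-style call would), while the other returns whatever the last expression happens to be.

```julia
# Hypothetical functions for illustration only; not taken from StaticTools.jl.

# Unstable: the early return yields an Int32, the fall-through yields an Int
# (Int64 on 64-bit platforms), so inference cannot settle on one concrete type.
function unstable_main(n::Int)
    n > 0 || return Int32(-1)   # stands in for `return printf(...)`
    cols = n + 1                # stands in for a parsed Int64 argument
    return cols
end

# Stable: every path returns an Int32, mirroring the C `int` convention.
function stable_main(n::Int)
    n > 0 || return Int32(-1)
    cols = n + 1
    return Int32(0)
end

# Base.return_types shows what inference concludes for a given signature.
println(Base.return_types(unstable_main, (Int,)))
println(Base.return_types(stable_main, (Int,)))
```

If this reading is right, ending a function on an Int64-valued expression while an earlier branch returns the result of a C-stdlib wrapper would produce a mixed return type, and appending a final Int32 value would make it concrete again.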

PallHaraldsson commented 1 year ago

I'll look at it more, but I don't think that's the issue. I believe A wasn't the problem, just something I discovered along the way; B, the reinterpret line, is.

I already sped up a pi benchmark 5x, but I can't submit it, since the timing relies on StaticTools, and without it the startup overhead is too great. You're welcome to see the code here, which I've butchered. It now compiles, but it gets the wrong value, and strangely it no longer compiles if I uncomment the latter loop:

function f(rounds)
    pi = 1.0
    x  = -1.0
    r2 = rounds + 2
    vend = r2 - r2 % 8
#    @simd for i in 2*2:8*2:(r2*2)
    for i in 2*2:8*2:(r2*2)
    # Common denominators method, half as many divisions:
        pi += ### Float64(
#=
               -2.0f0 / fma(i, i, -1.0f0) +
               # x / (2.0 * i + 1.0) +
               -2.0f0 / (fma(i, i, 15.0f0) + 8f0i)
               # x / (2.0 * i + 5.0)
               -2.0f0 / (fma(i, i, 63f0) + 16f0i)
               # x / (2.0 * i + 9.0) +
##               -2.0f0 / (fma(i, i, 143f0) + 24f0i)
=#

-2.0f0 / fma(i, i, -1.0) # / (Float64(i)*Float64(i)+143f0 + 24f0i)

## This doesn't work instead because of fma and/or casting: 2.0f0 / (fma(i, i, 143f0) + 24f0i)

               # x / (2.0 * i + 13.0)
###        )
    end
#=
    for i in vend+1:r2
        pi += x / (2.0 * (i + 0.0) - 1.0)
        x = -x
    end
=#
    return pi*4
end

function mainjl()
#function mainjl(argc::Int, argv::Ptr{Ptr{UInt8}})
    fp = fopen(m"rounds.txt", m"r")
    buf = MallocString(undef, 16)
    StaticTools.fread!(buf, fp)

free(buf)

    fclose(fp)
rounds = 10 #    rounds = parse(Int64, buf)
#    free(buf)

res = 3.14
    #fma(3.14f0, res, res)
    res = f(rounds)
    printf(c"%.15f", res)
    return 0
#    return Int32(0)
end

If this line: -2.0f0 / fma(i, i, -1.0) is changed to use 1.0f0, as it should, then the compile fails. Also, @simd for the loop worked at some point (unless it was changed to a constant?), and now it doesn't. It's critical for this benchmark that it works, so it would be good to know whether it should work for sure.
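For readers following the "common denominators" comment in f above, here is a minimal sketch of the identity it appears to rely on, written in plain Julia without StaticTools. The names leibniz_plain and leibniz_paired are hypothetical, and this is not the benchmark code itself: it only shows that pairing adjacent Leibniz terms, -1/(i-1) + 1/(i+1) = -2/(i^2 - 1) for i = 4, 8, 12, ..., halves the number of divisions.

```julia
# Plain Leibniz series: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
function leibniz_plain(nterms::Int)
    s = 0.0
    sgn = 1.0
    for k in 0:(nterms - 1)
        s += sgn / (2k + 1)
        sgn = -sgn
    end
    return 4s
end

# Paired form: start from the leading 1, then add one combined term per pair,
# using -1/(i-1) + 1/(i+1) == -2/(i^2 - 1) with i = 4, 8, 12, ...
function leibniz_paired(npairs::Int)
    s = 1.0
    for j in 1:npairs
        i = 4.0 * j
        s += -2.0 / (i * i - 1.0)
    end
    return 4s
end

# npairs pairs cover the same terms as 2*npairs + 1 plain terms.
println(leibniz_plain(2001))
println(leibniz_paired(1000))
```

Shifting i by 4, 8, and 12 gives the denominators i^2 + 8i + 15, i^2 + 16i + 63, and i^2 + 24i + 143, which match the constants in the commented-out lines of f.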