Closed wsmoses closed 4 months ago
What does an invalid gc_preserve
look like at the AST level?
https://github.com/JuliaSIMD/Polyester.jl/blob/edb49726e612cfba3af5937b5e0cea041760a2d3/src/batch.jl#L266-L278
The structure looks simple:
julia> @macroexpand GC.@preserve a b c begin
foo(a,b,c)
end
:($(Expr(:gc_preserve, quote
#= REPL[1]:2 =#
foo(a, b, c)
end, :a, :b, :c)))
julia> dump(ans)
Expr
head: Symbol gc_preserve
args: Array{Any}((4,))
1: Expr
head: Symbol block
args: Array{Any}((2,))
1: LineNumberNode
line: Int64 2
file: Symbol REPL[1]
2: Expr
head: Symbol call
args: Array{Any}((4,))
1: Symbol foo
2: Symbol a
3: Symbol b
4: Symbol c
2: Symbol a
3: Symbol b
4: Symbol c
All it does is produce Expr(:gc_preserve, block, preserved_symbols...)
, which looks like what the macro GC.@preserve
does.
It produces the expression manually to support vararg arguments, and to avoid heap allocations that Julia seemed to occasionally incur from constructing a tuple and preserving that instead. We don't really need/want to create an explicit tuple stack object anyway, if we don't have to.
If the produced AST is valid, then it is a Julia compiler problem.
Julia has been known to miscompile GC.@preserve
.
Polyester itself does not do anything lower level than building Julia ASTs for macros and @generated
functions. It certainly doesn't do LLVM code generation of place tokens. The AST it emits is quite explicit about what the scope of the preserve should be, and thus where the gc_preserve_begin
and gc_preserve_end
should go.
The expression Polyester creates does end with a return
, making this issue look like https://github.com/JuliaLang/julia/issues/45901
Which version of Julia are you on? I tried 1.10.3, 1.11-beta1, and master. In none did I see a @llvm.julia.gc_preserve_begin
token.
It's probably not JuliaLang/julia#45901, but I removed the useless return statement anyway in https://github.com/JuliaSIMD/Polyester.jl/pull/146
I did not notice any change in generated IR. There shouldn't be any change, so that's to be expected. But that means this probably isn't a workaround, even though it did work around the previous issue.
I just checked using Cthulhu.
Polyester's latest release:
L740: ; preds = %L704
br label %L741
L741: ; preds = %L740, %L702
%541 = call nonnull {}* @julia.pointer_from_objref({}* inttoptr (i64 140059852555104 to {}*)) #6
%bitcast_coercion258 = ptrtoint {}* %541 to i64
%542 = call i64 @"japi1___apply_bias_activation!!_1857u1870"(i64 %bitcast_coercion258, i64 %value_phi164)
br label %L746
vs #146:
L740: ; preds = %L704
br label %L741
L741: ; preds = %L740, %L702
%541 = call nonnull {}* @julia.pointer_from_objref({}* inttoptr (i64 140322931884896 to {}*)) #6
%bitcast_coercion258 = ptrtoint {}* %541 to i64
%542 = call i64 @"japi1___apply_bias_activation!!_1627u1640"(i64 %bitcast_coercion258, i64 %value_phi164)
call void @llvm.julia.gc_preserve_end(token %334)
br label %L747
Both have
%334 = call token (...) @llvm.julia.gc_preserve_begin({}* %15, [2 x {}*] %314)
br label %L350
I am closing this here as we have now worked around the issue, but please open a Julia issue because this was never a Polyester bug. It is a Julia bug.
cc @vchuravy @ChrisRackauckas @avik-pal
Polyester emits invalid GC info -- potentially leading to very serious problems downstream. In enzyme it results in a crash, and I can potentially imagine an infinite memory leak or segmentation fault occuring as a result.
You will see that there is a gc preserve begin
which is never used and has no gc_preserve_end
polyester.ll.txt