Open Tortar opened 11 months ago
Your examples rely on user packages, so I don't see why would you think this is a Julia bug. Apart from that, it's not even clear what behavior you consider buggy.
Are you aware that languages that use garbage collection (except simple forms like reference counting, like with C++, for example) don't provide guarantees about when some pointer will be freed, or when the finalizer of some object will be ran. The precise moment is an implementation detail. Wikipedia, furthermore, says:
Notably, both Java and Python do not guarantee that finalizers will ever be called, and thus they cannot be relied on for cleanup.
I changed it to one not relying on user packages. Thanks for the info about the garbage collection in other languages, I had the impression that a GC.gc()
call (or multiple ones) would have guaranteed to free the unused memory. This could be nonetheless useful sometimes, but I can't judge how much of a hard task is it to make this an option, probably a lot
Then given that it is probably like so for good reasons, maybe changing the docstring of GC.gc() could be a good idea? For now it is:
Perform garbage collection. The argument full determines the kind of collection: A full collection (default) sweeps
all objects, which makes the next GC scan much slower, while an incremental collection may only sweep so-called
young objects.
maybe saying that it is a suggestion is better?
Suggest to perform garbage collection. The argument full determines the kind of collection: A full collection (default) sweeps
all objects, which makes the next GC scan much slower, while an incremental collection may only sweep so-called
young objects.
See also my long reply on discourse on using Bumper.jl that WILL guarantee deallocation. GC languages in general can't do it, but in exception cases it could be done, with the help of the compiler, as I explained. I think Julia should actually try to to that, or even guarantee in all possible cases, as Mojo does.
Julia uses a precise GC together with a GC shadow stack.
julia> function testme()
X = @noinline rand(1_000_000_00)
Y = @noinline sum(X)
X = nothing
GC.gc()
return Y
end
Let's take a look at the IR generated:
julia> @code_llvm testme()
; @ REPL[7]:1 within `testme`
define double @julia_testme_190() #0 {
top:
%gcframe2 = alloca [3 x {}*], align 16
%gcframe2.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 0
%0 = bitcast [3 x {}*]* %gcframe2 to i8*
call void @llvm.memset.p0i8.i64(i8* align 16 %0, i8 0, i64 24, i1 true)
%thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() #8
%tls_ppgcstack = getelementptr i8, i8* %thread_ptr, i64 -8
%1 = bitcast i8* %tls_ppgcstack to {}****
%tls_pgcstack = load {}***, {}**** %1, align 8
; @ REPL[7]:2 within `testme`
%2 = bitcast [3 x {}*]* %gcframe2 to i64*
store i64 4, i64* %2, align 16
%3 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 1
%4 = bitcast {}** %3 to {}***
%5 = load {}**, {}*** %tls_pgcstack, align 8
store {}** %5, {}*** %4, align 8
%6 = bitcast {}*** %tls_pgcstack to {}***
store {}** %gcframe2.sub, {}*** %6, align 8
%7 = call nonnull {}* @j_rand_192(i64 signext 100000000)
%8 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 2
store {}* %7, {}** %8, align 16
; @ REPL[7]:3 within `testme`
%9 = call double @j_sum_193({}* nonnull %7)
; @ REPL[7]:5 within `testme`
; ┌ @ gcutils.jl:129 within `gc` @ gcutils.jl:129
call void inttoptr (i64 140443355088864 to void (i32)*)(i32 1)
%10 = load {}*, {}** %3, align 8
%11 = bitcast {}*** %tls_pgcstack to {}**
store {}* %10, {}** %11, align 8
; └
; @ REPL[7]:6 within `testme`
ret double %9
}
%gcframe2 = alloca [3 x {}*], align 16
Is the creation of the shadow stack. Simplifying a bit every allocated object inside a function get's assinged a slot in this gcframe and GC scans the data structure (which forms a linked list) to find objects that are used.
%7 = call nonnull {}* @j_rand_192(i64 signext 100000000)
%8 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 2
store {}* %7, {}** %8, align 16
Now this is the call to rand
and store of X
to the gcframe, so that it is rooted
for the next step the call to sum
.
%9 = call double @j_sum_193({}* nonnull %7)
Now immediately after we call GC.gc
┌ @ gcutils.jl:129 within `gc` @ gcutils.jl:129
call void inttoptr (i64 140443355088864 to void (i32)*)(i32 1)
But X
is still rooted on the gcstack. Once we exit the function we pop the gcframe from the pgcstack and X
can be freed.
There might be an open issue flying around, but if I recall correctly we are currently missing the step where we zero out the GC slot when the lifetime of the object ends, since that is often not necessary to do (e.g. if you exit the function the memory will be freed).
Probably the place to change is around https://github.com/JuliaLang/julia/blob/3c4bccd93b2b7d9954ebd3ab43268a3e13185775/src/llvm-late-gc-lowering.cpp#L2620
I think you're saying GC is called. But is GC guaranteed to free large objects, even for full (the default for GC.gc()? I mean is it promised or a known limitation. Should is be promised, at least for full GC? This can be avoided with Bumper.jl, i.e. it eliminates the problem as opt-in. Julia should maybe add Bumber.jl as stdlib, similar to Mojo. I.e. a compile time guarantee. But until there, there is a large and small distinction for objects in the GC, right? Just pointing out in case it explains the (perceived) problem.
I see no reason why Bumper is part of this discussion?
Julia's GC is precise and generational.
Now the challenge is what is "garbage". Garbage is all objects that are detached from the object-graph and are not in one of the root-sets.
Due to Julia being precise the root-sets include "all objects that are used within a function".
So when we see behavior that surprises us like the original post the question we need to answer: Why is this object live? It isn't referenced anywhere so the only remaining bit is that it must be part of root-set.
My hope was to demystify Julia a bit by walking through the reason why X
is live until the end of the function, which is a wider lifetime than expected. There is a simple technical reason (e.g. we didn't think it was worth it) and maybe we should revisit that.
So yes the GC is guaranteed to free objects, but it needs to see them as garbage.
So when we see behavior that surprises us like the original post the question we need to answer: Why is this object live?
The topic here is why GC.gc() is "unable to free", and as soon as "X = nothing" happens in the code there, X is no longer live, so it's a good question (and should be answered), and yes this should get fixed. But I mean X could be freed right there, with Libc.free implicitly, before the next line, or any other (with allocation) triggering it. The GC without help couldn't. My understanding is that is what Mojo does, why I brought up Bumper. I believe it's similar, just explicit, not implicit (and doesn't use the heap; uses its own stack for allocated objects, I'm not sure Mojo does the same or works for the heap, maybe both).
The GC could free that object, the issue is that the generated code mark that object as dead until the end of the function, so when GC.gc()
runs it still thinks the array is live.
@gbaraldi just to check my understanding, you are saying because X
is unused after the call to sum()
the compiler determines that it can optimise away X = nothing
(its not in the IR @vchuravy posted) because there is no use of X
after that point.
But of course GC.gc()
in a sense "uses" everything, but that information is probably not available to the compiler.
I opened some days ago this issue here: https://github.com/apache/arrow-julia/issues/492. But after experimenting quite a bit I think this is a GC issue.
MWE (which I runned in REPL mode) similar as the one reported here https://discourse.julialang.org/t/how-to-force-an-object-to-be-freed-by-the-garbage-collector/77985:
the memory occupied by
X
is not freed, if I put after the call totester
a GC call instead it is.Version Info: