Closed tanmaykm closed 1 year ago
The below code that adds some code to regulate GC behavior can recreate the situation.
julia> using Distributed
julia> addprocs(4);
julia> @everywhere begin
using Distributed, DistributedArrays
end
julia> A = rand(1:100, (100,100));
julia> DA = distribute(A);
julia> GC.enable(false)
true
julia> DA = nothing
julia> function DistributedArrays.d_from_weakref_or_d(id)
d = get(DistributedArrays.registry, id, nothing)
GC.enable(true)
GC.gc()
isa(d, WeakRef) && return d.value
return d
end
julia> DistributedArrays.d_closeall()
ERROR: MethodError: no method matching close(::Nothing)
Closest candidates are:
close(::Union{Base.AsyncCondition, Timer})
@ Base asyncevent.jl:162
close(::Union{FileWatching.FileMonitor, FileWatching.FolderMonitor, FileWatching.PollingFileWatcher})
@ FileWatching /data/Work/julia/binaries/julia-1.9.3/share/julia/stdlib/v1.9/FileWatching/src/FileWatching.jl:328
close(::DArray)
@ DistributedArrays ~/.julia/dev/DistributedArrays/src/core.jl:34
...
Stacktrace:
[1] d_closeall()
@ DistributedArrays ~/.julia/dev/DistributedArrays/src/core.jl:47
[2] top-level scope
@ REPL[9]:1
Can we be sure that the remote memory is properly freed if the reference has been garbage collected?
Yes, I think so, because the finalizer invokes close
. Did a small test to confirm that:
julia> using Distributed
julia> addprocs(4);
julia> @everywhere begin
using Distributed, DistributedArrays
function print_registry_entries()
for k in keys(DistributedArrays.registry)
hasval = !(DistributedArrays.d_from_weakref_or_d(k) === nothing)
println("key $k hasval $hasval")
end
end
end
julia> @everywhere print_registry_entries()
julia> A = rand(1:100, (100,100));
julia> DA = distribute(A);
julia> GC.enable(false)
true
julia> DA = nothing
julia> @everywhere print_registry_entries()
key (1, 1) hasval true
From worker 5: key (1, 1) hasval true
From worker 3: key (1, 1) hasval true
From worker 2: key (1, 1) hasval true
From worker 4: key (1, 1) hasval true
julia> function DistributedArrays.d_from_weakref_or_d(id)
d = get(DistributedArrays.registry, id, nothing)
GC.enable(true)
GC.gc()
isa(d, WeakRef) && return d.value
return d
end
julia> @everywhere print_registry_entries()
key (1, 1) hasval false
From worker 2: key (1, 1) hasval true
From worker 3: key (1, 1) hasval true
From worker 5: key (1, 1) hasval true
From worker 4: key (1, 1) hasval true
julia> DistributedArrays.d_closeall()
julia> @everywhere print_registry_entries()
Would it be good to add this as a test?
It seems possible that
DistributedArrays.d_closeall()
may encounter a condition where it finds a darray id in theregistry
, but the corresponding weakref value isnothing
because the referenced darray got garbage collected. It has been enountered many times in CI and elsewhere, but is hard to replicate normally. Adding a check for the weakref value, before actually invokingclose
on it, to fix it.fixes #246