JuliaParallel / Dagger.jl

A framework for out-of-core and parallel execution

Memory leak when copying data from one worker to another #547

Open RomeoV opened 2 months ago

RomeoV commented 2 months ago

Repeatedly allocating matrices on one worker and copying them to another worker leads to a memory leak on my computer, and the Julia session eventually gets killed.

julia> using Distributed
julia> addprocs(8)
julia> using Dagger
julia> for _ in 1:5
       @time fetch(let
         foo = Dagger.@spawn scope=Dagger.scope(worker=1) rand(10000, 10000);
         Dagger.@spawn scope=Dagger.scope(worker=2) copy(foo)
       end);
       end

The `foo` matrix and its copy should be garbage collected after each iteration, which I don't think is happening. But even if they are not, each matrix is 0.8 GB, so if all of them stay alive across the 5 iterations on both workers, that is 5 * 2 * 0.8 GB = 8 GB of memory, which should not exhaust my RAM (I have at least 16 GB free).
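As a quick sanity check of that estimate, here is a minimal sketch of the arithmetic. The helper `matrix_gb` and the variable names are made up for illustration and are not part of Dagger; it assumes `Float64` elements (the default for `rand`) and 1 GB = 10^9 bytes.

```julia
# Hypothetical helper: size of one n-by-n matrix in gigabytes (1 GB = 10^9 bytes).
matrix_gb(n; T=Float64) = n^2 * sizeof(T) / 1e9

iterations = 5        # loop count from the example above
copies_per_iter = 2   # `foo` on worker 1 plus its copy on worker 2

per_matrix = matrix_gb(10_000)

# Upper bound if no matrix is ever freed during the loop.
worst_case = iterations * copies_per_iter * per_matrix

println("one matrix: ", per_matrix, " GB")
println("worst case with nothing freed: ", worst_case, " GB")
```

This only counts the matrices themselves. Any temporary buffers created while moving data between workers are not included, so actual peak usage could be higher.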

Session is a clean temp project with Dagger v0.18.12 and Julia 1.10.4.

jpsamaroo commented 1 month ago

Sorry for the slow reply - this is probably a known memory leak, also reported offline by @mofeing in a similar case. I'll investigate and see if I can resolve it.

jpsamaroo commented 1 month ago

Using the example above, I've found the initial source of retained memory, and am fixing it in https://github.com/JuliaParallel/Dagger.jl/pull/558 (branch is very WIP, expect it to not work right now). I'll close this issue once that PR is merged, which should fully address this.