Closed: ufechner7 closed this 8 months ago
Could you post a small example that leads to the error? This would help a lot in narrowing the issue down
It happens reproducibly with my production code, but I am not allowed to share it. So far it has not happened with the smaller code examples I tried; I will keep trying to create an MWE.
Possibly a duplicate of #42566; see from here on.
But in #42566 they say that "GC.gc(true); GC.gc()" does not fix it.
But for me GC.gc() frees the unreleased memory. So it might be a different issue.
Indeed, if manually running the GC stops the OOM killer from bumping your process off, then the problem is likely not a failure to return freed memory to the system, but rather how the GC learns that an OOM condition is approaching so that it can work harder to collect unreferenced memory. IIRC there are several Julia issues about that, but of course my search for them is failing just now.
What --heap-size-hint did you set?
julia -J bin/kps-image-1.9.so --project -i -q -p 16 --heap-size-hint=1G
And I have 32 GB of memory.
It would be good to see whether this still happens on recent Julia nightlies. @gbaraldi's recent GC logic changes should have fixed this.
Just a note that the OOM killer is activated by the total memory of your cgroup IIUC, not just the parent, so would likely include any worker process memory usage as well as the parent process.
Does --heap-size-hint propagate to the workers?
How big is bin/kps-image-1.9.so? Or, after just starting Julia, how much memory does ps aux say you are using?
--heap-size-hint is currently not strict, and only measures the live heap and not sysimage/shared libraries etc.
@elextr: "Does --heap-size-hint propagate to the workers?"
I also noticed this in a different context. It seems like the interaction between processes and heap-size-hint is not yet defined (?). I posted an issue here: https://github.com/JuliaLang/julia/issues/50673.
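One way to sidestep the question, assuming the hint is not forwarded when workers are started with -p, is to start the workers explicitly and pass the flag through the exeflags keyword of addprocs. This is only a sketch: the helper names worker_flags and start_workers are hypothetical, and 1G is simply the value from the command line posted above.

```julia
using Distributed

# Hypothetical helper: the command-line flags each worker is started with.
worker_flags(hint::AbstractString) = `--heap-size-hint=$hint`

# Hypothetical helper: start `n` local workers, each with its own heap size hint.
# The hint then applies per process, not to the group of processes as a whole.
function start_workers(n::Integer; hint::AbstractString="1G")
    return addprocs(n; exeflags=worker_flags(hint))
end

# Usage (not run here): ids = start_workers(16; hint="1G")
```

Because the hint is applied per process, 16 workers with a 1G hint each can still use far more than 1G in total.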
@oscardssmith can you link the PRs you mentioned?
How big is bin/kps-image-1.9.so? Or, after just starting Julia, how much memory does ps aux say you are using?
ufechner@ufryzen:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:            30Gi        11Gi        12Gi        74Mi       6,7Gi        18Gi
Swap:          1,9Gi          0B       1,9Gi
and ps aux shows 16 worker processes like this one:
ufechner 10971 3.5 3.5 2397148 1143096 ? Ssl 08:18 0:07 /home/ufechner/packages/julias/julia-1.9/bin/julia -Cnative -J/home/ufechner/repos/WindTurbines/bin/kps-image-1.9.so -g1 --bind-to 127.0.0.1 --worker
and
ufechner@ufryzen:~/repos/WindTurbines/bin$ ls -lah kps-image-1.9.so
-rwxrwxr-x 1 ufechner ufechner 808M jul 24 11:48 kps-image-1.9.so
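To put the numbers above in perspective, the RSS column of the ps aux line can be converted to GiB and multiplied by the number of workers. This is a rough sketch only: RSS counts pages shared between processes (such as file-backed pages of the system image) once per process, so the sum is an upper bound rather than the real total. The command at the end of the line is shortened here, and n_workers is taken from the -p 16 flag.

```julia
# One line of `ps aux` output as posted above (command shortened here).
line = "ufechner 10971 3.5 3.5 2397148 1143096 ? Ssl 08:18 0:07 julia --worker"

# Columns of `ps aux`: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
fields = split(line)
rss_kib = parse(Int, fields[6])      # resident set size in KiB

n_workers = 16                       # from `-p 16`
per_worker_gib = rss_kib / 1024^2    # KiB -> GiB
total_gib = n_workers * per_worker_gib

println("per worker:  ", round(per_worker_gib; digits=2), " GiB")
println("all workers: ", round(total_gib; digits=2), " GiB (upper bound)")
```

That is roughly 1.09 GiB of resident memory per worker right after startup, before any user code has allocated anything.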
We have updated the heuristics further, so the GC should try harder to avoid exceeding this memory limit. However, we don't control how much memory external libraries (e.g. LLVM) require, so we expect precompilation to take substantial amounts of memory and to be possible only on large build machines; that is a requirement for building, not for running.
I often see my code being killed due to out-of-memory. This happens when using pmap, but also when running single-threaded, single-process code from the REPL that repeatedly allocates a lot. I tried adding --heap-size-hint, but it did not help.
My workaround: I added the following code to all functions that allocate a lot:
This should not be needed; the garbage collector should do a full collection on its own before the system runs out of memory.
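For illustration, a workaround of the kind described might look like the following sketch. It assumes the added code simply calls GC.gc() when free system memory gets low; the helper name gc_if_low_memory, the example function allocate_a_lot, and the 10% threshold are hypothetical and not taken from the post.

```julia
# Hypothetical helper: run a full collection when the fraction of free system
# memory drops below `threshold`. Returns true if a collection was triggered.
function gc_if_low_memory(; threshold::Real=0.1)
    free_fraction = Sys.free_memory() / Sys.total_memory()
    if free_fraction < threshold
        GC.gc()   # full collection
        return true
    end
    return false
end

# Example: call the helper at the start of a function that allocates a lot.
function allocate_a_lot(n::Integer)
    gc_if_low_memory()
    data = [rand(1000) for _ in 1:n]
    return sum(sum, data)
end
```

Note that Sys.free_memory() reports system-wide free memory, so with several worker processes each one reacts to the memory use of all the others as well.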