Open oxinabox opened 3 years ago
I'm going to say this is fixed by #44215. Of course, please post more examples if there are still bad cases.
Specifically, if you have examples, https://github.com/JuliaCI/GCBenchmarks is collecting them.
I was trying this again today, and my machine apparently just decided to run out of memory instead of ever running GC (over 30GB heap).
What julia version?
For a related example, I paused the task for a bit inside GC to print the stats, and when I resumed it, it decided that it must now wait for 10GB of new allocations before being eligible for the next GC. I am also a little confused about how it thinks only 67MB are mapped right now.
(lldb) p gc_heap_stats
(gc_heapstatus_t) $1 = (bytes_mapped = 67_125_248, bytes_resident = 67_125_248, heap_size = 7_192_808_358, heap_target = 17_053_423_782)
I didn't grab stats for the first run, but I noticed on later runs that after the first 5 GB are allocated, usually the next GC is scheduled for after the heap reaches 10-20 GB.
julia> versioninfo()
Julia Version 1.11.0-DEV.165
Commit 76a9772c13* (2023-07-24 15:47 UTC)
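As a quick sanity check on the numbers in the gc_heap_stats dump above, the gap between heap_target and heap_size can be computed directly. This is only a sketch: it assumes, as the comment does, that this gap is roughly how much may be allocated before the next collection becomes eligible. The variable names mirror the struct fields, and `headroom` is a label introduced here for illustration.

```julia
# Values copied from the gc_heap_stats dump above (all in bytes).
heap_size    = 7_192_808_358
heap_target  = 17_053_423_782
bytes_mapped = 67_125_248

# Assumed interpretation: allocation allowed before the next GC becomes eligible.
headroom = heap_target - heap_size
println("headroom: ", round(headroom / 1e9; digits = 2), " GB")   # prints "headroom: 9.86 GB"

# How much larger the reported heap size is than the reported mapped bytes.
println("heap_size / bytes_mapped: ", round(heap_size / bytes_mapped; digits = 1))
```

The headroom comes out just under 10 GB, matching the "10GB of new allocations" observation, while the reported heap size is roughly 107 times the reported mapped bytes, which is the inconsistency the comment points out.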
Ooops. I'll take a look
Julia has a generational mark-and-sweep GC. One property of a mark-and-sweep GC is that, whenever it runs, it has to walk and mark every referenced object in memory to decide what can be swept. This is very expensive if you have a lot of objects in memory, and the cost is incurred even if those objects are not being freed (or even read). This is where the generational aspect comes in. It applies a heuristic to split the objects in memory into generations. Things that have been referenced for a long time (in some sense) are likely to continue to be referenced, and so are moved into the older generations. The generational GC should mark and sweep the young generations very often, and the older generations more rarely. This should mean that if you have some large object with lots of references that is, e.g., allocated at the start of your program and kept until the end, it will make the GC slow at first, since it is so large and takes so long to mark, but after a little while it will move into the older generations and very rarely be marked, and so cease to affect the time it takes to do normal GCs.
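The difference between collecting only the young generation and collecting everything can be observed directly, since `GC.gc(false)` requests an incremental collection and `GC.gc(true)` a full one. The following is a minimal sketch, not a benchmark: `long_lived` and `time_gc` are hypothetical names, the vector of `Ref`s is just a stand-in for a large reference-heavy structure, and the actual timings depend on the Julia version and machine.

```julia
# Hypothetical stand-in for a large, long-lived structure with many references.
long_lived = [Ref(i) for i in 1:200_000]

# Time a single collection; `false` requests an incremental (young) collection,
# `true` requests a full collection.
time_gc(full::Bool) = @elapsed GC.gc(full)

GC.gc(true)   # one full pass first, so `long_lived` has survived a collection

incremental = minimum(time_gc(false) for _ in 1:5)
full        = minimum(time_gc(true) for _ in 1:5)

println("incremental GC: ", round(incremental * 1e3; digits = 3), " ms")
println("full GC:        ", round(full * 1e3; digits = 3), " ms")
```

If the generational heuristic is doing its job, the incremental collections should not have to re-mark `long_lived`, so one would expect them to be noticeably cheaper than the full ones.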
The following examples use ZonedDateTimes from TimeZones@1.5.3, which are for now not isbits, though that is being worked on (https://github.com/JuliaTime/TimeZones.jl/issues/271). You can create a similar example that uses any non-isbits objects, such as mutable objects, things with abstract typed fields, or things with non-isbits fields (such as String or any Vector). This one I had handy.
You can see the generational GC working in this first example. At first, after the zdts are allocated, there is a sharp jump in the time spent on GC, up to around 90%. But after a few rounds the zdts have moved into the older generation, and the time spent on GC goes back down nicely to what it was at the start: around 20%.
However, it's very easy to run a different function that does not display the generational behavior.
burn() is identical to little_burn but runs for 10x longer and allocates 10x as much memory. Like little_burn, it never actually touches the zdts. With burn, however, the GC time always stays at 60-80%. This is due to it triggering a full sweep of both generations of our GC. Thus it keeps needing to mark all of the big and complex zdts, which is slow.
I have seen real code that takes tens of seconds to do a full mark and sweep. We really don't want to be running that very often. (If we are running that kind of mark and sweep too often, it might even be better to fail with an OOM error.)
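The kind of comparison described above can be approximated without TimeZones. The sketch below is not the original little_burn/burn code, which is not reproduced in this text. It assumes a vector of `Ref`s as a stand-in for the non-isbits zdts, and `keep`, `churn`, and `gc_fraction` are hypothetical names. It relies only on `isbitstype` and on the `time` and `gctime` fields returned by `@timed`.

```julia
# Stand-in for the long-lived, non-isbits data (hypothetical; not the original zdts).
keep = [Ref(i) for i in 1:500_000]

@assert !isbitstype(eltype(keep))   # mutable Ref values are heap objects the GC must trace
@assert isbitstype(Int)             # plain Ints are stored inline, nothing to trace

# Allocate `n` short-lived vectors and drop them immediately; never touches `keep`.
function churn(n::Integer)
    total = 0
    for i in 1:n
        tmp = Vector{Int}(undef, 1_000)   # short-lived garbage
        tmp[1] = i
        total += tmp[1]
    end
    return total
end

# Fraction of wall time spent in GC while running `churn(n)`.
function gc_fraction(n::Integer)
    stats = @timed churn(n)
    return stats.time > 0 ? stats.gctime / stats.time : 0.0
end

churn(10)   # warm up so compilation time is not measured
println("small run: ", round(100 * gc_fraction(20_000); digits = 1), "% GC")
println("10x run:   ", round(100 * gc_fraction(200_000); digits = 1), "% GC")
```

Whether the 10x run actually shows a persistently higher GC fraction depends on the Julia version and on when its heuristics decide to schedule a full collection. The sketch only shows how to measure it.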
It seems we need to do something to the GC to have it do a full mark and sweep less often. This might be tweaking the heuristics. It might be adding another generation. It might be both.