
Erlang/OTP
http://erlang.org
Apache License 2.0

Optimize garbage collection for processes with large heaps #5396

Open rickard-green opened 2 years ago

rickard-green commented 2 years ago

The general design recommendation is to avoid processes with large amounts of data since garbage collection may take a long time and will consume lots of memory. This can be improved, but has not been prioritized yet. I open this issue so we have an issue to point to when this comes up, and to gather thoughts and ideas about how to improve this.

Some initial thoughts about this:

Currently the whole garbage collection is performed at once, which negatively impacts the responsiveness of processes with large heaps while they are being garbage collected. In order to maintain responsiveness, the garbage collection needs to be performed incrementally. With an incremental garbage collector, the need to garbage collect such processes on dirty schedulers would go away. A yielding garbage collector would also make it possible to move garbage collection from dirty to normal schedulers, but it would make responsiveness even worse than the current strategy does.
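To illustrate the yielding idea, here is a minimal C sketch (illustrative names, not actual ERTS internals) of a copy pass that performs at most a fixed work budget per slice and reports whether work remains, so the scheduler could interleave mutator execution between slices:

```c
#include <stdbool.h>
#include <stddef.h>

/* Incremental evacuation state: live objects in from-space are copied to
 * to-space a bounded number at a time. All names are hypothetical. */
typedef struct {
    const int *live;   /* 1 if from[i] is reachable (computed elsewhere) */
    const long *from;  /* from-space slots */
    long *to;          /* to-space copy target */
    size_t n;          /* number of from-space slots */
    size_t next;       /* next from-space index to examine */
    size_t copied;     /* objects copied so far */
} gc_state;

/* Perform at most `budget` copies, then yield; returns true once the
 * whole from-space has been processed. */
bool gc_step(gc_state *gc, size_t budget) {
    while (budget > 0 && gc->next < gc->n) {
        if (gc->live[gc->next]) {
            gc->to[gc->copied++] = gc->from[gc->next];
            budget--;              /* only actual copy work counts */
        }
        gc->next++;
    }
    return gc->next >= gc->n;
}
```

A real collector would additionally have to cope with the mutator reading and writing terms between slices (e.g. via forwarding pointers), which is the hard part of making the collection truly incremental.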

Current copying garbage collection uses a lot of memory, especially during the collection itself, since the new heap that live data is copied to has to be sized similarly to the already existing heap. That is, the memory usage during collection more or less doubles. This strategy would become much more problematic when doing an incremental (and/or yielding) garbage collection, since the number of processes in the system garbage collecting simultaneously would no longer be limited as it is today (by the number of schedulers on the system). The heap areas should preferably be fragmented into multiple smaller blocks which can be allocated/deallocated independently during garbage collection and normal execution. The use of (mainly) same-sized heap fragments could also reduce memory fragmentation issues in the memory allocators.
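As a back-of-the-envelope illustration of the doubling (a sketch under simplifying assumptions, not ERTS code): with one contiguous to-space sized like the existing heap, peak usage during collection is roughly twice the heap, whereas with the heap split into k same-sized fragments that are freed as soon as their contents have been evacuated, the peak is roughly one extra fragment above the heap size:

```c
#include <stddef.h>

/* Peak memory during a copying GC with a single contiguous to-space:
 * existing heap plus an equally sized new heap. */
size_t peak_monolithic(size_t heap_bytes) {
    return 2 * heap_bytes;
}

/* Peak with the heap split into k same-sized fragments, freeing each
 * source fragment once its data has been evacuated: at most one extra
 * fragment is in flight. Simplifying assumption: all data is live. */
size_t peak_fragmented(size_t heap_bytes, size_t k) {
    return heap_bytes + heap_bytes / k;
}
```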

Current garbage collection sizes both the new old-heap block and the new new-heap block with a lot of free memory in order to have room for new data before triggering the next garbage collection. This causes processes with large heaps to become extra large. The new heap can currently be fragmented (which it actually is). We could to some extent utilize this and not allocate such a big new new-heap when garbage collecting, and instead grow in new fragments. This would, however, not decrease the peak memory utilization of the process, since usage will monotonically increase until we garbage collect. We could of course garbage collect earlier in order to reduce peak utilization, but that would likely very often have a negative impact on performance. It is currently not possible to utilize a fragmented old heap, since we have no efficient way to distinguish data on a fragmented old heap from data on a fragmented new heap (and data in literal areas). On at least some (major) platforms this could be solved by allocating old-heap fragments in a separate address range (just like the literal data range already used).
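The separate-range idea can be sketched as follows; `malloc` stands in for reserving a dedicated virtual range (in reality this would be a reserved/mmap'd region, as for literals), and all names here are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Dedicated address range for old-heap fragments; any fragment allocated
 * from it can then be classified with two pointer comparisons instead of
 * walking per-process fragment lists. */
static uintptr_t old_range_start, old_range_end;

void old_range_init(size_t bytes) {
    /* Real code would reserve (mmap) a fixed virtual range up front. */
    void *p = malloc(bytes);
    old_range_start = (uintptr_t)p;
    old_range_end = old_range_start + bytes;
}

/* True iff `term` points into the old-heap range. */
bool is_old_heap(const void *term) {
    uintptr_t a = (uintptr_t)term;
    return a >= old_range_start && a < old_range_end;
}
```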

garazdawi commented 2 years ago

Last time I looked at a different garbage collection algorithm for large heaps I started to really like the MC2 Garbage Collector. With a few tweaks, I think that it would work great for large Erlang heaps.

chasers commented 1 month ago

What is "large" exactly here?

garazdawi commented 3 weeks ago

It depends on which of the two problems mentioned in the description the main focus is.

If it is GC latency, then that would be heaps that are large enough for the current GC to take more than a couple of milliseconds. But this should probably be a configuration parameter with a good enough default, as GC throughput will likely suffer if latency is improved.

If it is peak memory usage, then it very much depends on the system, but it should probably start at a couple of tens of MBs, if not hundreds of MBs. Again, this should probably be configurable, as some embedded systems may consider 1 MB a large heap, while big server systems use GBs of data.
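The two notions of "large" above could be captured as configurable thresholds; a minimal sketch with illustrative defaults (none of these names or numbers are actual OTP settings):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical configuration: a heap counts as "large" either because a
 * full GC of it would exceed a target pause, or because it exceeds a
 * byte threshold. Values below are illustrative only. */
typedef struct {
    double max_pause_ms;      /* latency focus: e.g. a couple of ms */
    size_t large_heap_bytes;  /* memory focus: tens of MB, or ~1 MB embedded */
} gc_config;

/* Estimate full-GC time from heap size and a measured copy rate. */
double est_gc_ms(size_t heap_bytes, double bytes_per_ms) {
    return (double)heap_bytes / bytes_per_ms;
}

bool latency_large(const gc_config *c, size_t heap_bytes, double bytes_per_ms) {
    return est_gc_ms(heap_bytes, bytes_per_ms) > c->max_pause_ms;
}

bool size_large(const gc_config *c, size_t heap_bytes) {
    return heap_bytes > c->large_heap_bytes;
}
```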