Try a different memory garbage collector: whippet

roberth commented 1 year ago

I want to add the boehm garbage collector is a conservative collector that does not allow heap compaction.

I was hoping to spark some interest in assessing the mark-region algorithm as a possible new garbage collection algorithm for nix because it allows for heap compaction. There are some existing implementations in rust (immix) and c (whippet). In particular the whippet implementation seems relevant to nix because it has zero dependencies and has boehm-compatible api.

Originally posted by @jsoo1 in https://github.com/NixOS/nix/issues/8621#issuecomment-1615952672

roberth commented 1 year ago

I was planning on setting aside some time for it if there seemed to be interest from the team.

There's interest, certainly on my end, and I hope I can be of help. Whether we can adopt this change will depend on various factors, such as

How much code change is needed?
Similarly, is it feasible to support both GCs? That would reduce maintenance risk for us.
What is the improvement in peak memory?
Does it improve or degrade CPU time taken?
Might the coroutine integration improve? (tricky!)

Currently the coroutine integration is a little hacky. Coroutines aren't officially supported by Boehm GC yet, and they way I've integrated them isn't pretty; it's not capable of using the correct stack pointer making it overly conservative. This only works on linux. On darwin we disable GC when any coroutine exists. For a measurement, the "darwin" solution will be sufficient, as we don't normally have long lived coroutines (or at least that's the assumption).

jsoo1 commented 1 year ago

All good questions. I'll have to do some experimentation and find out!

edolstra commented 1 year ago

Refcounted smart pointers don't work because there are numerous cycles in Value / Env objects. They're feasible for Expr objects, but those are not the main source of memory consumption and need to be retained for most of the execution of the program anyway, hence why I never bothered with proper memory management for them.

I implemented a precise collector for Nix once (https://github.com/NixOS/nix/compare/master...precise-gc). It requires all roots on the stack/heap to be marked explicitly. This would avoid the Boehm problem of keeping large graphs of values alive because of misidentified pointers. However, it's unclear how much Nix's memory problems are due to this. Probably, Nix's evaluation model in combination with Nixpkgs is just inherently wasteful (e.g. attributes like .override make it impossible to GC the inputs to a package function), so no alternative garbage collector is going to make a substantial improvement.

roberth commented 1 year ago

How about just DO NOT use any GC

Lazy functional languages tend to produce cycles, so the "least GC" you can get, short of just giving up on ever releasing cyclic structures, is reference counting with cycle detection. Cycle detection is still a kind of GC, but a GC that gets to run a little less often.

the GC is completely non-thread-safe

Boehm GC is actually thread safe. Or do you mean that it needs to stop all threads? That is a valid strategy for GC, and I believe it is implemented / "configured" correctly in Nix.

And users of libexpr now must fork processes to concurrently eval things.

The evaluator never promised safe evaluation by multiple threads, regardless of whether GC is enabled.

Implementing an efficient runtime system for a lazy functional language is non-trivial and a mere reply is not going to do that any justice. There's vast research into this topic that you'll need to read up on before you can make a realistic suggestion about how to improve this, and even then it's going to need to be measured and not be guesswork.

roberth commented 1 year ago

This discussion got a bit off topic. None of what's discussed just now invalidates @jsoo1's proposal to get a potential incremental improvement from a different GC implementation. If we want to delve into other ideas to improve memory management, please open a new issue.

wingo commented 1 month ago

Hey whippet author here! I think whippet's target use case is pretty compatible with Nix's. And actually I hope that by replacing the GC in Guile with Whippet, Guix will get some memory improvements as well, so it's really quite aligned I think.

Limitations currently:

whippet is essentially only tested on gcc and gnu/linux at the moment, I will need to put in some porting work to get it working on MacOS and Windows, and clang/llvm is strict about needing _Atomic to every operand of a C11 atomic op, which I need to fix. Shouldn't be too much work here tho
performance of the mostly-marking collector is good, generally better than bdw at a given heap size, and handles parallel mutators better; but ymmv

I am about to start getting Whippet into Guile, which will be the real proof that it's an improvement. Plan is to first convert Guile to use Whippet but with BDW implementing the Whippet API, then try to switch to the mostly-marking collector. Probably will take 6 months or so. If someone wants to have a look at it, happy to chat some time; wingo at igalia dot com.

NixOS / nix

Try a different memory garbage collector: whippet #8626