Open roberth opened 1 year ago
I was planning on setting aside some time for it if there seemed to be interest from the team.
There's interest, certainly on my end, and I hope I can be of help. Whether we can adopt this change will depend on various factors, such as
Currently the coroutine integration is a little hacky. Coroutines aren't officially supported by Boehm GC yet, and they way I've integrated them isn't pretty; it's not capable of using the correct stack pointer making it overly conservative. This only works on linux. On darwin we disable GC when any coroutine exists. For a measurement, the "darwin" solution will be sufficient, as we don't normally have long lived coroutines (or at least that's the assumption).
All good questions. I'll have to do some experimentation and find out!
Refcounted smart pointers don't work because there are numerous cycles in Value
/ Env
objects. They're feasible for Expr
objects, but those are not the main source of memory consumption and need to be retained for most of the execution of the program anyway, hence why I never bothered with proper memory management for them.
I implemented a precise collector for Nix once (https://github.com/NixOS/nix/compare/master...precise-gc). It requires all roots on the stack/heap to be marked explicitly. This would avoid the Boehm problem of keeping large graphs of values alive because of misidentified pointers. However, it's unclear how much Nix's memory problems are due to this. Probably, Nix's evaluation model in combination with Nixpkgs is just inherently wasteful (e.g. attributes like .override
make it impossible to GC the inputs to a package function), so no alternative garbage collector is going to make a substantial improvement.
How about just DO NOT use any GC
Lazy functional languages tend to produce cycles, so the "least GC" you can get, short of just giving up on ever releasing cyclic structures, is reference counting with cycle detection. Cycle detection is still a kind of GC, but a GC that gets to run a little less often.
the GC is completely non-thread-safe
Boehm GC is actually thread safe. Or do you mean that it needs to stop all threads? That is a valid strategy for GC, and I believe it is implemented / "configured" correctly in Nix.
And users of libexpr now must fork processes to concurrently eval things.
The evaluator never promised safe evaluation by multiple threads, regardless of whether GC is enabled.
Implementing an efficient runtime system for a lazy functional language is non-trivial and a mere reply is not going to do that any justice. There's vast research into this topic that you'll need to read up on before you can make a realistic suggestion about how to improve this, and even then it's going to need to be measured and not be guesswork.
This discussion got a bit off topic. None of what's discussed just now invalidates @jsoo1's proposal to get a potential incremental improvement from a different GC implementation. If we want to delve into other ideas to improve memory management, please open a new issue.
Hey whippet author here! I think whippet's target use case is pretty compatible with Nix's. And actually I hope that by replacing the GC in Guile with Whippet, Guix will get some memory improvements as well, so it's really quite aligned I think.
Limitations currently:
_Atomic
to every operand of a C11 atomic op, which I need to fix. Shouldn't be too much work here thoI am about to start getting Whippet into Guile, which will be the real proof that it's an improvement. Plan is to first convert Guile to use Whippet but with BDW implementing the Whippet API, then try to switch to the mostly-marking collector. Probably will take 6 months or so. If someone wants to have a look at it, happy to chat some time; wingo at igalia dot com.
Originally posted by @jsoo1 in https://github.com/NixOS/nix/issues/8621#issuecomment-1615952672