SaltyKitkat opened this issue 1 year ago
We are aware that Nix evaluation tends to consume significant amounts of memory. Causes and potential causes I'm aware of:
I want to add that the Boehm garbage collector is a conservative collector, which means it does not allow heap compaction.
I was hoping to spark some interest in assessing the mark-region algorithm as a possible new garbage collection algorithm for Nix, because it allows for heap compaction. There are existing implementations in Rust (immix) and C (whippet). In particular, the whippet implementation seems relevant to Nix because it has zero dependencies and a Boehm-compatible API.
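To make the mark-region idea concrete, here is a toy Python sketch of region-level reclamation in the spirit of immix: marking records which heap "lines" live objects touch, and sweeping reclaims whole blocks whose lines are all unmarked, while counting the free lines of partially full blocks as recyclable for future bump allocation. All names and sizes here are illustrative, not whippet's or Nix's actual implementation.

```python
# Toy sketch of mark-region reclamation (in the spirit of immix).
# The heap is divided into blocks, each block into fixed-size lines;
# this models only the line-mark bookkeeping, not real allocation.
LINES_PER_BLOCK = 4

class Heap:
    def __init__(self, num_blocks):
        # line_marks[b][l] is True if a live object touches line l of block b
        self.line_marks = [[False] * LINES_PER_BLOCK for _ in range(num_blocks)]

    def mark_object(self, block, first_line, num_lines):
        # Marking a live object records the lines it occupies.
        for line in range(first_line, first_line + num_lines):
            self.line_marks[block][line] = True

    def sweep(self):
        # Region-level sweep: blocks with no marked lines are freed wholesale;
        # partially marked blocks contribute their unmarked lines for reuse.
        free_blocks, recyclable_lines = [], 0
        for b, lines in enumerate(self.line_marks):
            if not any(lines):
                free_blocks.append(b)
            else:
                recyclable_lines += lines.count(False)
        return free_blocks, recyclable_lines

heap = Heap(num_blocks=3)
heap.mark_object(block=0, first_line=0, num_lines=2)  # one live object
free, recyclable = heap.sweep()
# blocks 1 and 2 are entirely unmarked -> reclaimed wholesale;
# block 0 keeps 2 free lines for future bump allocation
```

The point of sweeping at block/line granularity rather than per object is what gives the collector cheap reclamation while still permitting compaction (by evacuating the few live objects out of mostly-empty blocks), which a conservative collector like Boehm cannot do.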
@jsoo1 Interesting! Would you be interested in giving whippet a try? I've added notes about gc.
@roberth sweet! Yes I would be interested! I was planning on setting aside some time for it if there seemed to be interest from the team.
Let's move the discussion of replacing the GC over to https://github.com/NixOS/nix/issues/8626
Thanks for the summary!
Since there are already memory leaks, I'm wondering whether the GC is working as expected; improving the GC may make no sense if most of the memory usage comes from leaked memory.
I don't expect the GC itself to be broken, and I don't expect many leaks from it being conservative either. It manages to collect an amount about equal to the final heap size in a typical evaluation by ofborg (i.e. half of all allocations are collected). It is hard to know how much it *should* be able to collect, though. So that makes your question a good one, which could perhaps be answered with a combination of profiling and debugging, although we might need custom tooling to really start relating expressions to the heap and GC.
I ran into this while upgrading from NixOS 23.05 to 23.11 on my cloud VM with 2G of RAM. nix-build itself took 1G of that, and also there were some server services running, taking up about 500M, leaving only 500M for the actual derivation builds. Naturally it OOM'd kind of a lot.
I worked around that by taking the derivation file paths from the `these NNN derivations will be built:` output, pasting them into a file, and running `xargs -n1 nix-build < derivations.txt`. Not sure if the `-n1` also helped, but it feels like some gains could be had here by separating the two phases. I will happily be corrected if I'm working off incorrect assumptions, but it appears to me that the memory usage of nix-build is all related to Nix expressions, which at this point in the build process are entirely unneeded, since all the required information exists in the `.drv` files. Maybe the Nix expression evaluation could happen in a separate process that then terminates before nix-build moves on to building the derivations, or the Nix expressions could be allocated in an arena that is freed all at once after evaluation is done, or something like that?
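The separate-evaluator-process idea can be sketched in Python. This is only an illustration of the principle, not Nix's code, and the `.drv` store paths are made up: the child process does the memory-hungry "evaluation", hands back nothing but the small list of derivation paths, and exits, so its entire heap is returned to the OS before any "building" begins in the parent.

```python
import json
import subprocess
import sys

# Stand-in for the evaluator. In real Nix this would be expression
# evaluation; here a large scratch allocation models the transient eval
# heap, and the hypothetical .drv paths are the only thing we keep.
CHILD = r"""
import json, sys
scratch = [object() for _ in range(100_000)]  # big transient "eval" heap
drv_paths = ["/nix/store/aaaa-hello.drv", "/nix/store/bbbb-world.drv"]
json.dump(drv_paths, sys.stdout)  # only the small result leaves the process
"""

def evaluate_in_child():
    # Run the "evaluation" in a child process and return the .drv paths.
    # When the child exits, all memory it allocated goes back to the OS,
    # before the (hypothetical) build phase starts in the parent.
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)
```

This mirrors what `nix-instantiate` followed by `nix-store --realise` already gives you at the CLI level: evaluation and building happen in separate processes, so the evaluator's memory is fully released before builds start.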
That would not solve the original problem, and looking into a different GC still sounds valuable, but it might make the problem less acute for a portion of affected users.
Regarding freeing the expressions, a starting point would be https://github.com/NixOS/nix/pull/5747#issuecomment-1615939700, but also making sure to destruct `EvalState` and the expression cache.
If you have really small machines to deploy to, you might want to use `nixos-rebuild --target-host`. That will neither build nor evaluate on the target machine.
`nixos-rebuild --target-host` is a good hint and I will take it under consideration. But for what it's worth, that does not solve OOM during auto-upgrades as triggered by `system.autoUpgrade.enable = true;`, as far as I can see.
CC @astro FYI
While learning Nix and nix flakes, this command froze my dear and at that point mostly idle 16 GB laptop, eating >10 GB:

`nix flake show microvm`
shortened output:
```
github:astro/microvm.nix/7bd9255e535c8cbada7f574ddd3bcf3bfa5e1eae
├───apps
│   ├───aarch64-linux
│   │   ├───graphics: app
│   │   ├───qemu-vnc: app
│   │   ├───vm: app
│   │   └───waypipe-client: app
│   └───x86_64-linux
│       ├───graphics: app
│       ├───qemu-vnc: app
│       ├───vm: app
│       └───waypipe-client: app
├───defaultTemplate: template: Flake with MicroVMs
├───hydraJobs
│   ├───aarch64-linux
│   │   ├───cloud-hypervisor-overlay-shutdown-command: derivation 'microvm-test-shutdown-command'
[...SNIP...]
│   │   └───vm-stratovirt-iperf: derivation 'vm-stratovirt-iperf'
error: interrupted by the user
nix flake show microvm  58,38s user 4,46s system 92% cpu 1:07,85 total
```
The output is actually from a run after I found https://github.com/rfjakob/earlyoom. You might want to recommend this nice tool somewhere!
Please don't let me sidetrack this issue. I just thought it might be interesting to mention earlyoom here and to give an example of how to reliably eat a lot of memory.
https://github.com/NixOS/rfcs/pull/163 may reduce memory use for NixOS, by virtue of not having to load service modules that aren't used.
It's one solution among potentially others, such as #9650 for cases like `show microvm`.
Any memory usage improvements are very welcome. My CI runner with 16 GB RAM now also occasionally triggers the OOM killer when evaluating my NixOS configurations.
I seem to be encountering this too. A `nix flake show` in the `microvm` repo consumed a whopping ~24 GB of RAM.
I am encountering this as well. nixpkgs-review, while evaluating, sometimes fills up my whole RAM (16 GiB), whereas the usage before is around 5 GiB, smh.
Just evaluating my NixOS profile takes about 1 GB of RAM. That's kind of too much for me. And when running something like `nixpkgs-review`, Nix (the `nix-env` run by `nixpkgs-review`) will just take more and more and more RAM. Is this by design? Or is there any way I can reduce the memory usage?