danbst closed this issue 6 years ago
The more stuff is encoded in the Nix language, the more .nix files resemble a database, with Nix as the DBMS.
https://repology.org/ says we have 40K packages in total; let's say with all the hidden sets it's 100K derivations. That's not a lot.
- to perform an attribute lookup in those large files, the whole file has to be parsed (this is insanely slow on slow drives)
Last time I checked, parsing one big file took less time than parsing a bunch of small files on a spinning-rust disk.
- to perform an attribute lookup, the whole attribute set must be kept in memory
So, we have like 90M of input data in total (`du -sh pkgs`), we load it into memory, and it blows up to 8G just from being transformed into attrsets? I find this to be unrealistic.
I think that @edolstra is exactly right in https://github.com/NixOS/nixpkgs/issues/38635#issuecomment-379678500
> Unfortunately Nix is not a simple package manager: it implements a lazy purely functional language, the evaluation of which uses a lot of memory. It has some design issues that make it hard to reduce memory usage (in particular, the use of a monolithic `pkgs` set combined with the `.override` feature makes garbage collection largely ineffective).
But I think the main problem is `overrideDerivation`, not the monolithic `pkgs`. After loading a bit of data into memory we evaluate a bunch of modifications to it, but we never GC the intermediates, because the `.override` machinery keeps literally all of them alive.
If I really cared about memory consumption in Nix, I would start by implementing something like `nix-instantiate --dump-states`, which would dump successive interpreter states in full. Looking at those would be interesting; I'm sure there are a lot of surprises in there.
Also, I sometimes wonder why we can't just distribute pre-evaluated .drv files with the channels. Surely most users don't actually need to evaluate anything if they don't override anything. I evaluate all .drv files on a single machine with a lot of RAM and then distribute them (and their realisations, where appropriate); surely this could be done for channels too. With a partial evaluator, things could be simpler still.
I did a test: I removed all lines from `top-level/all-packages.nix` that were not used to evaluate my system (that was tough!). The resulting file had 3,500 lines vs. the original 22,100. This had an insignificant impact on performance (~1%). The amount of memory used was reduced, but also not significantly. So @oxij et al. are right here, I think. Closing.
> overlays, which produce slightly modified copies of the enormous attrset
For this use case it would be better to implement something like a LinkedHashMap in Nix. On the other hand, only the key sets are stored in memory. How large is `pkgs` with unevaluated thunks?
The more stuff is encoded in the Nix language, the more .nix files resemble a database, with Nix as the DBMS.
Main (current) problems this creates:
What should be done instead is to borrow techniques from a DBMS: shared buffers and indexing.
Indexing
The parsing and the lazy loading of attributes are separate problems, but both can be tackled with the same technique: indexing.
The `./hackage-packages-index.nix` file looks like this: So, this is a loose implementation of a B-tree, which however doesn't need to be MUTABLE. All those large .nix files should be autogenerated, so all the indexes can be recreated in one go.
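A loose, hypothetical sketch of what such an index file could look like (the attribute names, offsets, and the `importByOffset` builtin below are invented for illustration; they are proposals, not existing Nix features):

```nix
# Hypothetical: each __index__ entry covers a range of attribute names and
# points at a byte offset in the autogenerated data file.  A lookup of an
# attribute starting with "a" parses only the slice at offset 0, and so on.
{
  __index__a = importByOffset ./hackage-packages.nix 0;
  __index__b = importByOffset ./hackage-packages.nix 104857;
  __index__c = importByOffset ./hackage-packages.nix 215043;
  # ...deeper slices may themselves contain __index__ attributes,
  # giving the B-tree-like structure described above.
}
```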
The `__index__XXX` attribute tells the Nix evaluator to treat the result as an attribute set and merge it (virtually) into the current one. If the result is not an attrset, an error should be raised. This is done ONLY when an attribute lookup `XXX....` is requested; the evaluator then tries to look up the requested attribute in the fetched attrset.

The new builtin `importByOffset` is the same as `import`, except that it seeks from the beginning of the file and parses only the required part of it. It reuses the context of the calling file. It should be possible to import code from any part of the target file, even if it isn't Nix.

Pros:
- Extremely lazy loading
In my previous issue (https://github.com/NixOS/nix/issues/1539) I suggested allowing the `.` (attribute selection) operator to be overridden with a custom function. The indexing technique could probably benefit here too. For example, a custom function could fetch the underlying large package database (or a concrete subset of it) from the internet, verifying the hashes of the downloaded content against ones recorded in Nixpkgs. Or it could fall back to generating the required expression (given a function that can generate it).
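As a rough illustration of that idea (the class and all names below are invented for this sketch; this is not an existing Nix or Nixpkgs API), a lookup with a custom fetching fallback that verifies pinned hashes might look like:

```python
import hashlib

class LookupWithFetchFallback:
    """Attribute lookup that falls back to a user-supplied fetch function,
    verifying fetched content against pinned hashes before admitting it."""

    def __init__(self, local, fetch, pinned_sha256):
        self.local = dict(local)      # entries already present locally
        self.fetch = fetch            # custom function: name -> raw bytes
        self.pinned = pinned_sha256   # name -> expected sha256 hex digest

    def __getitem__(self, name):
        if name not in self.local:
            raw = self.fetch(name)    # e.g. download a package-DB subset
            if hashlib.sha256(raw).hexdigest() != self.pinned.get(name):
                raise ValueError(f"hash mismatch for {name!r}")
            self.local[name] = raw.decode()  # cache the verified result
        return self.local[name]
```

A hash mismatch aborts the lookup, so the remote database can only ever supply content that the pinned hashes already committed to.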