roberth opened this issue 2 years ago
It's only safe to persistently memoize functions whose closure has no free variables, so I believe this would only be possible in a Flakes world (if hermetic and without `--impure`).

> `import f j`

Here is a counterexample:
```
nix-repl> json = { }

nix-repl> storePath = builtins.toFile "file" "args: \"\${/data/alejandra/default.nix}\""

nix-repl> builtins.readFile storePath
"args: \"${/data/alejandra/default.nix}\""

nix-repl> import storePath json
"/nix/store/qy5kmb8xf7brinv2sznik7ng3dr8nk4q-default.nix"
```
If I changed `/data/alejandra/default.nix` after caching the result, the cached result would become invalid, so we would not be able to cache this persistently, only for the duration of a single evaluation.
@kamadorueda That would indeed have been a loophole. Thanks for pointing that out.
I still think it's feasible to counteract this by evaluating the cached function in an even stricter pure mode; perhaps a "referentially transparent mode". This mode only needs to be active while computing cached values. When the cached-eval-thunk's `forceValue` is done, the `EvalState` could switch back to its normal state (pure/restricted/impure). I've updated the text a little.
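A sketch of what such a mode might accept and reject, assuming the semantics described above (the mode itself is hypothetical; the primops used are real):

```nix
let
  # A store path is immutable, so reading it is referentially
  # transparent and fine to evaluate while filling the cache:
  ok = builtins.readFile (builtins.toFile "data" "hello");

  # Reading a mutable path (or interpolating one, which triggers
  # addToStore on a path outside the store) would throw in this mode;
  # the exception then triggers normal, uncached evaluation:
  notOk = builtins.readFile /etc/hostname;
in
  { inherit ok notOk; }
```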
> The flake-based evaluation cache has too coarse granularity to be useful for caching during development, updates, etc.

Yes and no. The bottleneck isn’t the use of flakes, but the fact that the cache is tied to the CLI. If the cache was really per-flake, then each call to `builtins.getFlake` (or each flake you depend on) would be cached independently, which would solve 90% of the problem.

To take the most common use case, I’m fairly confident that the vast majority of the eval time for a standard flake is spent evaluating the packages imported from the nixpkgs flake. So if this had its own cache, independent from the top-level flake, then re-evaluating the flake would be nearly free.

Now this isn’t trivial, because it means hooking the cache system into the evaluator (and doing so without too negative an impact on uncached evaluations). I’ve given it a try in https://github.com/NixOS/nix/pull/4511, but had to give up because the implementation was unbearably slow.
@thufschmitt Did you consider adding a new type of thunk instead of modifying regular attr access? That way the performance overhead only applies to cacheable values and not to intermediate values or the values produced by functions.
> unbearably slow.

Are you referring to the ~10% cold-cache overhead? That doesn't seem so bad, and I think it could be improved with the thunk approach. I would expect the `noCache` case to reach 0%, as the thunk-based approach doesn't seem to affect existing `Value` subclasses. Optimistically, this could halve the cold-cache overhead, assuming `old cold cache = old no cache + new cold cache`.
> The bottleneck isn’t the use of flakes,

For flakes it may not be, but restricting the cache to flake boundaries is unnecessary and makes the caching of other functions harder, such as custom nixpkgs invocations, or improvements to the structure of flakes, especially the introduction of a function that takes `system` as an input, which we ought to do for niche architecture support.
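For illustration, the kind of output shape that motivates this (a purely hypothetical sketch; no such flake schema exists today):

```nix
{
  # Hypothetical future flake schema: outputs take system as a
  # function argument, so niche platforms don't require
  # pre-enumerated per-system attrsets.
  outputs = { nixpkgs, ... }: system: {
    packages.default = (import nixpkgs { inherit system; }).hello;
  };
}
```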
> @thufschmitt Did you consider adding a new type of thunk instead of modifying regular attr access? That way the performance overhead only applies to cacheable values and not to intermediate values or the values produced by functions.
I did, yes. I don’t remember precisely why I didn’t go that way eventually, but I’m fairly confident that it wasn’t making things any faster (the extra word in `Attrs` is totally acceptable given that an `Attrs` will generally be big-ish, and the cost of a no-op lookup is also very small compared to any operation you’ll want to perform on the attr). Tbh I think that the slowdown wasn’t really due to anything fundamental; it’s more that I had to refactor things a bit to make the implementation decent, and that this refactoring made things slower.
> For flakes it may not be, but restricting the cache to flake boundaries is unnecessary and makes the caching of other functions harder, such as custom nixpkgs invocations, or improvements to the structure of flakes, especially the introduction of a function that takes `system` as an input, which we ought to do for niche architecture support.

Well, maybe, but I’ve yet to see a semantics that would allow for guaranteed-correct caching and be significantly more flexible than using flakes as the boundaries.
> it’s more that I had to refactor things a bit to make the implementation decent, and that this refactoring made things slower.

A new type of thunk only adds a branch to `forceValue` and a few `switch` cases. Seems more benign to me.
> a semantics that would allow for guaranteed-correct caching

I think I'm pretty close, but I can't make a confident claim without more research. To summarize: switch temporarily to a mode that enforces referential transparency, throwing an exception on `addToStore(mutablePath)`, etc. The exception then triggers regular evaluation.
> and be significantly more flexible than using flakes as the boundaries

It must be similar in terms of what is allowed inside the cached function without resorting to normal evaluation. It can't really work without referential transparency.

Users may have their own use cases for caching, perhaps NixOS configurations in a multi-node deployment, and flakes can use this for niche architecture support, as mentioned before. It wouldn't be nice to have to add custom caching code for that; a more general caching feature supports these things naturally.

I'll stop responding for a while because of priorities; sorry.

If anything, the thunk approach seems worth a shot, even with flake-boundary caching.
random thoughts:
This is already pretty close via?
The key proposal here is to make the capability flake-oblivious. If you squint a bit, the "inputs" in a flake is the `j` parameter and "outputs" is `f`. Many of the restrictions and considerations are the same.
> If the cache was really per-flake, then each call to `builtins.getFlake` (or each flake you depend on) would be cached independently, which would solve 90% of the problem.

It's not the `getFlake` call but the evaluation of its attributes that needs to be cached. Does this imply something like the following?

```nix
builtins.getInstallable "path:./." [ "legacyPackages" system "hello" ]
```

That isn't too far off from the proposed

```nix
builtins.cache ./some-dir ./trivial-inputs
# or
builtins.cachedEvalAttr ./some-func.nix ./trivial-inputs attrPath
```

where it is specifically the `attrPath` traversal that is cached, in the same way the current eval caching works: lazy determination of which attrs are available and of their values, and where `./some-func.nix` and `./trivial-inputs` can imply the cache key.
I think `import f j` is a bit too powerful, because `f` can keep changing and the surrounding context can also access arbitrary parts of the structure. Instead I'm thinking about something like `lib.getAttrFromPath attrPath (import f j)`, where specifically the `getAttrFromPath` recursion specifies what is cached.
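Ignoring persistence, the intended semantics can be sketched as a plain function (a hypothetical reference implementation, using only the names from above; the real primop would additionally consult an on-disk cache during the attribute walk):

```nix
let
  lib = import <nixpkgs/lib>;

  # Reference semantics of the proposed primop, minus the cache:
  # import f with inputs j, then walk attrPath into the result.
  cachedEvalAttr = f: j: attrPath:
    lib.getAttrFromPath attrPath (import f j);
in
  cachedEvalAttr ./some-func.nix { } [ "legacyPackages" "x86_64-linux" "hello" ]
```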
> This is already pretty close via? [...]

This code seems to be about in-memory caching of the AST and root `Value`. In-memory caching, aka memoization, is a much easier problem to solve, and iirc a solution is already sitting on a branch somewhere. I'd like to have that, because it could replace trie-based solutions I've been using, but that's not what I want to discuss here, as it doesn't help with the caching of dependency expressions (inputs etc.).
The key proposal here is to make the capability flake-oblivious. If you squint a bit the "inputs" in a flake is the
j
parameter and "outputs" isf
.
:100:
Many of the restrictions and considerations are the same.
That is correct. Specifically, we want most referentially transparent code to be cached and non-referentially transparent code not to be cached.
> > If the cache was really per-flake, then each call to `builtins.getFlake` (or each flake you depend on) would be cached independently, which would solve 90% of the problem.
>
> It's not the `getFlake` call but the evaluation of its attributes that needs to be cached. Does this imply something like the following?

This is in the context of @thufschmitt's PR, where he implemented a cache per `getFlake` call that returned special attrsets which query the cache transparently. So your example

```nix
builtins.getInstallable "path:./." [ "legacyPackages" system "hello" ]
```

... is actually called transparently, on attribute access.
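Roughly like this (a sketch; `builtins.getFlake` is real, while the transparent cache lookup on each attribute access is what that PR implemented):

```nix
let
  flake = builtins.getFlake "github:NixOS/nixpkgs/nixos-unstable";
in
  # With the PR's special attrsets, each step of this attribute walk
  # consults the evaluation cache before forcing the underlying thunk:
  flake.legacyPackages.x86_64-linux.hello.outPath
```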
> That isn't too far off from the proposed
>
> ```nix
> builtins.cache ./some-dir ./trivial-inputs
> # or
> builtins.cachedEvalAttr ./some-func.nix ./trivial-inputs attrPath
> ```
>
> where it is specifically the `attrPath` traversal that is cached, in the same way the current eval caching works: lazy determination of which attrs are available and of their values, and where `./some-func.nix` and `./trivial-inputs` can imply the cache key.
> I think `import f j` is a bit too powerful, because `f` can keep changing

This is a good thing. It is more powerful than what the current flake format needs, but the current flake format isn't good, as it creates problems for niche `system`s (see https://github.com/NixOS/nix/issues/3843#issuecomment-942594912).

If this caching primop is implemented, changing the flake format amounts to changing the `call-flake.nix` expression, rather than overhauling a flake-coupled cache in C++.
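For instance, `call-flake.nix` could adopt a new format by wrapping each input's import in the primop at the expression level (a hedged sketch; `builtins.cachedImport` is a hypothetical name for the proposed primop):

```nix
{ inputSourcePath, system }:
let
  # Hypothetical primop; fall back to a plain uncached import on
  # evaluators that don't provide it:
  cachedImport = builtins.cachedImport or import;
in
  # inputSourcePath is a store path and the argument is plain data,
  # so the call stays within the cacheable subset:
  cachedImport inputSourcePath { inherit system; }
```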
> and the surrounding context can also access arbitrary parts of the structure. Instead I'm thinking about something like `lib.getAttrFromPath attrPath (import f j)`, where specifically the `getAttrFromPath` recursion specifies what is cached.

I believe this concern is addressed by the special attrsets. Those make the `getAttrFromPath` part transparent.
> I believe this concern is addressed by the special attrsets. Those make the `getAttrFromPath` part transparent.

I was not aware of the special attrsets. Is caching then limited to only that behavior? My intent was to limit caching to attribute-path walking only. This is still more powerful than flakes, due to decoupling `f` and `j`.
> Is caching then limited to only that behavior?

In his implementation, yes. I think we'll want to cache lists too, for `drv.outputs` (see the example below), but those can follow a similar pattern, although I don't know if their representation permits his approach as easily. A thunk-based implementation would support both data structures in the same way, without having to change the actual list or attrset value* representations.

(*) Value as in actual value, not thunk. `Value` should have been called `Object`, because a thunk is arguably not a value, but it is still a heap object.
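For reference, the list-valued attribute mentioned above (real Nix; only the derivation details are made up for the example):

```nix
let
  drv = derivation {
    name = "example";
    system = "x86_64-linux";
    builder = "/bin/sh";
    args = [ "-c" "echo hi > $out" ];
    outputs = [ "out" "dev" ];
  };
in
  # A list value the cache would also need to represent lazily:
  drv.outputs  # => [ "out" "dev" ]
```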
Somewhat related: https://github.com/DavHau/nix-eval-cache
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/get-flake-input-store-path/20202/10
Data point: in a scenario where a CLI interacts with a dirty flake to fetch metadata out of the repo, flake-level caching is not enough*, and a caching primop seems to make the experience bearable.

(*) iirc, UX research gives us 200 ms max.
cc @Gabriella439; maybe she has some good input on memoization in a lazy config language.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/what-would-you-like-to-see-improved-in-nix-cli-experience/24012/9
Just noticed that I hadn't linked this before, but I did some brainstorming about this here: https://github.com/tweag/epcb

It's very much a draft right now, but I'd like to work on it more, maybe with a PoC.
**Is your feature request related to a problem? Please describe.**

The flake-based evaluation cache has too coarse granularity to be useful for caching during development, updates, etc.

Having a more powerful solution will also benefit flakes, as it becomes feasible to use it between flakes, not just at the CLI boundary. A language-integrated cache may also allow easier experimentation than a cache that is tied to CLI concepts.

**Describe the solution you'd like**

A new primop that performs persistent caching of any Nix function. Fully general or transparent caching (memoization) of functions is too hard; we need to strike a balance by restricting the cache so that we avoid the hard problems. My suggestion is to memoize roughly* the function `import f j`, i.e. import from a store path, then apply. The type is, roughly, from a store path `f` and JSON-serializable arguments `j` to a cacheable value.

By using a primop, we avoid the hard problem of having to determine what to cache. By restricting the arguments to serializable data, such as a store path and JSON-serializable data, we avoid the serialization of functions. By restricting the cached-eval-thunk to serializable data, we again avoid the serialization of functions.
A cached-eval-thunk references the `f` and `j` arguments, and contains a path. The path is a list of attribute names and/or list indexes. It also contains a regular thunk representing the real thunk `import f j`. When the cached-eval-thunk is forced, it looks up the value in the cache db. If no entry is available, it forces the real thunk and stores the result in the cache db. If a result exists, the thunk turns into a primitive value, or into an attrset or list of cached-eval-thunks.

*: For all of this to work, the expression in `f` must not be allowed to perform any action that wouldn't be referentially transparent. This can be done by performing the evaluation of the real thunk in an extra-pure mode that forbids `readFile`, `addToStore` (except on paths already in the store), etc.
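A usage sketch under these restrictions (a hypothetical spelling; `builtins.cachedImport` does not exist, and the exact signature is an assumption):

```nix
let
  # f must be a store path; fetchTarball yields one:
  f = builtins.fetchTarball "https://github.com/NixOS/nixpkgs/archive/nixos-unstable.tar.gz";
  # j must be JSON-serializable data (no functions):
  j = { system = "x86_64-linux"; config = { }; };
in
  # Would consult the persistent cache before evaluating `import f j`;
  # subsequent attribute accesses would force cached-eval-thunks:
  (builtins.cachedImport f j).hello.outPath
```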
**Describe alternatives you've considered**

`import f j` seems powerful enough, but maybe another function is more efficient? We'll generate a store path for each cached function, but I don't think that's a problem, because the primop should be used with care anyway; db lookups aren't free. A non-store-path representation of functions without free variables may also be suitable, but that is more work.
**Additional context**

I swear that I wrote about this before, but I couldn't find it in the issue tracker :confused: