roberth opened this issue 2 years ago
It's only safe to persistently memoize functions whose closure has no free variables, so I believe this would only be possible in a Flakes world (if hermetic and without `--impure`).

> `import f j`

Here is a counterexample:
```
nix-repl> json = { }

nix-repl> storePath = builtins.toFile "file" "args: \"\${/data/alejandra/default.nix}\""

nix-repl> builtins.readFile storePath
"args: \"${/data/alejandra/default.nix}\""

nix-repl> import storePath json
"/nix/store/qy5kmb8xf7brinv2sznik7ng3dr8nk4q-default.nix"
```
If I changed `/data/alejandra/default.nix` after caching the result, the cached result would become invalid, so we would not be able to cache this persistently, only for the duration of a single evaluation.
@kamadorueda That would indeed have been a loophole. Thanks for pointing that out.
I still think it's feasible to counteract this by evaluating the cached function in an even stricter pure mode; perhaps a "referentially transparent mode". This mode only needs to be active while computing cached values. When the cached-eval-thunk's `forceValue` is done, the `EvalState` could switch back to its normal state (pure/restricted/impure). I've updated the text a little.
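A sketch of what such a mode might accept and reject, assuming the semantics described above (the mode itself is hypothetical; the primops used are real):

```nix
let
  # A store path is immutable, so reading it is referentially
  # transparent and fine to evaluate while filling the cache:
  ok = builtins.readFile (builtins.toFile "data" "hello");

  # Reading a mutable path (or interpolating one, which triggers
  # addToStore on a path outside the store) would throw in this mode;
  # the exception then triggers normal, uncached evaluation:
  notOk = builtins.readFile /etc/hostname;
in
  { inherit ok notOk; }
```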
> The flake-based evaluation cache has too coarse granularity to be useful for caching during development, updates, etc.

Yes and no. The bottleneck isn’t the use of flakes, but the fact that the cache is tied to the CLI. If the cache was really per-flake, then each call to `builtins.getFlake` (or each flake you depend on) would be cached independently, which would solve 90% of the problem.

To take the most common use case, I’m fairly confident that the vast majority of the eval time for a standard flake is spent evaluating the packages imported from the nixpkgs flake. So if this had its own cache, independent from the top-level flake, then re-evaluating the flake would be nearly free.

Now this isn’t trivial, because it means hooking the cache system into the evaluator (and doing so without too negative an impact on uncached evaluations). I’ve given it a try in https://github.com/NixOS/nix/pull/4511, but had to give up because the implementation was unbearably slow.
@thufschmitt Did you consider adding a new type of thunk instead of modifying regular attr access? That way the performance overhead only applies to cacheable values and not to intermediate values or the values produced by functions.
> unbearably slow.

Are you referring to the ~10% cold-cache overhead? That doesn't seem so bad, and I think it could be improved with the thunk approach. I would expect the `noCache` case to reach 0%, as the thunk-based approach doesn't seem to affect existing `Value` subclasses. Optimistically, this could halve the cold-cache overhead, assuming `old cold cache = old no cache + new cold cache`.
> The bottleneck isn’t the use of flakes,

For flakes it may not be, but restricting the cache to flake boundaries is unnecessary and makes the caching of other functions harder, such as custom nixpkgs invocations, or improvements to the structure of flakes, especially the introduction of a function that takes `system` as an input, which we ought to do for niche architecture support.
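For illustration, the kind of output shape that motivates this (a purely hypothetical sketch; no such flake schema exists today):

```nix
{
  # Hypothetical future flake schema: outputs take system as a
  # function argument, so niche platforms don't require
  # pre-enumerated per-system attrsets.
  outputs = { nixpkgs, ... }: system: {
    packages.default = (import nixpkgs { inherit system; }).hello;
  };
}
```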
> @thufschmitt Did you consider adding a new type of thunk instead of modifying regular attr access? That way the performance overhead only applies to cacheable values and not to intermediate values or the values produced by functions.
I did, yes. I don’t remember precisely why I didn’t go that way eventually, but I’m fairly confident that it wasn’t making things any faster (the extra word in `Attrs` is totally acceptable given that an `Attrs` will generally be big-ish, and the cost of a no-op lookup is also very small compared to any operation you’ll want to perform on the attr). Tbh I think that the slowdown wasn’t really due to anything fundamental; it’s more that I had to refactor things a bit to make the implementation decent, and that this refactoring made things slower.
> For flakes it may not be, but restricting the cache to flake boundaries is unnecessary and makes the caching of other functions harder, such as custom nixpkgs invocations, or improvements to the structure of flakes, especially the introduction of a function that takes `system` as an input, which we ought to do for niche architecture support.

Well, maybe, but I’ve yet to see a semantics that would allow for guaranteed-correct caching and be significantly more flexible than using flakes as the boundaries.
> it’s more that I had to refactor things a bit to make the implementation decent, and that this refactoring made things slower.

A new type of thunk only adds a branch to `forceValue` and a few `switch` cases. Seems more benign to me.
> a semantics that would allow for guaranteed-correct caching

I think I'm pretty close, but I can't make a confident claim without more research. To summarize: switch temporarily to a mode that enforces referential transparency, throwing an exception on `addToStore(mutablePath)`, etc. The exception then triggers regular evaluation.
> and be significantly more flexible than using flakes as the boundaries

It must be similar in terms of what is allowed inside the cached function without resorting to normal evaluation. It can't really work without referential transparency.

Users may have their own use cases for caching, perhaps NixOS configurations in a multi-node deployment, and flakes can use this for niche architecture support, as mentioned before. It wouldn't be nice to have to add custom caching code for that; a more general caching feature supports these things naturally.

I'll stop responding for a while because of priorities; sorry.

If anything, the thunk approach seems worth a shot, even with flake-boundary caching.
random thoughts:
This is already pretty close via?
The key proposal here is to make the capability flake-oblivious. If you squint a bit, the "inputs" in a flake is the `j` parameter and "outputs" is `f`. Many of the restrictions and considerations are the same.
> If the cache was really per-flake, then each call to `builtins.getFlake` (or each flake you depend on) would be cached independently, which would solve 90% of the problem.

It's not the `getFlake` call but the evaluation of its attributes that needs to be cached. Does this imply something like the following?

```nix
builtins.getInstallable "path:./." [ "legacyPackages" system "hello" ]
```

That isn't too far off from the proposed

```nix
builtins.cache ./some-dir ./trivial-inputs
# or
builtins.cachedEvalAttr ./some-func.nix ./trivial-inputs attrPath
```

where it is specifically the `attrPath` traversal that is cached, in the same way the current eval caching works: lazy determination of which attrs are available and of their values, and where `./some-func.nix` and `./trivial-inputs` can imply the cache key.
I think `import f j` is a bit too powerful, because `f` can keep changing and the surrounding context can also access arbitrary parts of the structure. Instead I'm thinking about something like `lib.getAttrFromPath attrPath (import f j)`, where specifically the `getAttrFromPath` recursion specifies what is cached.
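Ignoring persistence, the intended semantics can be sketched as a plain function (a hypothetical reference implementation, using only the names from above; the real primop would additionally consult an on-disk cache during the attribute walk):

```nix
let
  lib = import <nixpkgs/lib>;

  # Reference semantics of the proposed primop, minus the cache:
  # import f with inputs j, then walk attrPath into the result.
  cachedEvalAttr = f: j: attrPath:
    lib.getAttrFromPath attrPath (import f j);
in
  cachedEvalAttr ./some-func.nix { } [ "legacyPackages" "x86_64-linux" "hello" ]
```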
> This is already pretty close via? [...]

This code seems to be about in-memory caching of the AST and root `Value`. In-memory caching, aka memoization, is a much easier problem to solve, and iirc a solution is already sitting on a branch somewhere. I'd like to have that, because it could replace trie-based solutions I've been using, but that's not what I want to discuss here, as it doesn't help with the caching of dependency expressions (inputs etc.).
The key proposal here is to make the capability flake-oblivious. If you squint a bit the "inputs" in a flake is the
j
parameter and "outputs" isf
.
:100:
Many of the restrictions and considerations are the same.
That is correct. Specifically, we want most referentially transparent code to be cached and non-referentially transparent code not to be cached.
> > If the cache was really per-flake, then each call to `builtins.getFlake` (or each flake you depend on) would be cached independently, which would solve 90% of the problem.
>
> It's not the `getFlake` call but the evaluation of its attributes that needs to be cached. Does this imply something like the following?

This is in the context of @thufschmitt's PR, where he implemented a cache per `getFlake` call that returned special attrsets which query the cache transparently. So your example

```nix
builtins.getInstallable "path:./." [ "legacyPackages" system "hello" ]
```

... is actually called transparently, on attribute access.
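Roughly like this (a sketch; `builtins.getFlake` is real, while the transparent cache lookup on each attribute access is what that PR implemented):

```nix
let
  flake = builtins.getFlake "github:NixOS/nixpkgs/nixos-unstable";
in
  # With the PR's special attrsets, each step of this attribute walk
  # consults the evaluation cache before forcing the underlying thunk:
  flake.legacyPackages.x86_64-linux.hello.outPath
```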
> That isn't too far off from the proposed
>
> ```nix
> builtins.cache ./some-dir ./trivial-inputs
> # or
> builtins.cachedEvalAttr ./some-func.nix ./trivial-inputs attrPath
> ```
>
> where it is specifically the `attrPath` traversal that is cached, in the same way the current eval caching works: lazy determination of which attrs are available and of their values, and where `./some-func.nix` and `./trivial-inputs` can imply the cache key.
> I think `import f j` is a bit too powerful, because `f` can keep changing

This is a good thing. It is more powerful than what the current flake format needs, but the current flake format isn't good, as it creates problems for niche `system`s (see https://github.com/NixOS/nix/issues/3843#issuecomment-942594912).

If this caching primop is implemented, changing the flake format amounts to changing the `call-flake.nix` expression, rather than overhauling a flake-coupled cache in C++.
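For instance, `call-flake.nix` could adopt a new format by wrapping each input's import in the primop at the expression level (a hedged sketch; `builtins.cachedImport` is a hypothetical name for the proposed primop):

```nix
{ inputSourcePath, system }:
let
  # Hypothetical primop; fall back to a plain uncached import on
  # evaluators that don't provide it:
  cachedImport = builtins.cachedImport or import;
in
  # inputSourcePath is a store path and the argument is plain data,
  # so the call stays within the cacheable subset:
  cachedImport inputSourcePath { inherit system; }
```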
> and the surrounding context can also access arbitrary parts of the structure. Instead I'm thinking about something like `lib.getAttrFromPath attrPath (import f j)`, where specifically the `getAttrFromPath` recursion specifies what is cached.

I believe this concern is addressed by the special attrsets. Those make the `getAttrFromPath` part transparent.
> I believe this concern is addressed by the special attrsets. Those make the `getAttrFromPath` part transparent.

I was not aware of the special attrsets. Is caching then limited to only that behavior? My intent was to limit caching to attribute-path walking only. This is still more powerful than flakes, due to decoupling `f` and `j`.
> Is caching then limited to only that behavior?

In his implementation, yes. I think we'll want to cache lists too, for `drv.outputs` (see the example below), but those can follow a similar pattern, although I don't know if their representation permits his approach as easily. A thunk-based implementation would support both data structures in the same way, without having to change the actual list or attrset value* representations.

(*) Value as in actual value, not thunk. `Value` should have been called `Object`, because a thunk is arguably not a value, but it is still a heap object.
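For reference, the list-valued attribute mentioned above (real Nix; only the derivation details are made up for the example):

```nix
let
  drv = derivation {
    name = "example";
    system = "x86_64-linux";
    builder = "/bin/sh";
    args = [ "-c" "echo hi > $out" ];
    outputs = [ "out" "dev" ];
  };
in
  # A list value the cache would also need to represent lazily:
  drv.outputs  # => [ "out" "dev" ]
```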
Somewhat related: https://github.com/DavHau/nix-eval-cache
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/get-flake-input-store-path/20202/10
Data point: in a scenario where a CLI interacts with a dirty flake to fetch metadata out of the repo, flake-level caching is not enough*, and a caching primop seems to make the experience bearable.

(*) iirc, UX research gives us 200 ms max.
cc @Gabriella439; maybe she has some good input on memoization in a lazy config language.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/what-would-you-like-to-see-improved-in-nix-cli-experience/24012/9
Just noticed that I hadn't linked this before, but I did some brainstorming about this here: https://github.com/tweag/epcb

It's very much a draft right now, but I'd like to work on it more, maybe with a PoC.
**Is your feature request related to a problem? Please describe.**

The flake-based evaluation cache has too coarse granularity to be useful for caching during development, updates, etc.

Having a more powerful solution will also benefit flakes, as it becomes feasible to use it between flakes, not just at the CLI boundary. A language-integrated cache may also allow easier experimentation than a cache that is tied to CLI concepts.

**Describe the solution you'd like**

A new primop that performs persistent caching of any Nix function. Fully general or transparent caching (memoization) of functions is too hard; we need to strike a balance by restricting the cache so that we avoid the hard problems. My suggestion is to memoize roughly* the function `import f j`, i.e. import from a store path, then apply. The type is, roughly, from a store path `f` and JSON-serializable arguments `j` to a cacheable value.

By using a primop, we avoid the hard problem of having to determine what to cache. By restricting the arguments to serializable data, such as a store path and JSON-serializable data, we avoid the serialization of functions. By restricting the cached-eval-thunk to serializable data, we again avoid the serialization of functions.
A cached-eval-thunk references the `f` and `j` arguments, and contains a path. The path is a list of attribute names and/or list indexes. It also contains a regular thunk representing the real thunk `import f j`. When the cached-eval-thunk is forced, it looks up the value in the cache db. If no entry is available, it forces the real thunk and stores the result in the cache db. If a result exists, the thunk turns into a primitive value, or into an attrset or list of cached-eval-thunks.

*: For all of this to work, the expression in `f` must not be allowed to perform any action that wouldn't be referentially transparent. This can be done by performing the evaluation of the real thunk in an extra-pure mode that forbids `readFile`, `addToStore` (except on paths already in the store), etc.
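A usage sketch under these restrictions (a hypothetical spelling; `builtins.cachedImport` does not exist, and the exact signature is an assumption):

```nix
let
  # f must be a store path; fetchTarball yields one:
  f = builtins.fetchTarball "https://github.com/NixOS/nixpkgs/archive/nixos-unstable.tar.gz";
  # j must be JSON-serializable data (no functions):
  j = { system = "x86_64-linux"; config = { }; };
in
  # Would consult the persistent cache before evaluating `import f j`;
  # subsequent attribute accesses would force cached-eval-thunks:
  (builtins.cachedImport f j).hello.outPath
```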
**Describe alternatives you've considered**

`import f j` seems powerful enough, but maybe another function is more efficient? We'll generate a store path for each cached function, but I don't think that's a problem, because the primop should be used with care anyway; db lookups aren't free. A non-store-path representation of functions without free variables may also be suitable, but that is more work.
**Additional context**

I swear that I wrote about this before, but I couldn't find it in the issue tracker :confused: