NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1
12.92k stars 1.53k forks source link

Add memoise primop #8025

Open Artturin opened 1 year ago

Artturin commented 1 year ago

just a issue to not have the memoise primop hidden and forgotten in a commit on a non merged branch https://github.com/NixOS/nix/commit/0395b9b94af56bb814810a32d680c606614b29e0

'builtins.memoise f x' is equal to 'f x', but uses a cache to speed up repeated invocations of 'f' with the same 'x'. A typical application is memoising evaluations of Nixpkgs in NixOps network specifications. For example, with the patch below, the time to evaluate a 10-machine NixOps network is reduced from 17.1s to 9.6s, while memory consumption goes from 4486 MiB to 3089 MiB. (This is with GC_INITIAL_HEAP_SIZE=16G.)

Priorities

Add :+1: to issues you find important.

roberth commented 1 year ago

The function returned by memoise as implemented in 0395b9b is strict in its argument. I suppose this may be expected. However...

Nondeterministic strictness

Potentially worse is that it does not consistently perform a deep traversal, but rather evaluates attributes (etc) based on which other arguments the function has seen in the past. This breaks strong referential transparency. The strictness behavior of a function is observable, so it must not depend on unrelated past arguments. This may be resolved by doing a proper deepSeq on the argument. This avoids the head scratching but does make memoise less useful and probably a little slower. An alternative is to make the evaluation of the argument optional. If the comparison and lookup encounter an exception, we may catch the exception and behave accordingly. It's always possible to just assume that the argument is distinct from anything else, and therefore apply no memoization, but perhaps we could continue the comparison, maybe compare exceptions. If both the current argument and a cached argument behave the same, we may consider them equivalent. We might consider exceptions to be values throwing an exception to be smaller than everything for the purpose of MemoArgComparator. However, this assumes that the value produces an exception consistently and cheaply. This may not be the case! Some exceptions may be retried after tryEval.

Lazy memoise?

Intruiging idea, but too complex and suffers from similar issue with failing for the wrong reason because of memoization cache state (I may be wrong, but still quite confident that we don't have to try this) Apologies for the not so polished braindump, but I have to share *something*, so that others can try to check my reasoning or just avoid this path altogether. Wild idea; probably not useful but, but it feels close, so I'm sharing it anyway. This one is probably too tricky. Return a proxy thunk representing the return value. The concept of a proxy value may also be useful for `getFlake` caching, or caching flake attributes between flakes. The proxy thunk remembers which thunk needs to be passed. When forced, it computes the whnf of the returned value, but unless it's a primitive value, it returns a copy where all the items are again proxy thunks. Whenever a thunk in the original argument is evaluated, a comparison can be made to other entries of the cache. Disadvantage: memoisation may not be effective at all in some cases, or until most of the computation is already done anyway. The proxy values create garbage that won't be deduplicated. Advantage: full lazy, it seems. Works for arguments that are identical by memory address. Strict memoisation can be recovered by passing a function that does a deepSeq first. Seems like an elegant property, but it is less efficient. Perhaps if the proxy thunks are clever enough, they can evaluate precisely what's necessary to make sure that whatever computation they're doing is equivalent, _by tracking that the evaluated parts of the input are equal_. It seems that this should work, and that it supports deduplicating computations where the input is _different_ but only in an irrelevant part of the input (which is not actually evaluated). Intriguing, but speculatively evaluating too much of the wrong thing may again mean that we crash for a completely arbitrary reason.
roberth commented 8 months ago

I've opened a draft PR for it, #10280.