NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1
12.73k stars 1.52k forks source link

Virtual path semantics that are deterministic, lazy and suitable for performing deduplication of values by a path value key. #10689

Open roberth opened 5 months ago

roberth commented 5 months ago

Is your feature request related to a problem? Please describe.

This issue discusses the solution space for a crucial step to make NixOS evaluation not copy nixpkgs to the store.

6530 solves the problem of lazy fetching while using the module system, by introducing another problem, which is new observable nondeterminism in the form of virtual path string placeholders (/nix/store/virtual0000<n>).

Lazy paths (https://github.com/NixOS/nix/pull/10252) also add laziness, but the module system's use of paths causes it to have to copy nonetheless.

Describe the solution you'd like

Some possible solutions

This is an unsolved problem. Lazy trees will leak non-deterministic virtual path names in this case, which in my view is the primary reason why we can't just adopt that solution. Possible solutions I'm aware of

Additional context

Discussed in meeting before, links tbd.

Also came up in

Priorities

Add :+1: to issues you find important.

edolstra commented 5 months ago

The non-determinism issue can be solved in the common case by deriving the virtual path from the input fingerprint. This wouldn't work for impure inputs, but I suppose if you have impure inputs, it's to be expected that you might end up with non-determinism in your evaluation.

It's also not obvious how to come up with a fingerprint for accessors like those returned by builtins.filterPath (since we don't have a way to fingerprint the filter function).

Another solution might be to restrict what you can do with a virtual path substring, e.g. make it illegal to call builtins.substring to get individual characters of the virtual path hash.

roberth commented 5 months ago

For stability, it'd be best to expose the fingerprints as little as possible, e.g. only their ordering but not their contents, but even exposing the ordering is a liability that we don't currently have. Even if we can call a git rev deterministic, that doesn't mean that the user experience will be deterministic (e.g.: why was the list in the config reordered after this commit which only touches the readme...).

This wouldn't work for impure inputs

We might want to consider locking them (in memory) anyway, if we don't already do that. If performance concerns prevent us from doing that, we could use an imperfect fingerprint.

builtins.filterPath (since we don't have a way to fingerprint the filter function).

We could evaluate the set of all included paths and then fingerprint the set and the underlying source. It's eager at the language level, but does not require any copying. When the underlying source is itself a filtering source, we should merge the functions (inductively) instead of naively fingerprinting the underlying source to recover && short-circuit behavior, but at that point I think it should work well.

Another solution might be to restrict what you can do with a virtual path substring, e.g. make it illegal to call builtins.substring to get individual characters of the virtual path hash.

Or fall back to copying. This could be done by adding string-specific thunk constructors (e.g. tStringConcat in https://discourse.nixos.org/t/2024-04-29-nix-team-meeting-minutes-142/45020).

nixos-discourse commented 3 months ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-08-07-nix-team-meeting-minutes-167/50287/1