NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1
12.79k stars 1.52k forks source link

Make output paths/hashes indifferent to possible CA/FOD origin of sources #9259

Open roberth opened 1 year ago

roberth commented 1 year ago

Is your feature request related to a problem? Please describe.

We can't currently swap out an inputSrcs derivation input for an inputDrvs input derivation input and get the same output path, despite no observable differences (with the possible exception of introspective features such as exportReferencesGraph, which we could force to be mutually exclusive with this improved hashing feature).

nix repl session illustrating how it currently works, and what this improvement entails Currently we have some equivalence between FOD output paths and sources: ``` nix-repl> pkgs.emptyFile.outPath "/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file" nix-repl> "${./empty-file}" "/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file" ``` However, as expected, the string contexts are different: ``` nix-repl> :p builtins.getContext pkgs.emptyFile.outPath { "/nix/store/jnghgcdkdc79p7qp4vfc27ryvnnm5igi-empty-file.drv" = { outputs = [ "out" ]; }; } nix-repl> :p builtins.getContext "${./empty-file}" { "/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file" = { path = true; }; } ``` In itself, this is not a problem. Fact is that they need to be built differently. One involves an FOD. The other was added to the store during instantiation. That being the case, we can expect dependents to have different derivation hashes, representing the different build graph. ``` nix-repl> pkgs.concatText "derived" [ pkgs.emptyFile.outPath ] «derivation /nix/store/nmzk7crah5himm4h3nbh6ry8331c99nq-derived.drv» nix-repl> pkgs.concatText "derived" [ "${./empty-file}" ] «derivation /nix/store/6bs6pkf3z4gfmjk0ssr7qfldrvyr8qfy-derived.drv» ``` So far so good, but now we observe a difference in output paths, which is not necessary. ``` nix-repl> "${pkgs.concatText "derived" [ pkgs.emptyFile.outPath ]}" "/nix/store/4q9z73ak24fl3qr4j5wwkjbw0l7ch0rh-derived" nix-repl> "${pkgs.concatText "derived" [ "${./empty-file}" ]}" "/nix/store/56phdxcpdv7d8zz4cj0a28iwz5akcpcz-derived" ``` By implementing this proposal (and enabling it in Nixpkgs), both output paths would have the value `/nix/store/56phdxcpdv7d8zz4cj0a28iwz5akcpcz-derived`.

Describe the solution you'd like

Instantiation works the same, except for the hashing of certain input-addressed outputs.

Specifically, when a derivation attribute flag such as __derivationAgnosticOutputHashing is enabled, inputs from FODs are ignored for the purpose of hashing; instead interpreting their output paths as inputSrcs.

Ignore __derivationAgnosticOutputHashing when the build could perceive the difference, e.g. when it has flags that would let it introspect its dependencies. I don't think it could use exportReferencesGraph on itself, so this may not even be a problem. I just haven't checked whether that was everything that needs to be controlled. TBD.

Describe alternatives you've considered

Fail instead of ignoring __derivationAgnosticOutputHashing. In practice this would mean that mkDerivation has to be clever about when to set the flag, which seems rather convoluted, pointless, and ever so slightly slower.

Additional context

Priorities

Add :+1: to issues you find important.

Ericson2314 commented 1 year ago

Yeah I've wanted this for a bit. Note that it can/should apply to all content-addressed outputs too; that is also non-fixed ones.

roberth commented 11 months ago

The logic should also apply to CA realisation (.doi) lookups, which are currently just drvHash based; not only input addressed derivation outputs seem to suffer from the discrepancy.

Ericson2314 commented 11 months ago

@roberth I've also been thinking about whether we can put the caching of this stuff in the derivation itself. Right now needing to recompute everything from the roots is annoying/slow. It would be cool if you could resume it from any derivation. (Of course, for verifying the store it would be good to still retain the ability to check it from scratch.)

Ericson2314 commented 11 months ago

https://github.com/NixOS/nix/blob/c8458bd731eb1c74159bebe459ea00165e056b65/src/libstore/daemon.cc#L583-L614 oh here is another angle I forgot.

If we make that inputSrcs vs inputDrvs never influences the output path then we get:

  1. Even more preservation of input addresses across rewrites which don't matter
  2. Sending input-addressed derivations to another machine no longer requires trust / no longer has attack vector of writing to arbitrary output paths. (However sending over input-addressed data is still risky, so this only helps with "shallow" input-addressing where the remote already has/trust any input that isn't content addressed.)
  3. Also addresses the caching issue above, because we only need to look up the output paths of our immediate deps.

The algorithm is very simple (works for the dynamic derivations definition too):

  1. Inputs are conceptually Set DerivingPath
  2. Whenever we have a DerivingPath::Output { drvPath, outputName } we can statically/trustlessly evaluate to a DerivingPath::Constant, do so. (Anywhere nested, for dynamic derivations case)
  3. Once no more such static/trustless rewrites are possible, calculate the output path on the resulting derivation.
roberth commented 11 months ago

Probably best to start with

That will give us the language to describe the new hashing scheme more effectively.

Ericson2314 commented 11 months ago

6877 wanted to get to that :)