NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Skip evaluation of unrelated attrs if FOD already exists #8407

Open nagy opened 1 year ago

nagy commented 1 year ago

Is your feature request related to a problem? Please describe.

This is not a problem so much as a potential optimization. When a fixed-output derivation (FOD) is evaluated, attributes other than the outputHash* attrs still get evaluated, even if the resulting store path already exists. Assume the following default.nix file:

with import <nixpkgs> { };
runCommandLocal "test.txt" {
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
  outputHash = "sha256-d6xi4mKdjkX2JFicDIv5niSzpyI0m/Hnm8GGAIU04kY=";
  TESTATTR = builtins.trace "This gets evaluated" nixosTests.firefox;
} ">$out"

Let's build it:

$ time { nix-build && nix-store --query --deriver ./result ; }
trace: This gets evaluated
/nix/store/1a7i77041rz7y0jwd1bnwhh21q93xgmz-test.txt
/nix/store/cy29r8wn0928z64wifwhmmxd6h8slzkg-test.txt.drv

real    0m10.167s
user    0m7.269s
sys     0m1.373s

To my understanding, the hash in the store path of a FOD is calculated from the output hash and the name attribute. Under this assumption, the store path can be computed quickly and checked for existence before any other attributes of the derivation are evaluated.

If you repeat that shell command, you should see it take a few seconds every time you run it. Also note that removing the TESTATTR line does not change the store path hash. This strengthens my belief that the path can be calculated from outputHash and the name alone.
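To make the claim above concrete, here is a minimal Python sketch of how a fixed-output store path could be derived from nothing but the name and the outputHash* attributes. It is a model based on my reading of Nix's store path scheme, not code taken from Nix: the fingerprint strings ("source", "fixed:out:", "output:out"), the 20-byte XOR folding, and the custom base32 alphabet are all assumptions to check against the Nix sources, and the function names are hypothetical.

```python
import base64
import hashlib

# Assumption: Nix's base32 alphabet omits the letters e, o, u and t.
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def compress_hash(digest: bytes, size: int = 20) -> bytes:
    """Fold a digest down to `size` bytes by XOR-ing bytes at the same offset."""
    out = bytearray(size)
    for i, b in enumerate(digest):
        out[i % size] ^= b
    return bytes(out)


def nix_base32(data: bytes) -> str:
    """Encode bytes in 5-bit groups, emitting the highest group first."""
    length = (len(data) * 8 - 1) // 5 + 1
    chars = []
    for n in range(length - 1, -1, -1):
        i, j = divmod(n * 5, 8)
        c = data[i] >> j
        if i + 1 < len(data):
            c |= data[i + 1] << (8 - j)
        chars.append(NIX_BASE32[c & 0x1F])
    return "".join(chars)


def make_store_path(path_type: str, inner: bytes, name: str,
                    store_dir: str = "/nix/store") -> str:
    """Hash a textual fingerprint and turn it into a store path."""
    fingerprint = f"{path_type}:sha256:{inner.hex()}:{store_dir}:{name}"
    digest = hashlib.sha256(fingerprint.encode()).digest()
    return f"{store_dir}/{nix_base32(compress_hash(digest))}-{name}"


def fixed_output_path(name: str, sri_hash: str, recursive: bool = True,
                      store_dir: str = "/nix/store") -> str:
    """Derive the output path of a FOD from its name and SRI output hash only."""
    algo, _, b64 = sri_hash.partition("-")
    digest = base64.b64decode(b64)
    if recursive and algo == "sha256":
        # Assumed special case: recursive sha256 outputs are typed "source".
        return make_store_path("source", digest, name, store_dir)
    prefix = "r:" if recursive else ""
    inner = hashlib.sha256(
        f"fixed:out:{prefix}{algo}:{digest.hex()}:".encode()
    ).digest()
    return make_store_path("output:out", inner, name, store_dir)


print(fixed_output_path(
    "test.txt", "sha256-d6xi4mKdjkX2JFicDIv5niSzpyI0m/Hnm8GGAIU04kY="))
```

Nothing in this computation reads TESTATTR or any other attribute, which is the point of the proposal. Whether it reproduces the exact path printed by nix-build above depends on the stated assumptions holding.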

Describe the solution you'd like

For FODs, Nix should check if the resulting store path already exists before evaluating other attrs.

Describe alternatives you've considered

None.

Additional context

I have noticed that evaluating NixOS tests takes a lot of time. On my machine, an average test takes a good 10 seconds to evaluate. I am using NixOS tests to generate build artifacts, but those don't change once produced. Therefore I wanted to pin them with a FOD. But I noticed that I still had to pay the price for the evaluation, even though the result already existed.

roberth commented 1 year ago

Regarding "nixos tests":

You may be interested in https://github.com/NixOS/nixpkgs/blob/974a26cc2be5e08fb5c3b9712c4fef76c6fcb9e1/flake.nix#L61-L72 if you have multiple tests or multiple nodes that could share the same pkgs attrset. Or perhaps you call Nixpkgs from another part of your expression, in which case you could reuse that. Might still save a second.

Another suggestion is runInLinuxVM, which transforms a derivation by wrapping the "builder" in qemu. This is simpler and runs quicker. I would think that it also evaluates quicker.


Now as for your proposal, I see that it could work, but it would also degrade the repeatability of the FOD. Not running an FOD (when already realised) runs the risk that its builder breaks for external reasons, or even changes to dependencies. This is a risk we normally accept, but we do evaluate such dependencies, which gives us a small guarantee that at least the dependencies' expressions aren't broken. By implementing this optimization, we lose that guarantee. Normally, FODs are not particularly heavy on evaluation, consuming only a few dependencies such as curl and git, whose closures (values) are usually already needed for other things as well. So I don't think it's the right trade-off for typical fetchers, but perhaps an opt-in optimization is worth considering. We could recommend invalidateFetcherByDrvHash for testing such fetchers in CI.

nagy commented 1 year ago

First, thanks for your suggestions. I will try them out.

I agree that the FODs in nixpkgs are quite light and therefore quick to evaluate. But in out-of-tree use cases, for example in data forensics, I could see it being very useful to have Nix build a file that is byte-for-byte reproducible and whose hash you also want to preserve in version control. There, the evaluations could take longer.

It seems like the potential breakage you are worried about could be caught by something like a cron job. If we introduce a boolean option in nix.conf, say deep-evaluate-fixed-output, then Hydra could be configured with this option set to true and would notify us of this kind of FOD breakage. That would enable the optimization while still allowing one-off CLI evaluations with --option deep-evaluate-fixed-output true.
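The control flow I have in mind can be sketched in a few lines of Python. This is only a toy model, not Nix code: lazy attributes are represented as zero-argument callables, the store is a plain set of paths, and instantiate_fod is a hypothetical name. The deep_evaluate parameter stands in for the proposed deep-evaluate-fixed-output option, which does not exist in Nix today.

```python
from typing import Callable, Dict, List, Set, Tuple


def instantiate_fod(
    out_path: str,
    lazy_attrs: Dict[str, Callable[[], object]],
    existing_paths: Set[str],
    deep_evaluate: bool = False,
) -> Tuple[str, List[str]]:
    """Return the output path and the names of the attributes that were forced.

    out_path is assumed to have been computed from the name and the
    outputHash* attributes alone, before this function is called.
    """
    if not deep_evaluate and out_path in existing_paths:
        # Proposed shortcut: the output is already realised, so the
        # remaining attributes are never forced.
        return out_path, []
    # Current behaviour (and the deep_evaluate=True case): force everything.
    forced = {key: thunk() for key, thunk in lazy_attrs.items()}
    return out_path, sorted(forced)


if __name__ == "__main__":
    store = {"/nix/store/example-test.txt"}  # hypothetical store contents
    attrs = {"TESTATTR": lambda: print("trace: This gets evaluated")}

    # Interactive use: output exists, nothing is evaluated.
    print(instantiate_fod("/nix/store/example-test.txt", attrs, store))

    # CI use (e.g. Hydra): everything is evaluated so breakage is noticed.
    print(instantiate_fod("/nix/store/example-test.txt", attrs, store,
                          deep_evaluate=True))
```

With the flag off, an already realised output short-circuits evaluation. With it on, behaviour matches what Nix does now, so a CI system would still detect broken dependency expressions.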