NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Make scheduler work with shallow realisations without fetching unneeded store objects #11928

Open Ericson2314 opened 1 day ago

Ericson2314 commented 1 day ago

Shallow realisations map a basic derivation (one with no inputDrvs) and an output name to a content address.
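As a rough illustration of that mapping, here is a minimal Python sketch. The names `BasicDerivation`, `input_srcs`, `lookup`, and the `ca:` address strings are hypothetical and chosen only for this example; they are not Nix's actual types or store path format.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class BasicDerivation:
    """Hypothetical stand-in: a derivation with no inputDrvs.

    Every input is already a concrete, content-addressed store path.
    """
    name: str
    input_srcs: FrozenSet[str]


# A shallow realisation table: (basic derivation, output name) -> content address.
ShallowRealisations = Dict[Tuple[BasicDerivation, str], str]


def lookup(table: ShallowRealisations, drv: BasicDerivation, output: str) -> Optional[str]:
    """Return the content address recorded for this output, or None on a miss."""
    return table.get((drv, output))


# Illustrative data only: CompilerB depends on the plain-data CompilerA.
compiler_b = BasicDerivation("CompilerB", frozenset({"ca:compiler-a"}))
table: ShallowRealisations = {(compiler_b, "out"): "ca:compiler-b"}
```

Because the key contains only resolved inputs, a lookup needs nothing but content addresses; no store object has to be present locally.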

Suppose we have a dependency graph like CompilerA -> CompilerB -> Library. These are only build-time dependencies: the output of each build does not reference the dependency used to build it. For the sake of argument, CompilerA is "plain old data" (like a bootstrap binary) and is just uploaded as-is.

Suppose we have built both derivations and uploaded the resulting store objects and shallow realisations, but not deep realisations, to a remote store. Now, in another store configured to substitute from that remote store, one tries to build Library.

Currently, this will happen:

  1. Want to obtain Library
  2. There is no deep realisation in the cache keyed by the unresolved derivation
  3. So we don't know of any content-addressed store object to try downloading
  4. Wants to build Library
  5. Want to obtain CompilerB
  6. Finds shallow trace for CompilerB (since CompilerA is plain old data, CompilerB's derivation is already resolved)
  7. Downloads CompilerB
  8. Resolves Library derivation
  9. Finds shallow trace for Library Derivation
  10. Downloads Library

This works, but note that we downloaded CompilerB even though it is not in the runtime closure of Library.

Instead I would want something like this:

  1. Want to obtain Library
  2. There is no deep realisation in the cache keyed by the unresolved derivation
  3. Wants to resolve Library derivation
  4. Wants resolution for CompilerB
  5. Finds shallow trace for CompilerB
  6. Resolves Library derivation
  7. Finds shallow trace for Library Derivation
  8. Downloads Library

Now we don't bother downloading CompilerB.

The way to make the second sequence of steps a reality is to make "obtaining a realisation" a goal in and of itself, separate from obtaining a store object and from building one. When the cache doesn't have the realisation, the goal falls back to just building it; when the cache does have it, there is no need to download any store objects. Dependencies between these goals would allow us to resolve derivations through arbitrarily many inputDrv edges without downloading any store objects.
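To make the difference between the two sequences concrete, here is a toy Python model. It is only a sketch under stated assumptions: `INPUT_DRVS`, `INPUT_SRCS`, `SHALLOW`, `obtain_realisation`, `obtain_store_object`, and the `ca:` strings are all hypothetical names invented for this example and do not correspond to Nix's actual goal classes or APIs. The `eager` flag switches between downloading each input's output before resolving (the current behaviour) and passing only content addresses between goals (the proposal).

```python
from typing import Dict, FrozenSet, List, Tuple

# Unresolved derivations: name -> names of the derivations it depends on (inputDrvs).
INPUT_DRVS: Dict[str, List[str]] = {
    "CompilerB": [],           # its only input, CompilerA, is plain data
    "Library": ["CompilerB"],
}
# Plain store-path inputs of each derivation (already content-addressed).
INPUT_SRCS: Dict[str, List[str]] = {
    "CompilerB": ["ca:compiler-a"],
    "Library": [],
}

ResolvedKey = Tuple[str, FrozenSet[str]]

# Remote cache of shallow realisations: resolved derivation -> output content address.
SHALLOW: Dict[ResolvedKey, str] = {
    ("CompilerB", frozenset({"ca:compiler-a"})): "ca:compiler-b",
    ("Library", frozenset({"ca:compiler-b"})): "ca:library",
}


def obtain_realisation(name: str, downloads: List[str], eager: bool) -> str:
    """Return the content address of `name`'s output without fetching it.

    eager=True models the current behaviour: each input derivation's output
    is downloaded before the dependent derivation is resolved.
    eager=False models the proposal: only content addresses flow between goals.
    """
    inputs = set(INPUT_SRCS[name])
    for dep in INPUT_DRVS[name]:
        addr = obtain_realisation(dep, downloads, eager)
        if eager:
            downloads.append(addr)  # the unneeded fetch
        inputs.add(addr)
    key = (name, frozenset(inputs))
    if key not in SHALLOW:
        # A real scheduler would fall back to building here, which does
        # require the inputs to be present locally. Not modelled.
        raise LookupError(f"no shallow realisation for {name}")
    return SHALLOW[key]


def obtain_store_object(name: str, eager: bool) -> List[str]:
    """Obtain `name`'s output and return every content address downloaded."""
    downloads: List[str] = []
    addr = obtain_realisation(name, downloads, eager)
    downloads.append(addr)  # the store object that was actually requested
    return downloads


current = obtain_store_object("Library", eager=True)
proposed = obtain_store_object("Library", eager=False)
```

In this model `current` ends up containing both `ca:compiler-b` and `ca:library`, while `proposed` contains only `ca:library`. The recursion in `obtain_realisation` is what lets resolution pass through any number of inputDrv edges using content addresses alone.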


Before doing this, we should attempt https://github.com/NixOS/nix/issues/11927 so this code is not nearly as annoying to work with.