Copy the outputs of ca-derivations with the outputId->outPath mapping

thufschmitt commented 4 years ago

Is your feature request related to a problem? Please describe.

We can currently copy the output of a derivation with nix copy /nix/store/fooobar --to somewhereelse, or even directly with nix copy nixpkgs#hello. However, if nixpkgs#hello is a CA derivation, we don't copy the mapping between the tuple (derivation, outputName) and the output path. This means that if we try to build the same derivation on the remote store, it will be rebuilt rather than reusing the copied output.

Describe the solution you'd like

We should allow commands like nix copy to not just take store paths as input, but also derivation outputs, and have the mappings be registered on the remote store.
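To make the proposal concrete, here is a minimal Python sketch of a copy that accepts either plain store paths or derivation outputs and registers the mapping on the destination. Everything in it is a hypothetical toy model, not Nix's C++ API. `ToyStore`, `Opaque`, `Built`, `copy_buildables` and `link_deriver_to_path` (a stand-in for linkDeriverToPath) are illustrative names, the paths are made up, and closures are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Set, Tuple, Union

@dataclass(frozen=True)
class Opaque:
    """A plain store path (hypothetical toy type)."""
    path: str

@dataclass(frozen=True)
class Built:
    """A derivation together with the outputs being requested (hypothetical toy type)."""
    drv_path: str
    outputs: FrozenSet[str]

Buildable = Union[Opaque, Built]

@dataclass
class ToyStore:
    paths: Set[str] = field(default_factory=set)
    # (drv path, output name) -> output path
    realisations: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def link_deriver_to_path(self, drv_path: str, output: str, out_path: str) -> None:
        """Stand-in for linkDeriverToPath: record which output produced out_path."""
        self.realisations[(drv_path, output)] = out_path

def copy_buildables(src: ToyStore, dst: ToyStore, buildables: Iterable[Buildable]) -> None:
    for b in buildables:
        if isinstance(b, Opaque):
            dst.paths.add(b.path)  # today's behaviour: only the path travels
        else:
            for output in sorted(b.outputs):
                out_path = src.realisations[(b.drv_path, output)]
                dst.paths.add(out_path)
                # The extra step: register the mapping on the destination too.
                dst.link_deriver_to_path(b.drv_path, output, out_path)

local = ToyStore()
local.paths.add("/nix/store/aaa-hello")
local.link_deriver_to_path("/nix/store/xxx-hello.drv", "out", "/nix/store/aaa-hello")

remote = ToyStore()
copy_buildables(local, remote, [Opaque("/nix/store/aaa-hello")])
print(("/nix/store/xxx-hello.drv", "out") in remote.realisations)  # False

copy_buildables(local, remote, [Built("/nix/store/xxx-hello.drv", frozenset({"out"}))])
print(remote.realisations[("/nix/store/xxx-hello.drv", "out")])  # /nix/store/aaa-hello
```

Copying the plain path leaves the remote unable to answer "what did hello.drv!out produce?". Copying the derivation output answers it. A real implementation would also have to copy the whole closure.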

Implementation-wise, I think this would entail:

1. Implement linkDeriverToPath (the function creating the mapping from derivation output to output path) for stores other than LocalStore. This should be rather simple for the remote store (essentially a matter of adding the method to the worker protocol). For binary cache stores it should also be rather simple, except that we'll have to choose carefully how we store this mapping. That choice will be “sticky”, because we can't really migrate binary caches if we want to change it later.
2. Make nix copy act on Buildables (a variant containing either a plain store path or a derivation with a set of outputs) rather than StorePaths. Apart from the grunt work of replacing StorePath with Buildable in a bunch of places and calling linkDeriverToPath after the copy, this might warrant an SQL schema change to be efficient. One of the requirements for this copy to work properly is computing the runtime closure of a derivation output (i.e. the set of derivation outputs that produce the runtime closure of the corresponding output path), and as far as I know there's no easy or efficient way to do that at the moment.
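The closure computation from the second item can be sketched as two steps. First, take the runtime closure of the output path by following references. Second, map each path back to the derivation outputs known to produce it. This assumes a toy in-memory `references` graph and a `realisations` dict keyed by (drv path, output name). The function names are hypothetical. The reverse index is built by scanning every realisation, which is the kind of inefficiency a schema change would avoid.

```python
from collections import deque
from typing import Dict, Set, Tuple

DrvOutput = Tuple[str, str]  # (drv path, output name)

def path_closure(start: str, references: Dict[str, Set[str]]) -> Set[str]:
    """Runtime closure of a store path: every path reachable through references."""
    seen = {start}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        for ref in references.get(path, set()):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen

def drv_output_closure(
    root: DrvOutput,
    realisations: Dict[DrvOutput, str],
    references: Dict[str, Set[str]],
) -> Tuple[Set[DrvOutput], Set[str]]:
    """Split the closure of root's output path into derivation outputs and plain paths."""
    # Reverse index path -> producing derivation outputs, built by a full scan.
    producers: Dict[str, Set[DrvOutput]] = {}
    for drv_out, path in realisations.items():
        producers.setdefault(path, set()).add(drv_out)

    drv_outputs: Set[DrvOutput] = set()
    plain_paths: Set[str] = set()
    for path in path_closure(realisations[root], references):
        if path in producers:
            # A path may have several producers; this sketch keeps all of them.
            drv_outputs |= producers[path]
        else:
            plain_paths.add(path)
    return drv_outputs, plain_paths

references = {
    "/nix/store/aaa-hello": {"/nix/store/bbb-libhello"},
    "/nix/store/bbb-libhello": {"/nix/store/ccc-glibc"},
}
realisations = {
    ("hello.drv", "out"): "/nix/store/aaa-hello",
    ("libhello.drv", "out"): "/nix/store/bbb-libhello",
}
drv_outs, plain = drv_output_closure(("hello.drv", "out"), realisations, references)
print(sorted(drv_outs))  # [('hello.drv', 'out'), ('libhello.drv', 'out')]
print(sorted(plain))     # ['/nix/store/ccc-glibc']
```

Paths with no known producer (here the glibc path) can only be copied as plain store paths.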

Additional context

Related to #4087

thufschmitt commented 4 years ago

@edolstra @Ericson2314 I've written a small doc explaining some db schema changes required for this. Can you have a look and tell me whether it makes sense to you?

Ericson2314 commented 4 years ago

Gladly! I already proposed to Eelco yesterday that we have another PR meeting soon for both of ours. I think it would be great to talk about this stuff in real time, too.

Generally I like it, but I think we need to be careful to distinguish between resolved and unresolved derivations more. For example:

hello!out depends on libhello!out

But what we write down today is resolved(hello)!out, and that shouldn't depend on libhello!out, or even on resolved(libhello)!out, because multiple resolved derivations could produce the same outPath(libhello!out).

Mapping outputs to resolved derivations, and mapping resolved derivations to unresolved derivation (trees), are both non-canonical (multi-valued), and that's the essence of the problem.
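A small illustration of why the inverse mapping is not canonical. With content addressing, two different resolved derivations that produce byte-identical outputs end up with the same output path. The sketch below uses SHA-256 over the contents as a stand-in for Nix's real path computation, which differs. The derivation names are hypothetical.

```python
import hashlib
from typing import Dict, Set, Tuple

def ca_path(name: str, contents: bytes) -> str:
    # Stand-in for content addressing: the path depends only on the output's bytes.
    # This is NOT Nix's actual store path hashing scheme.
    return f"/nix/store/{hashlib.sha256(contents).hexdigest()[:32]}-{name}"

# B: (resolved drv, output name) -> output path. Two distinct resolved libhello
# derivations (hypothetically built with different compilers) yield identical bytes.
B: Dict[Tuple[str, str], str] = {
    ("resolved-libhello-gcc12.drv", "out"): ca_path("libhello", b"libhello v1 binary"),
    ("resolved-libhello-gcc13.drv", "out"): ca_path("libhello", b"libhello v1 binary"),
}

# Invert B: which resolved derivation outputs could have produced each path?
producers: Dict[str, Set[Tuple[str, str]]] = {}
for drv_out, path in B.items():
    producers.setdefault(path, set()).add(drv_out)

# resolved(hello) only mentions this path in its inputSrcs, so it cannot say which
# resolved libhello it "depends on": the inverse relation is one-to-many.
libhello_path = ca_path("libhello", b"libhello v1 binary")
print(len(producers[libhello_path]))  # 2
```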

Here's a rough attempt:

We already have a (resolved drv, output name) -> output table; let's call it B.
Now let's add a drv -> resolved drv table. This caches a computation that is quite complicated (and possibly expensive too):
1. Collect the resolved derivations that could "unresolve" to the given derivation; call this set C_n.
2. Repeat that process for the unresolved derivation's drv inputs (C_{n+1}), but with the restriction that the resolutions agree: C_{n+1} . B . inputSources . C. Note also that the inputDrvs and inputSrcs must agree; I do not have the formalism for this yet.
3. Keep repeating until either no candidate remains or one is found for the entire drv (modulo fixed outputs, which may not have been built).
That's really complex, but we can instead fill in this map when we resolve, which is easier.
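Filling the drv -> resolved drv table at resolution time could look roughly like the following sketch. It makes three simplifying assumptions:

- a `Derivation` record with only name, inputDrvs and inputSrcs;
- a dict-based B;
- a made-up identifier scheme for resolved derivations.

`resolve` is a hypothetical helper, not the function in Nix. Resolving replaces each input derivation output with its concrete path and records the resulting pair as a side effect.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

@dataclass(frozen=True)
class Derivation:
    name: str
    input_drvs: FrozenSet[Tuple[str, str]]  # (unresolved input drv id, output name)
    input_srcs: FrozenSet[str]               # plain store paths

def resolve(
    drv_id: str,
    drvs: Dict[str, Derivation],
    B: Dict[Tuple[str, str], str],
    drv_to_resolved: Dict[str, str],
) -> str:
    """Resolve drv_id and record drv -> resolved drv as a side effect."""
    drv = drvs[drv_id]
    srcs = set(drv.input_srcs)
    for input_drv, output in sorted(drv.input_drvs):
        # Inputs must already be resolved and built: find their resolved drv,
        # then use B to get the concrete output path.
        resolved_input = drv_to_resolved[input_drv]
        srcs.add(B[(resolved_input, output)])
    resolved = Derivation(drv.name, frozenset(), frozenset(srcs))
    # Hypothetical identifier for the resolved derivation, derived from its contents.
    digest = hashlib.sha256(repr((drv.name, sorted(srcs))).encode()).hexdigest()[:12]
    resolved_id = f"{digest}-{drv.name}-resolved.drv"
    drvs[resolved_id] = resolved
    drv_to_resolved[drv_id] = resolved_id  # the cheap way to fill the table
    return resolved_id

drvs = {
    "libhello.drv": Derivation("libhello", frozenset(), frozenset({"/nix/store/src-libhello"})),
    "hello.drv": Derivation("hello", frozenset({("libhello.drv", "out")}),
                            frozenset({"/nix/store/src-hello"})),
}
B: Dict[Tuple[str, str], str] = {}
drv_to_resolved: Dict[str, str] = {}

r_lib = resolve("libhello.drv", drvs, B, drv_to_resolved)
B[(r_lib, "out")] = "/nix/store/lib-out"  # pretend libhello was built
r_hello = resolve("hello.drv", drvs, B, drv_to_resolved)
print(sorted(drvs[r_hello].input_srcs))  # ['/nix/store/lib-out', '/nix/store/src-hello']
```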
Edge transport: given a drv -> resolved drv mapping and a (non-fixed-output) input drv, we should be able to come up with the corresponding resolved drv and output such that everything agrees (diagram needed!). I think this is the essence of the induction in the previous step.

That is, (unresolved drv, input drv, resolved drv) -> input resolved drv, which in turn can be joined with B to get the input src that must be in the (non-input) resolved drv.
I'm not sure how, but I think with that last step everything is relational algebra rather than actually comparing concrete derivations to see if they are a potential resolution/unresolution match, so we can express it in SQL.
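The last step can plausibly be written as a plain SQL join. The sketch runs against an in-memory SQLite database from Python, and all table and column names in it are hypothetical. It also assumes a `drv_inputs` table (the unresolved inputDrvs with their output names), which is needed to know which output of the input to look up in B. The query returns the paths that must appear in the resolved derivation's inputSrcs.

```python
import sqlite3
from typing import List

def expected_input_srcs(conn: sqlite3.Connection, unresolved_drv: str, resolved_drv: str) -> List[str]:
    """Paths that resolved_drv must list in inputSrcs, derived purely by joins."""
    rows = conn.execute(
        """
        SELECT b.out_path
        FROM input_resolution AS ir
        JOIN drv_inputs AS di
          ON di.drv = ir.unresolved_drv AND di.input_drv = ir.input_drv
        JOIN realisations AS b
          ON b.resolved_drv = ir.input_resolved_drv AND b.output_name = di.output_name
        WHERE ir.unresolved_drv = ? AND ir.resolved_drv = ?
        ORDER BY b.out_path
        """,
        (unresolved_drv, resolved_drv),
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- B: (resolved drv, output name) -> output path
    CREATE TABLE realisations (resolved_drv TEXT, output_name TEXT, out_path TEXT,
                               PRIMARY KEY (resolved_drv, output_name));
    -- inputDrvs of unresolved derivations (assumed table)
    CREATE TABLE drv_inputs (drv TEXT, input_drv TEXT, output_name TEXT);
    -- (unresolved drv, input drv, resolved drv) -> input resolved drv
    CREATE TABLE input_resolution (unresolved_drv TEXT, input_drv TEXT, resolved_drv TEXT,
                                   input_resolved_drv TEXT,
                                   PRIMARY KEY (unresolved_drv, input_drv, resolved_drv));
    """
)
conn.executemany("INSERT INTO realisations VALUES (?, ?, ?)", [
    ("resolved-libhello.drv", "out", "/nix/store/lib-out"),
    ("resolved-libhello.drv", "dev", "/nix/store/lib-dev"),
])
conn.execute("INSERT INTO drv_inputs VALUES (?, ?, ?)", ("hello.drv", "libhello.drv", "out"))
conn.execute("INSERT INTO input_resolution VALUES (?, ?, ?, ?)",
             ("hello.drv", "libhello.drv", "resolved-hello.drv", "resolved-libhello.drv"))

print(expected_input_srcs(conn, "hello.drv", "resolved-hello.drv"))  # ['/nix/store/lib-out']
```

In this model, checking whether a candidate resolved derivation is consistent reduces to comparing the query result with its stored inputSrcs, without comparing concrete derivations.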

thufschmitt commented 3 years ago

Fixed by https://github.com/NixOS/nix/pull/4487

NixOS / nix

Copy the outputs of ca-derivations with the outputId->outPath mapping #4142