NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Floating content-addressed derivations #4087

Open Ericson2314 opened 3 years ago

Ericson2314 commented 3 years ago

This is for the stuff discussed in NixOS/rfcs#62. I believe there is enough consensus that we can continue doing this as an unstable feature, and return to the RFC when we want to stabilize it.

Perhaps we should split the incomplete ones into other issues.

CC @regnat @edolstra

xaverdh commented 3 years ago

You have a typo in the PR ref of the second box (should be 3958, not 3985). Really cool stuff, by the way = )

Ericson2314 commented 3 years ago

Thanks and thanks! :)

divanorama commented 3 years ago

I suppose this is the implementation of https://github.com/NixOS/rfcs/pull/62

knedlsepp commented 3 years ago

I suppose this is the implementation of NixOS/rfcs#62

I'm also wondering what "floating CA derivations" actually are. I watched a Nix Friday session about this and couldn't find an explanation there, nor in the RFC.

divanorama commented 3 years ago

Also coming from the NixCon talk :) It seems that fixed-output (CA) derivations already exist for the fetch* commands; then there's the full switch to CA, discussed in the original thesis as the "intensional model" (https://github.com/NixOS/rfcs/pull/17/); and floating CA (https://github.com/NixOS/rfcs/pull/62) should be something in between, or a combination of both. I definitely need to read more first, though...

Ericson2314 commented 3 years ago
knedlsepp commented 3 years ago

@Ericson2314 are there some resources where one can read up on what makes a derivation "floating"?

Ericson2314 commented 3 years ago

@knedlsepp I just picked "floating" to mean "non-fixed", i.e. you get out a content-addressed path but didn't write down a hash in the derivation beforehand.

Besides the RFC and my talk, you can read about the "intensional store" in Eelco's thesis https://nixos.org/~eelco/pubs/phd-thesis.pdf. I do hope to write a new manual section on Nix's data model soon, though.
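
To make the distinction concrete, here is a minimal Python sketch of the three addressing modes mentioned in this thread: input-addressed, fixed-output, and floating content-addressed. It is a toy model only. The `/toy-store` prefix, the truncated SHA-256 digest, and all function names are assumptions made for illustration; they do not reflect how Nix actually computes store paths.

```python
import hashlib

def digest(data: bytes) -> str:
    """Truncated SHA-256, standing in for a store-path hash (toy scheme, not Nix's)."""
    return hashlib.sha256(data).hexdigest()[:16]

def input_addressed_path(drv_text: str, name: str) -> str:
    # The path depends only on the build recipe, so it is known before building.
    return f"/toy-store/{digest(drv_text.encode())}-{name}"

def content_addressed_path(output: bytes, name: str) -> str:
    # The path depends only on what the build actually produced.
    return f"/toy-store/{digest(output)}-{name}"

def realise_fixed(output: bytes, name: str, declared_hash: str) -> str:
    # Fixed-output: the hash was written down beforehand and the build must match it.
    if digest(output) != declared_hash:
        raise ValueError("output does not match the declared hash")
    return content_addressed_path(output, name)

def realise_floating(output: bytes, name: str) -> str:
    # Floating: no hash was declared, so the path is only known after the build.
    return content_addressed_path(output, name)
```

In this toy model, a fixed-output and a floating derivation that produce the same bytes end up at the same path; the only difference is whether the hash had to be known in advance.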

wmertens commented 3 years ago

@Ericson2314 I'm still working on RFC 17. The gist of it is that decoupling the store and the metadata seems like a very good idea, and this can be achieved if everything is CA.

I noticed that in 62 it's proposed that not all derivations be CA. However, isn't the CA checksum a property of anything, regardless of reproducibility of the build?

If you build $out twice and get two different $cas, does that matter, as long as you try to use only one $cas mapping for your subsequent builds?

7c6f434c commented 3 years ago

If you build $out twice and get two different $cas, does that matter, as long as you try to use only one $cas mapping for your subsequent builds?

Yes, it definitely does. It matters for --repair, and it also matters if you build something non-reproducible before Hydra does, but then want to fetch a rev-dep that Hydra builds before you get around to it.
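
The rev-dep scenario can be sketched with a toy model in Python. Everything here is hypothetical (the `/toy-store` paths, the `lib.drv`/`app.drv` names, and the builder-dependent output); it only illustrates why two different content addresses for the same derivation become visible downstream.

```python
import hashlib

def ca_path(content: bytes, name: str) -> str:
    """Toy content-addressed path: a truncated SHA-256 of the content plus a name."""
    return f"/toy-store/{hashlib.sha256(content).hexdigest()[:16]}-{name}"

def build_lib(builder_id: str) -> bytes:
    # Hypothetical non-reproducible build: the output leaks which machine built it.
    return f"lib built on {builder_id}\n".encode()

def build_app(lib_path: str) -> bytes:
    # A reverse dependency that embeds the path of the lib it was built against.
    return f"app linked against {lib_path}\n".encode()

# You build lib locally first; Hydra builds lib and then app on its own.
local_map = {"lib.drv": ca_path(build_lib("laptop"), "lib")}
hydra_lib = ca_path(build_lib("hydra"), "lib")
hydra_app = build_app(hydra_lib)
hydra_map = {"lib.drv": hydra_lib, "app.drv": ca_path(hydra_app, "app")}

# The same derivation now maps to two different outputs...
same_lib = local_map["lib.drv"] == hydra_map["lib.drv"]
# ...and Hydra's app refers to Hydra's lib, not to the one you already have.
app_uses_local_lib = local_map["lib.drv"].encode() in hydra_app
```

Both `same_lib` and `app_uses_local_lib` come out false here, so fetching Hydra's `app` would also pull in a second, different output for `lib.drv`.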

Ericson2314 commented 3 years ago

Actually, I don't think it matters much more. We could even have intentionally impure derivations whose CA mappings would expire, for example. For ones which are intended to be pure yet are not deterministic, I think it's bad for --repair either way. For example, repairing an impure input-addressed derivation (changing its content address, not restoring the disk to match the original CA) and then not rebuilding all the stuff that depends on it sounds like an unsafe optimization to me.

The only reason, to me, that RFC 62 allows mixing and matching is for a smoother migration, e.g. mixing old and new Nix code without changing hashes. There should be no benefit to using input-addressed derivations intentionally.

7c6f434c commented 3 years ago

Well, it is an unsafe optimisation, but so is our reference scanning logic.

I have indeed seen it break some things under very specific conditions, but on the other hand, in these cases one could construct a situation where copying a closure breaks things, too…

Ericson2314 commented 3 years ago

Good points. I suppose we can also just rewrite downstream derivations using the reference scanning logic, to still have O(1) repair too. This is perfectly fine because the trust map can still map the old derivation to the old outputs. A different derivation (possibly built in) can account for the rewriting in a new trust map entry, if we want to have an entry at all.

wmertens commented 3 years ago

But then the CA checksum won't match any more and the self-validation is out the window
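
A small Python sketch of this objection, under toy assumptions: the truncated SHA-256 `content_hash`, the made-up 16-character hashes, and the `/toy-store` prefix are all hypothetical and are not Nix's real hashing scheme.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Toy content address: truncated SHA-256 of the bytes."""
    return hashlib.sha256(data).hexdigest()[:16]

def rewrite_refs(data: bytes, old: bytes, new: bytes) -> bytes:
    # Same-length substitution, so offsets inside the file do not shift.
    if len(old) != len(new):
        raise ValueError("references must have equal length")
    return data.replace(old, new)

def verifies(data: bytes, recorded: str) -> bool:
    # Self-validation: the bytes must hash to the address they are stored under.
    return content_hash(data) == recorded

old_ref = b"aaaaaaaaaaaaaaaa-lib"   # made-up hash of the original dependency
new_ref = b"bbbbbbbbbbbbbbbb-lib"   # made-up hash of the repaired dependency
app = b"RPATH=/toy-store/" + old_ref + b"/lib\n"
recorded = content_hash(app)

patched = rewrite_refs(app, old_ref, new_ref)
ok_before = verifies(app, recorded)
ok_after = verifies(patched, recorded)
```

The original bytes verify against the recorded address and the patched bytes do not, so a rewritten output would need a new address of its own.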

7c6f434c commented 3 years ago

Good points. I suppose we can also just rewrite downstream derivations using the reference scanning logic. This is perfectly fine because the trust map can still map the old derivation to the old outputs.

This solution has an annoying drawback: the reference scanning logic is robust enough for its current task, finding at least one copy of each reference (there are corner cases, and trivial workarounds are applied in them), but it is known not to be robust enough to find all references (including, e.g., inside compressed manpages).
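
The compressed-manpage case can be illustrated with a short Python sketch. The scanner below is a simplified stand-in that assumes scanning means searching for the hash as a plain byte string; the hash value and file contents are made up.

```python
import gzip
from typing import Iterable, Set

def scan_references(blob: bytes, candidates: Iterable[str]) -> Set[str]:
    """Return the candidate hashes that occur verbatim in the blob."""
    return {h for h in candidates if h.encode() in blob}

dep_hash = "0123456789abcdfghijklmnpqrsvwxyz"  # made-up 32-character hash

# An uncompressed file that mentions the dependency's store path.
binary = f"/nix/store/{dep_hash}-lib/lib/libfoo.so".encode()

# A gzip-compressed manpage that mentions the same path many times.
manpage_text = f"See /nix/store/{dep_hash}-lib/share/doc for details.\n" * 20
manpage_gz = gzip.compress(manpage_text.encode())

found_in_binary = scan_references(binary, [dep_hash])
found_in_manpage = scan_references(manpage_gz, [dep_hash])
```

The plain file yields the reference, while the compressed manpage yields nothing even though the path is there after decompression. Finding one copy is enough to record a dependency, but rewriting needs every copy.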

stale[bot] commented 3 years ago

I marked this as stale due to inactivity. → More info

MagicRB commented 2 years ago

Still interested