What are your thoughts on ca-derivations?

lucasew commented 1 month ago

Question

What are your thoughts on content addressed derivations? Could they eventually be stable enough to be enabled by default? Would they really reduce the storage and cache usage slope by a significant amount?

Candidates I'd like to get an answer from

No response

Reminder of the Q&A rules

Please adhere to the Q&A guidelines and rules

mschwaig commented 1 month ago

What are your thoughts on content addressed derivations?

I am a researcher who works on supply chain security, and I see content addressed derivations through that lens.

The practical goal of my recent research is to propose some ways to extend Nix, which make build outputs more trustworthy.

Input addressed derivations show up as one problem in this context, because if you consume an input addressed derivation from a cache, you need to trust the builder of that derivation with how it did dependency resolution. This makes it so that you have to trust the builder to a larger extent than necessary. It would be really difficult to give full context and a proper explanation of this issue here, but you can read about it in my paper, the contents of which will also be presented in this talk at NixCon.

Switching to content addressed derivations is not the only way to fix this particular problem; we could also retrofit something onto input addressed derivations. We would also have to solve a few other things before seeing any benefit from this fix.

In any case this needs to be solved on the level of the signatures (or alternatives like Trustix), which we use to communicate trust relationships. I think there is still some work to do to ensure that the signatures used with content addressed derivations are as solid as I would like:

We have to look at what relationships a signed derivation and a signed realization actually represent, and make sure you only place trust in the correct parts of that.
If you do any rewriting of outputs, which is not implemented, but was proposed in @edolstra's thesis, we would also have to make sure the rewriting is fully accounted for in terms of trust.

Could they eventually be stable enough to be enabled by default?

I would like us to get there, and I think if we have an implementation that is really solid in terms of reliability, there will come a point in time where it should be the default. Besides wanting the implementation to be suitably reliable, I would also want the first stable version of this feature to be implemented in a way where we no longer place trust in the builder for dependency resolution.

Would they really reduce the storage and cache usage slope by a significant amount?

I think this question needs to be answered based on measurements; otherwise the actual impact is really difficult to judge. Whether and how rewriting is done would also play a significant role there. There are a lot of other benefits to content addressed derivations; we just have to be confident that we got the design right.

getchoo commented 1 month ago

What are your thoughts on content addressed derivations?

I think they are a really interesting path for Nix to go down. The idea of a content addressed store has seen a lot of success in systems such as OSTree and of course Git, and opens up some more possibilities in trusting build machines as described above by mschwaig.

Could they eventually be stable enough to be enabled by default?

With enough work and testing, I believe so.

Would they really reduce the storage and cache usage slope by a significant amount?

I believe this comment I came across a while ago is a good explainer. As a TL;DR: changing the inputs of a build in a way that doesn't change the output would no longer cause cascading rebuilds, which is good. I'm not sure how significant it would be, though, because if any part of the output changes, we still have the cascading rebuild problem.
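To make the cascading-rebuild point concrete, here is a minimal toy sketch. It is not Nix's actual hashing scheme: the `digest` helper, the two-package `lib`/`app` graph, and the builder that ignores comment lines are all assumptions invented for illustration. It only shows why an input change that leaves the output untouched stops propagating once paths are derived from content.

```python
import hashlib


def digest(*parts: str) -> str:
    """Toy stand-in for a store hash; not Nix's real hashing scheme."""
    return hashlib.sha256("\0".join(parts).encode()).hexdigest()[:12]


def build_lib(source: str) -> str:
    """Hypothetical builder whose output ignores comment lines in the source."""
    return "\n".join(l for l in source.splitlines() if not l.startswith("#"))


def input_addressed_paths(lib_source: str) -> tuple[str, str]:
    """Paths derived from inputs only: any source change reaches the app path."""
    lib_path = digest("lib", lib_source)
    app_path = digest("app", lib_path)
    return lib_path, app_path


def content_addressed_paths(lib_source: str) -> tuple[str, str]:
    """The lib path comes from the built output; the app is keyed on that path,
    so an unchanged lib output leaves the app key unchanged as well."""
    lib_path = digest("lib", build_lib(lib_source))
    app_key = digest("app", lib_path)
    return lib_path, app_key


v1 = "# version 1\nint f() { return 1; }"
v2 = "# reworded comment\nint f() { return 1; }"  # same output as v1
v3 = "# version 1\nint f() { return 2; }"         # output really changes

print(input_addressed_paths(v1) == input_addressed_paths(v2))
print(content_addressed_paths(v1) == content_addressed_paths(v2))
print(content_addressed_paths(v1) == content_addressed_paths(v3))
```

In this model the comment-only change (`v2`) alters both input-addressed paths but neither content-addressed one, while the real change (`v3`) still cascades in both schemes, which matches the caveat above.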

nyabinary commented 1 month ago

I believe content-addressed derivations would significantly improve the storage and cache usage situation in Nix by optimizing how derivations are handled. Since ca-derivations address outputs by their content rather than by their inputs, they reduce duplication and make caching more efficient. This would allow for better reuse of existing build outputs, thereby reducing the storage footprint and improving performance in large-scale environments. So yes, I do believe they would have a significant impact on storage and cache usage. :3
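The deduplication argument can be sketched with a toy store model. Everything here is hypothetical (the `digest` helper, the `store_entries` function, and the sample builds); it is not how the Nix store is implemented, and it says nothing about how often identical outputs occur in practice, which would need measuring.

```python
import hashlib


def digest(text: str) -> str:
    """Toy stand-in for a store hash; not Nix's real hashing scheme."""
    return hashlib.sha256(text.encode()).hexdigest()[:12]


def store_entries(builds: list[tuple[str, str]], content_addressed: bool) -> int:
    """Count store entries for (inputs, output) pairs under either scheme."""
    store: dict[str, str] = {}
    for inputs, output in builds:
        # Key on the output itself, or on the description of the inputs.
        key = digest(output) if content_addressed else digest(inputs)
        store[key] = output
    return len(store)


# Hypothetical builds: a compiler bump that happens to yield the same binary.
builds = [
    ("compiler-1 + hello.c", "HELLO-BINARY"),
    ("compiler-2 + hello.c", "HELLO-BINARY"),
    ("compiler-2 + bye.c", "BYE-BINARY"),
]

print(store_entries(builds, content_addressed=False))
print(store_entries(builds, content_addressed=True))
```

With these sample builds, the input-addressed model keeps one entry per distinct set of inputs, while the content-addressed model keeps one entry per distinct output.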

asymmetric commented 1 month ago

As a candidate, I don't have a particularly informed technical opinion on this. I also don't think the Steering Committee should directly intervene on specific technical decisions -- it should instead defer and delegate to domain experts who have the knowledge and authority to make such choices.

What I do think is important is that we come to an agreement over what features we want to prioritize, and then actually prioritize them. I find it frustrating that in our ecosystem "experiments" tend to drag along for so long, with no clear path to stabilization.

So in this sense, my only strong opinion on CA derivations is that we should figure out where they sit on our official list of priorities, and allocate resources accordingly.

roberth commented 1 month ago

ca-derivations would be beneficial to resource use, and it is always good to complete a feature.

However, as a member of the Nix team, I have noticed little interest in actually completing this feature, whether from me, the other team members, or the community. I may well be wrong about the community, if people there are actually interested in contributing to ca-derivations.

The progress of this feature is somewhat unclear to me, because the ca-derivations milestone is a recent addition to the issue tracker (it did not exist while the feature was initially developed), and it is still very incomplete. A good first step would be to curate the issues. The Nix team could use your help with that; contact us if you're interested.

proofconstruction commented 1 month ago

I have many thoughts! We already pass everything through hash functions, so ca-derivations are an obvious next step in Nix's future. To be a bit suggestive, in principle we could precompute the hash of all valid Nix files of a certain family (e.g. valid configurations for a given NixOS module) to know in advance what derivation a hash refers to and what sequence of bits is produced by its realization. I won't say more yet as this is not the right venue for this discussion, but this is closely related to other ideas that motivate my discussion in #50 of the need to revisit Nix from first principles.

NixOS / SC-election-2024