NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Tracking issue for RFC 92: Dynamic derivations #6316

Open Ericson2314 opened 2 years ago

Ericson2314 commented 2 years ago

Info

Steps

Here are the PRs to review:

Preparatory work

  1. 4543

    • 6815

  2. 7543

  3. 7600

  4. 3746

  5. 3959

    • Maybe also #7339
  6. 7601

  7. 8724

  8. 8927

  9. 8938

Actual implementation

  1. 8353

  2. 8369

  3. 8813

  4. 4628

  5. https://github.com/NixOS/nix/pull/9415

Quality of life / Nice to have

CC @tomberek

Ericson2314 commented 2 years ago

FYI, the middle 2 PRs might be better to review together, as the second one revises the StorePathDescriptor the first one creates.

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/rfc-92-status-update/27441/1

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/zurich-23-05-zhf-hackathon-and-workshop-report/29093/1

tomberek commented 1 year ago

With https://github.com/NixOS/nix/pull/8369 the largest pieces of internal re-configuring are done. This means the internal representations are ready to accommodate the user-facing changes of a new primop: https://github.com/NixOS/nix/pull/8813

Stay tuned for more exciting changes!

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-08-25-nix-team-meeting-minutes-82/32283/1

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixcon-governance-workshop/32705/9

roberth commented 8 months ago

If a dynamic derivation A performs nix derivation add B, does that create a "deriver" relation between two .drv store paths? Technically it could, but I don't think it should, because

So while we could consider such deriver rows to be valid, adding them consistently is probably more trouble than it's worth.

We should still have a deriver row for A and B, and I believe that's enough for querying any relevant paths through the combined deriver and reference graphs.

TODO

physics-enthusiast commented 6 months ago

Does closureInfo still get every single derivation that is involved in a build with this?

roberth commented 6 months ago

@physics-enthusiast

Consider the following derivations:

I wouldn't expect it to list B to list A, because A.out need not have a reference to A. Note also that exportReferencesGraph works on the references relation, which for "normal" inputs only contains outputs, maybe some constant paths (such as "${./script.sh}"), but rarely actual .drv files. The deriver relation is impure and is not included by exportReferencesGraph.

So, probably not, but this should be checked and documented.
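
The distinction between the references relation and the deriver relation can be sketched with a toy model. This is not Nix's implementation: the store paths, the two tables, and the `closure` function are all hypothetical, and the layout of A and B is an assumption (A writes a .drv as its output, and B depends on something built from that .drv). The only parts taken from the discussion are that exportReferencesGraph follows references and ignores derivers, and that A.out need not reference A.

```python
# Hypothetical store contents: each path maps to the paths it references.
references = {
    "/nix/store/aaa-A.drv": {"/nix/store/src-generator.sh"},
    "/nix/store/src-generator.sh": set(),
    # A.out is itself a .drv file; it need not reference A.drv.
    "/nix/store/bbb-A-out.drv": set(),
    # Something built from A.out.
    "/nix/store/ccc-lib": set(),
    # B's output, which uses that build result at run time.
    "/nix/store/ddd-B": {"/nix/store/ccc-lib"},
}

# Hypothetical deriver rows: output path -> the .drv that produced it.
deriver = {
    "/nix/store/bbb-A-out.drv": "/nix/store/aaa-A.drv",
    "/nix/store/ccc-lib": "/nix/store/bbb-A-out.drv",
}


def closure(root, follow_derivers=False):
    """Collect every path reachable from root.

    With follow_derivers=False only the references relation is walked,
    which is the relation exportReferencesGraph is described as using.
    With follow_derivers=True the deriver rows are walked as well.
    """
    seen = set()
    stack = [root]
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        stack.extend(references.get(path, ()))
        if follow_derivers and path in deriver:
            stack.append(deriver[path])
    return seen


refs_only = closure("/nix/store/ddd-B")
combined = closure("/nix/store/ddd-B", follow_derivers=True)
```

In this model `refs_only` never reaches A.drv, while `combined` does, which matches the earlier remark that the combined deriver and reference graphs are enough to find the relevant paths.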

physics-enthusiast commented 6 months ago

> I wouldn't expect it to list B to list A, because A.out need not have a reference to A.

Could a case not be made for including a reference to A.drv in A.out? Any derivation referencing A.out must first have a reference to A, and from a dependency tracking perspective A must build for A.out to exist (and hence build), so I think A is arguably still a build-time dependency. This is especially important if you are trying to do something like backing up all FODs needed to build a particular package, since the content of A.out might depend on the result of some FOD. The exportReferencesGraph resolver could stop at the dynamic derivations if the original query was not to a derivation (so that someone trying to get run-time dependencies doesn't accidentally pull the whole build-time closure), since drvs have no run-time dependencies.

physics-enthusiast commented 6 months ago

Actually, having thought about it further, maybe dynamic derivations should not be allowed to be fixed-output at all. Even with CA derivations, a FOD dynamically generated by a nondeterministic derivation is not guaranteed to be reproducible from source. You could for example code in several possible URLs and their respective hashes, and then have A pick one at random to generate the A.out FOD, which could in turn produce completely different outputs. Content addressing preserves reproducibility within a single Trust DB (so between rebuilds within a system, and between systems sharing a binary cache), but someone independently building the exact same expression could in theory end up with an entirely different realisation.
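
The scenario above can be illustrated with a small simulation. This is only a sketch: `MIRRORS`, `write_fod`, and `realise_fod` are hypothetical stand-ins (no network access, no real Nix behaviour), and the hashes are computed from the fake payloads for brevity where a real author would have to write them out by hand.

```python
import hashlib
import random

# Hypothetical payloads standing in for what two different URLs would serve.
MIRRORS = {
    "https://example.org/a.tar.gz": b"contents of a",
    "https://example.org/b.tar.gz": b"contents of b",
}


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


# Every (url, expected hash) pair is fixed ahead of time.
CANDIDATES = [(url, sha256_hex(body)) for url, body in MIRRORS.items()]


def write_fod(rng):
    """Stand-in for derivation A: emit a FOD description, chosen at random."""
    url, expected = rng.choice(CANDIDATES)
    return {"url": url, "outputHash": expected}


def realise_fod(fod):
    """Stand-in for building the FOD: fetch, then verify the declared hash."""
    body = MIRRORS[fod["url"]]
    if sha256_hex(body) != fod["outputHash"]:
        raise ValueError("hash mismatch")
    return body


# Each FOD that A could emit passes its own hash check...
all_fods = [{"url": url, "outputHash": h} for url, h in CANDIDATES]
all_outputs = {realise_fod(fod) for fod in all_fods}

# ...and whichever one a given run of A picks is one of those.
picked = write_fod(random.Random())
```

Every candidate verifies against its own hash, yet `all_outputs` contains two distinct results, so two independent builders running the same expression are not guaranteed to agree.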

Dessix commented 6 months ago

@physics-enthusiast Isn't that why FODs require a hash as one of their parameters, to verify that they were reproduced?

physics-enthusiast commented 6 months ago

@Dessix the issue is that if the FOD is built dynamically (by another derivation), that hash can also be altered dynamically (i.e. during build-time rather than eval-time), breaking reproducibility. Come to think of it, setting hashes via IFD seems to have this problem too. Edit: wait, is that what all those "*2nix" libraries using IFDs are doing? Edit 2: turns out flakes block IFD, so at least in pure eval IFD->FOD shouldn't be a problem.

roberth commented 6 months ago

> Could a case not be made for including a reference to A.drv in A.out?

A could choose to add a true reference, but that would cause unnecessary rebuilds of A.out.

I suppose something could be done for exportReferencesGraph by basing the result on the DerivingPath rather than the realised, opaque StorePath, i.e. the whole /nix/store/foo.drv^bar^baz, and not just /nix/store/qux-baz. That way it can return info about foo.drv^bar. This should probably be the behavior of a new attribute though, because existing code based on exportReferencesGraph will expect to only see the references of the final realised path.
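
To make the `^` notation concrete, here is a minimal sketch of how such a path nests. The `Opaque` and `Built` classes and the helper functions are a hypothetical simplification for illustration, not Nix's actual data types or parser; the only input taken from the discussion is the example string /nix/store/foo.drv^bar^baz.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Opaque:
    """A plain store path, e.g. /nix/store/foo.drv."""
    path: str


@dataclass(frozen=True)
class Built:
    """An output of whatever derivation `drv` resolves to."""
    drv: "DerivingPath"
    output: str


DerivingPath = Union[Opaque, Built]


def parse_deriving_path(text: str) -> DerivingPath:
    """Split on '^' and nest left to right: a^b^c is Built(Built(a, b), c)."""
    base, *outputs = text.split("^")
    result: DerivingPath = Opaque(base)
    for output in outputs:
        result = Built(result, output)
    return result


def render(dp: DerivingPath) -> str:
    """Turn a parsed path back into its textual form."""
    if isinstance(dp, Opaque):
        return dp.path
    return f"{render(dp.drv)}^{dp.output}"


def steps(dp: DerivingPath) -> List[DerivingPath]:
    """List every intermediate deriving path, innermost first."""
    if isinstance(dp, Opaque):
        return [dp]
    return steps(dp.drv) + [dp]


parsed = parse_deriving_path("/nix/store/foo.drv^bar^baz")
chain = [render(step) for step in steps(parsed)]
```

Here `chain` holds /nix/store/foo.drv, then /nix/store/foo.drv^bar, then the full path, which is the extra information (such as foo.drv^bar) that the realised path /nix/store/qux-baz alone does not carry.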

FOD

As mentioned by @Dessix, FODs don't introduce any new build impurities.

Dynamic derivations do turn actual build impurities into "instantiation impurities", but this is a necessary evil if we want to have this kind of dynamism. Anything that writes derivations should be written with care, whether that's our own evaluator or a dynamic derivation deriver builder. (Do we have a term for the derivation that produces another derivation? Dynamic derivation seems to refer to the output, not the derivation writer if that's what we want to call it.)

> Come to think of it, setting hashes via IFD seems to have this problem too.

IFD is a somewhat more benign source of instantiation impurities, because at least you get the opportunity to do some of the processing in the Nix language, which has reproducibility as a design goal, especially with pure mode.

> Edit 2: turns out flakes block IFD

Generally IFD is allowed, but some of the metadata commands forbid it, under the assumption that they'd be evaluated and indexed centrally, and those commands should obey the same restrictions. However, that's not what flakestry.dev or flakehub do, as they just accept the readily evaluated JSON from CI. I know of at least one flake that disables the IFD restriction before uploading to a registry.

Ericson2314 commented 6 months ago

@roberth Is this different from the question of whether CA derivations (which includes derivation-producing derivations) ought to use the deriver field for their outputs?

roberth commented 6 months ago

@Ericson2314 a lot has been discussed, so I can't tell what this refers to. I'm also not sure what the exact use of the deriver field is in that context.

I can see a vague parallel with the tracking of build input hashes (transitively) for any build, and CA in particular.

Also relevant is the idea of those "large step realisation" objects #8947, which would become somewhat useless in cases where the discussed exportReferencesGraph extension is used.

Come to think of it, exportReferencesGraph could use a fresh design that takes both RFC 92 derivers and derivation closures into account. IIRC diffing closures to get an accurate set of build dependencies (to allow offline builds from one closure to the next) is also still an unsolved problem, and its potential solution is probably also impacted by RFC 92.

physics-enthusiast commented 6 months ago

@roberth The possibility of offline rebuilds without having to copy over the entire store is actually a large part of the reason I raised these queries in the first place. At least in theory it should be possible to gather a binary cache of only the FODs in the build-time closure of a package and rebuild that package with it. My concern was that one of the things instantiation-time impurities can do that build-time impurities cannot is introduce additional external dependencies. This means that even if you could get every FOD that was used in the build (which RFC 92 would also prevent us from doing without the exportReferencesGraph extension you mentioned), it would still be possible for an offline rebuild to fail due to one of the FOD hashes being changed (or an entirely new FOD generated) by said impurity. To be fair, right now the only way I can think of for this to happen is if a dependency was intentionally misbehaving (since all of the possible hashes would still need to be specified manually), but the fact that it's even possible is still at least somewhat concerning for me. Maybe the ability to generate an "instantiation-time closure" of sorts could be one of the solutions to this?