Open Ericson2314 opened 2 years ago
FYI, the middle 2 PRs might be better to review together, as the second one revises the StorePathDescriptor
the first one creates.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/zurich-23-05-zhf-hackathon-and-workshop-report/29093/1
With https://github.com/NixOS/nix/pull/8369 the largest pieces of internal re-configuring are done. This means the internal representations are ready to accommodate the user-facing changes of a new primop: https://github.com/NixOS/nix/pull/8813
Stay tuned for more exciting changes!
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/2023-08-25-nix-team-meeting-minutes-82/32283/1
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/nixcon-governance-workshop/32705/9
If a dynamic derivation A performs nix derivation add B, does that create a "deriver" relation between two .drv store paths?
Technically it could, but I don't think it should, because
inputDrvs
into B, this derivation is not derived by A, so these deriver entries do not follow the closure property. That may be ok, but does not help with storing the deriver relation more efficiently, and scores no extra points.
So while we could consider such deriver rows to be valid, adding them consistently is probably more trouble than it's worth.
We should still have a deriver row for A and B, and I believe that's enough for querying any relevant paths through the combined deriver and reference graphs.
TODO
nix derivation add from within a build does not create a deriver row.
Does closureInfo still get every single derivation that is involved in a build with this?
@physics-enthusiast
Consider the following derivations:
A
A.out, a dynamic derivation
B, which has A.out as an input, and uses exportReferencesGraph, like closureInfo does
I wouldn't expect B to list A, because A.out need not have a reference to A.
Note also that exportReferencesGraph works on the references relation, which for "normal" inputs only contains outputs, maybe some constant paths (such as "${./script.sh}"), but rarely actual .drv files.
The deriver relation is impure and is not included by exportReferencesGraph.
So, probably not, but this should be checked and documented.
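To make the distinction between the two relations concrete, here is a minimal Python sketch of a toy store. Everything in it is an assumption for illustration: the path names, the `references` and `derivers` tables, and the `closure` helper are hypothetical and are not Nix's actual data model or API. It only shows why a walk over references alone would not reach A when started from A.out, while a walk that also follows deriver edges would.

```python
# Toy model of a store: names and edges are illustrative, not real store paths.
references = {
    "A.drv": {"bash"},        # A's own build inputs
    "A.out": {"src.tar.gz"},  # A.out is a .drv file; it references only its own inputs
    "bash": set(),
    "src.tar.gz": set(),
}

# Deriver relation (assumed shape): output path -> derivation that produced it.
derivers = {"A.out": "A.drv"}


def closure(start, edges):
    """Return every path reachable from `start` by following `edges`."""
    seen, todo = set(), [start]
    while todo:
        path = todo.pop()
        if path not in seen:
            seen.add(path)
            todo.extend(edges.get(path, ()))
    return seen


# What a references-only walk sees when started from A.out.
references_only = closure("A.out", references)

# Hypothetical combined graph: references plus a deriver edge where one is recorded.
combined_edges = {
    path: refs | ({derivers[path]} if path in derivers else set())
    for path, refs in references.items()
}
with_derivers = closure("A.out", combined_edges)

print(sorted(references_only))
print(sorted(with_derivers))
```

In this toy model A.drv and its inputs only show up once deriver edges are followed, which matches the expectation above that B would not list A.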
I wouldn't expect B to list A, because A.out need not have a reference to A.
Could a case not be made for including a reference to A.drv in A.out? Any derivation referencing A.out must first have a reference to A, and from a dependency tracking perspective A must build for A.out to exist (and hence build), so I think arguably A is still a build-time dependency. This is especially important if you are trying to do something like backing up all FODs needed to build a particular package, since the content of A.out might depend on the result of some FOD. The exportReferencesGraph resolver could stop at the dynamic derivations if the original query was not to a derivation (so that someone trying to get run-time dependencies doesn't accidentally pull the whole build-time closure), since drvs have no run-time dependencies.
Actually, having thought about it further, maybe dynamic derivations should not be allowed to be fixed-output at all. Even with CA derivations, a FOD dynamically generated by a nondeterministic derivation is not guaranteed to be reproducible from source. You could for example code in several possible URLs and their respective hashes, and then have A pick one at random to generate the A.out FOD, which could in turn produce completely different outputs. Content addressing preserves reproducibility within a single Trust DB (so between rebuilds within a system, and between systems sharing a binary cache), but someone independently building the exact same expression could in theory end up with an entirely different realisation.
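The scenario can be simulated with a small Python sketch. This is only a model under stated assumptions: the URLs, the `CANDIDATES` table standing in for the network, and the `build_A` / `realise_fod` functions are all hypothetical and do not correspond to any Nix interface. The point it illustrates is that every generated FOD passes its own hash check, yet two builds of the same A can yield different content.

```python
import hashlib
import random

# Stand-in for the network: made-up URLs mapped to the bytes they would serve.
CANDIDATES = {
    "https://mirror-a.example/src.tar.gz": b"contents served by mirror A",
    "https://mirror-b.example/src.tar.gz": b"contents served by mirror B",
}


def sha256(data):
    return hashlib.sha256(data).hexdigest()


# The (url, hash) pairs hard-coded into A ahead of time.
PINNED = [(url, sha256(body)) for url, body in CANDIDATES.items()]


def build_A(pick=random.choice):
    """Model of the dynamic derivation A: emit one FOD, chosen by `pick`."""
    url, expected = pick(PINNED)
    return {"url": url, "outputHash": expected}


def realise_fod(fod):
    """Model of realising a FOD: fetch, then verify against the declared hash."""
    body = CANDIDATES[fod["url"]]
    if sha256(body) != fod["outputHash"]:
        raise ValueError("hash mismatch")
    return body


# Force each possible choice in turn instead of relying on randomness.
results = {realise_fod(build_A(lambda options, i=i: options[i])) for i in range(len(PINNED))}
print(len(results))
```

Each realisation verifies, so the hash check alone does not reveal that the choice of FOD was made at build time.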
@physics-enthusiast Isn't that why FODs require a hash as one of their parameters, to verify that they were reproduced?
@Dessix the issue is that if the FOD is built dynamically (by another derivation), that hash can also be altered dynamically (i.e. during build-time rather than eval-time), breaking reproducibility. Come to think of it, setting hashes via IFD seem to have this problem too. Edit: wait, is that what all those "*2nix" libraries using IFDs are doing? Edit 2: turns out flakes block IFD, so at least in pure eval IFD->FOD shouldn't be a problem
Could a case not be made for including a reference to A.drv in A.out?
A could choose to add a true reference, but that would cause unnecessary rebuilds of A.out.
I suppose something could be done for exportReferencesGraph by basing the result on the DerivingPath rather than the realised, opaque StorePath, i.e. the whole /nix/store/foo.drv^bar^baz, and not just /nix/store/qux-baz.
That way it can return info about foo.drv^bar.
This should probably be the behavior of a new attribute though, because existing code based on exportReferencesGraph will expect to only see the references of the final realised path.
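As a rough illustration of what "basing the result on the DerivingPath" could give access to, the following Python sketch parses the `^`-separated form used above into a nested structure and lists the intermediate steps. The class and function names (`Opaque`, `Built`, `parse_deriving_path`, `steps`, `render`) are made up for this sketch and are not Nix's internal types; the path is the abbreviated example from the comment, not a real store path.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Opaque:
    """A plain store path, with no information about how it was produced."""
    path: str


@dataclass(frozen=True)
class Built:
    """An output `output` of the derivation denoted by `drv`."""
    drv: "DerivingPath"
    output: str


DerivingPath = Union[Opaque, Built]


def parse_deriving_path(text: str) -> DerivingPath:
    """Parse 'store-path^out1^out2...' into nested Built nodes around an Opaque."""
    base, *outputs = text.split("^")
    result: DerivingPath = Opaque(base)
    for name in outputs:
        if not name:
            raise ValueError(f"empty output name in {text!r}")
        result = Built(result, name)
    return result


def render(dp: DerivingPath) -> str:
    """Inverse of parse_deriving_path."""
    if isinstance(dp, Opaque):
        return dp.path
    return f"{render(dp.drv)}^{dp.output}"


def steps(dp: DerivingPath) -> list:
    """The deriving path itself and everything it is built from, outermost first."""
    chain = [dp]
    while isinstance(dp, Built):
        dp = dp.drv
        chain.append(dp)
    return chain


dp = parse_deriving_path("/nix/store/foo.drv^bar^baz")
for step in steps(dp):
    print(render(step))
```

A realised path such as /nix/store/qux-baz corresponds only to the outermost step; the nested form keeps foo.drv^bar and foo.drv available to whatever consumes it.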
FOD
As mentioned by @Dessix, FODs don't introduce any new build impurities.
Dynamic derivations do turn actual build impurities into "instantiation impurities", but this is a necessary evil if we want to have this kind of dynamism. Anything that writes derivations should be written with care, whether that's our own evaluator or a dynamic derivation deriver builder. (Do we have a term for the derivation that produces another derivation? Dynamic derivation seems to refer to the output, not the derivation writer if that's what we want to call it.)
Come to think of it, setting hashes via IFD seem to have this problem too.
IFD is a somewhat more benign source of instantiation impurities, because at least you get the opportunity to do some of the processing in the Nix language, which has reproducibility as a design goal, especially with pure mode.
Edit 2: turns out flakes block IFD
Generally IFD is allowed, but some of the metadata commands forbid it, under the assumption that they'd be evaluated and indexed centrally, and those commands should obey the same restrictions. However, that's not what flakestry.dev or flakehub do, as they just accept the readily evaluated JSON from CI. I know of at least one flake that disables the IFD restriction before uploading to a registry.
@roberth Is this different from the question of whether CA derivations (which includes derivation-producing derivations) ought to use the deriver field for their outputs?
@Ericson2314 a lot has been discussed, so I can't tell what this refers to. Also not sure what is the exact use of the deriver field in that context.
I can see a vague parallel with the tracking of build input hashes (transitively) for any build, and CA in particular.
Also relevant is the idea of those "large step realisation" objects #8947, which would become somewhat useless in cases where the discussed exportReferencesGraph extension is used.
Coming to think of it, exportReferencesGraph could use a fresh design that takes both RFC 92 derivers, as well as derivation closures into account.
IIRC diffing closures to get an accurate set of build dependencies (to allow offline builds from one closure to the next) is also still an unsolved problem, and its potential solution is probably also impacted by RFC 92.
@roberth The possibility of offline rebuilds without having to copy over the entire store is actually a large part of the reason I raised these queries in the first place. At least in theory it should be possible to gather a binary cache of only the FODs in the build-time closure of a package and rebuild that package with it. My concern was that one of the things instantiation-time impurities can do that build-time impurities cannot is introduce additional external dependencies. This means that even if you could get every FOD that was used in the build (which RFC 92 would also prevent us from doing without the exportReferencesGraph extension you mentioned), it would still be possible for an offline rebuild to fail due to one of the FOD hashes being changed (or an entirely new FOD generated) by said impurity. To be fair, right now the only way I can think of for this to happen is if a dependency was intentionally misbehaving (since all of the possible hashes would still need to be specified manually), but the fact that it's even possible is still at least somewhat concerning for me. Maybe the ability to generate an "instantiation-time closure" of sorts could be one of the solutions to this?
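One way to picture the gap is a Python sketch that collects the FODs reachable from a package, first using only what is known statically and then also following derivations emitted by dynamic derivations. The data layout (`drvs`, `produced`), the derivation names, and `fixed_output_closure` are assumptions invented for this sketch; in particular, `produced` stands for information that would only be available after the dynamic derivation has actually been built.

```python
def fixed_output_closure(root, drvs, produced=None):
    """Collect the fixed-output derivations reachable from `root`.

    `drvs` maps a derivation name to its inputs and whether it is fixed-output.
    `produced` (optional) maps a dynamic derivation to the derivation it emitted.
    """
    produced = produced or {}
    seen, fods, todo = set(), set(), [root]
    while todo:
        name = todo.pop()
        if name in seen:
            continue
        seen.add(name)
        drv = drvs[name]
        if drv["fixed_output"]:
            fods.add(name)
        todo.extend(drv["input_drvs"])
        if name in produced:
            todo.append(produced[name])
    return fods


# Hypothetical graph: A.drv is a dynamic derivation that emits generated.drv.
drvs = {
    "pkg.drv": {"fixed_output": False, "input_drvs": ["A.drv", "stdenv-src.drv"]},
    "A.drv": {"fixed_output": False, "input_drvs": ["lockfile.drv"]},
    "lockfile.drv": {"fixed_output": True, "input_drvs": []},
    "stdenv-src.drv": {"fixed_output": True, "input_drvs": []},
    "generated.drv": {"fixed_output": False, "input_drvs": ["dep-src.drv"]},
    "dep-src.drv": {"fixed_output": True, "input_drvs": []},
}
produced = {"A.drv": "generated.drv"}

static_fods = fixed_output_closure("pkg.drv", drvs)
full_fods = fixed_output_closure("pkg.drv", drvs, produced)
print(sorted(static_fods))
print(sorted(full_fods))
```

In this model the static walk misses dep-src.drv, which is the kind of FOD an offline cache would lack unless the closure is computed after the dynamic derivations have run.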
Info
Steps
Here are the PRs to review:
Preparatory work
4543
6815
7543
7600
3746
3959
7601
8724
8927
8938
Actual implementation
8353
8369
8813
4628
Quality of life / Nice to have
fetchClosure
CC @tomberek