NixOS / nix

Nix, the purely functional package manager
https://nixos.org/

re-fetch source when url changes #969

Open magnetophon opened 8 years ago

magnetophon commented 8 years ago

Currently the source is only re-fetched when the hash changes. When you use nix-env -i to find the new hash but forget to actually change the hash in the expression, you are stuck with the wrong source.

To add injury to insult: the source is then not re-fetched when you change the hash: you first have to garbage-collect the old source.

domenkozar commented 8 years ago

In fixed-output derivations, the output is content-addressed. We could make the URL part of the hash, but then a mirror/URL change would trigger a rebuild, which is very unfortunate.

I propose you change your workflow for how you update sources, but I realize that's poor UX.

copumpkin commented 8 years ago

I want to better understand the "stuck with the wrong source" thing you mention because I haven't experienced it, but otherwise I'm basically with @domenkozar on this. I don't like the usability aspects either, but I don't know how to retain the nice content-centric properties without something like this.

One option, possibly more work than someone's willing to put into solving this, is for Nix to maintain a local cache of fixed-output derivation output hashes and the regular-derivation-style input hash that led to that output hash. This could be stored as a very simple "output-hash -> regular-derivation-hash" KV mapping. Then it could simply warn you that it's seeing the same output hash coming from a different derivation, which isn't necessarily an error but might be interesting to the user.
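A minimal sketch of that proposal (all names here are hypothetical illustrations, not Nix internals): keep a table keyed by output hash and warn when a second, different derivation claims the same output.

# Proposed KV cache: output-hash -> regular-derivation-style input hash.
cache: dict[str, str] = {}

def record_realisation(output_hash: str, drv_hash: str) -> None:
    previous = cache.setdefault(output_hash, drv_hash)
    if previous != drv_hash:
        # Not necessarily an error, but interesting to the user.
        print(f"warning: output {output_hash} was first produced by derivation "
              f"{previous}, but is now claimed by {drv_hash}")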

magnetophon commented 8 years ago

@domenkozar: Could you expand on "I propose you change the workflow how you change sources, but I realize that's poor UX."?

@copumpkin: What I mean is: Nix will write the old source in the store, under the new name.

It won't re-fetch it if you just change the hash; you have to remove the "old source under the new name" first.

@ the both of you: it seems this issue is not as easy to fix as I hoped. Feel free to close it if you want.

copumpkin commented 8 years ago

What I mean is: Nix will write the old source in the store, under the new name. It won't re-fetch it if you just change the hash; you have to remove the "old source under the new name" first.

I'm confused. The way Nix thinks about fixed-output derivations is as follows:

  1. The content + the derivation name (which in some cases isn't provided by the user) are the only things that identify the data in the store.
  2. The other stuff like the URL for fetchurl or the git repo specs for fetchgit, and basically all the scripts that call git or curl or whatever are "hints" for how to produce the content in (1). If the content already exists, the scripts won't be run. If the content doesn't exist, and the script produces data with a different hash, Nix will complain that the hash doesn't match and quit.

So here are the scenarios I can think of:

  1. Content not in store, and you change the hash without changing e.g., the url for fetchurl: Nix will fetch the original URL, check it against the new hash you changed, and complain that the hashes don't match.
  2. Content not in store, and you leave the original hash alone: Nix will fetch the original URL, check against the correct hash, and add the result to the store because they match.
  3. Content in store, and you change the hash without changing the URL: Nix will fetch the original URL, check it against the new hash, and complain that the hashes don't match.
  4. Content in store, and you change the hash and change the URL: Nix will fetch new URL, check against new hash, and accept the new content.

It sounds like you're saying (3) is leaving junk in the store that's identified with the new hash but contains the old data, but that doesn't fit with my understanding of how things work. What am I missing?
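For illustration, scenarios (1) and (3) both end in a fetch followed by a hash-mismatch failure; on recent Nix versions that looks roughly like this (the attribute, paths, and hashes are made up or abbreviated):

$ nix-build -A foo.src        # hash edited, URL left alone
error: hash mismatch in fixed-output derivation '/nix/store/…-foo-1.0.tar.gz.drv':
  specified: sha256-AAAA…
     got:    sha256-BBBB…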

magnetophon commented 8 years ago

@copumpkin: I'm talking about when you change the version (or name in general) of the pkg and the url, but not the hash. afaik in that case it doesn't matter if the content is already in the store, it'll be stored under the new name.

Sorry for the confusion, hope I'm clear now.

veprbl commented 8 years ago

I see the opposite in my tests:

Welcome to Nix version 1.11.2. Type :? for help.

nix-repl> let  pkgs = import <nixpkgs> {}; in pkgs.fetchurl { url = "http://example.com/test.tar.gz"; sha256 = "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53"; }
«derivation /blah/nix/store/9hj80g8r12q74wk6k0wq8zfi1bf6glgn-test.tar.gz.drv»

nix-repl> let  pkgs = import <nixpkgs> {}; in pkgs.fetchurl { url = "http://example.org/test.tar.gz"; sha256 = "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53"; }
«derivation /blah/nix/store/6zh44fbv7agc7n7il0wprzg6hv1khwha-test.tar.gz.drv»

nix-repl> let  pkgs = import <nixpkgs> {}; in pkgs.stdenv.mkDerivation { name = "test"; src = pkgs.fetchurl { url = "http://example.com/test.tar.gz"; sha256 = "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53"; }; }
«derivation /blah/nix/store/zbajq1zcsz90z8dnnfxfi5d1a36kk17l-test.drv»

nix-repl> let  pkgs = import <nixpkgs> {}; in pkgs.stdenv.mkDerivation { name = "test"; src = pkgs.fetchurl { url = "http://example.org/test.tar.gz"; sha256 = "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53"; }; }
«derivation /blah/nix/store/dvhk6cw4am7r9rgyjb1q7rwqdww6cns7-test.drv»

I would think that Nix shouldn't care about a URL change as long as the hash stays the same. But in reality, changing the URL gives me both a redownload and a (massive) rebuild.

domenkozar commented 8 years ago

@veprbl we're talking about the hash of the realized output, not the hash of the derivation itself.
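Concretely (a sketch continuing veprbl's session, store paths abbreviated): the output path of a fixed-output derivation depends essentially on the name and the declared hash, not the URL, so both .drv files above realize to the same store path:

nix-repl> let pkgs = import <nixpkgs> {}; in (pkgs.fetchurl { url = "http://example.com/test.tar.gz"; sha256 = "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53"; }).outPath
"/nix/store/…-test.tar.gz"

nix-repl> let pkgs = import <nixpkgs> {}; in (pkgs.fetchurl { url = "http://example.org/test.tar.gz"; sha256 = "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53"; }).outPath
"/nix/store/…-test.tar.gz"

The two .outPaths are identical even though the two .drv paths differ.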

freeman42x commented 6 years ago

@magnetophon

I was getting:

nix-store --delete /nix/store/sfzwi15fsd9wf7qv2rvxwpwr8wk9nkka-AntTweakBar-1.16
error: cannot delete path ‘/nix/store/sfzwi15fsd9wf7qv2rvxwpwr8wk9nkka-AntTweakBar-1.16’ since it is still alive

So I ran:

nix-store -q --roots /nix/store/sfzwi15fsd9wf7qv2rvxwpwr8wk9nkka-AntTweakBar-1.16

and deleted the results folder mentioned by the previous command.

After that, nix-build picked up the changed URL.

stale[bot] commented 3 years ago

I marked this as stale due to inactivity. → More info

hab25 commented 10 months ago

I marked this as stale due to inactivity. → More info

To remove the stale label, just leave a new comment.

tobiasBora commented 9 months ago

I was thinking, would it make sense to add an option like --force-redownload that would try to redownload the source of the currently built package, and double check if the hash corresponds? It could help people to quickly check if the hash in a derivation is correct. (EDIT: actually, I think the current behavior brings some security issues, let me test it first)
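For what it's worth, the closest existing mechanism (which comes up again below in this thread) is the --check mode, sketched here with a made-up attribute foo:

# Re-run the fetcher even though the output already exists in the store,
# and fail if the result does not match the declared hash:
$ nix-store -r --check $(nix-instantiate -A foo.src)
# Roughly equivalent in the new CLI:
$ nix build --rebuild -f . foo.src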

lolbinarycat commented 8 months ago

One option, possibly more work than someone's willing to put into solving this, is for Nix to maintain a local cache of fixed-output derivation output hashes and the regular-derivation-style input hash that led to that output hash.

doesn't nix already track this for garbage collection purposes? nix-store -q --deriver and nix-store -q --valid-derivers should give you enough information to triangulate this.
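For example (paths abbreviated; --valid-derivers requires a recent Nix):

$ nix-store -q --deriver /nix/store/…-test.tar.gz
/nix/store/…-test.tar.gz.drv
$ nix-store -q --valid-derivers /nix/store/…-test.tar.gz
/nix/store/…-test.tar.gz.drv

Note, though, that Nix records just one deriver per output path, not every derivation that has ever mapped to it, which is the gap the proposal above is about.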

tobiasBora commented 6 months ago

I can confirm that this bug can lead to some security issues, described in an email sent to the security team (I'll likely create a public issue soonish). Other potential attacks (if other functionality like cache sharing between OfBorg and reviewers/Hydra is implemented) are described in https://github.com/NixOS/ofborg/issues/68#issuecomment-2083351493

What I propose to solve this is to keep on each nix installation a map URL -> hash (or rather, as suggested by Linus in a private email, a map derivation -> hash, since fixed-output derivations can be much more general than downloading a file). For any fixed-output derivation, if the hash is already present on the system, nix should first check whether the url/derivation is also contained in the database with the appropriate hash. If not, it should re-execute the derivation, check that the hash is correct, and if so continue from the cached value.

EDIT: See my proposition here https://github.com/NixOS/ofborg/issues/68#issuecomment-2085327052 for a more robust version
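A rough sketch of that check (all names hypothetical; run_builder stands in for actually re-executing the fixed-output derivation):

from typing import Callable

# (drvPath, hash) pairs that have already been verified on this machine.
verified: set[tuple[str, str]] = set()

def realise_fod(drv_path: str, declared_hash: str,
                run_builder: Callable[[str], str]) -> None:
    # Reuse is allowed only if *this* derivation has already been seen
    # producing *this* hash; a matching hash alone is no longer enough.
    if (drv_path, declared_hash) in verified:
        return
    got = run_builder(drv_path)  # e.g. actually download the source
    if got != declared_hash:
        raise RuntimeError(f"{drv_path}: expected {declared_hash}, got {got}")
    verified.add((drv_path, declared_hash))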

oxij commented 6 months ago

@tobiasBora

After @risicle pointed me your way in https://github.com/NixOS/rfcs/pull/171#issuecomment-2095639253, and after reading your comments in https://github.com/NixOS/ofborg/issues/68, I came up with a very simple, currently undetectable way to poison Hydra's cache in https://github.com/NixOS/rfcs/pull/171#issuecomment-2096283938:

But now thinking about this, I have an IMHO much simpler and thus scarier attack that will work against Hydra:

  • Malice makes an evil PR with a package update that changes the source's revision to a new version but points the hash to a hash of an older version of the same package, a version which has a well-known CVE.
  • A very paranoid Nixpkgs maintainer Alice (who does not use any build caches) looks at it, it looks fine, so she builds it, and it works fine, since Alice has the older version of the source in /nix/store, and so she merges it.
  • All tools that track unfixed CVEs in Nixpkgs only look at package versions, so those will be fooled too.
  • Malice can now exploit the CVE.

So, this is actually a critical bug.

Merging https://github.com/NixOS/nixpkgs/pull/49862, setting nameSourcesPrettily = true or nameSourcesPrettily = "full" by default, and banning of customly-named source derivations is a workaround that will work in the meantime.
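For context, the gist of that workaround (a hedged illustration; the PR's exact naming scheme may differ, and foo, the URL, and the hash are made up) is to bake the version or revision into the source derivation's name, so that a version bump necessarily changes the output path and an old store path can no longer silently satisfy the new source:

let pkgs = import <nixpkgs> {}; in pkgs.fetchurl {
  # With the version in the name, bumping 1.2.3 -> 1.2.4 changes the
  # output path even if the hash is stale or forged.
  name = "foo-1.2.3-source";
  url = "https://example.com/foo-1.2.3.tar.gz";
  sha256 = "…";
}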

tobiasBora commented 6 months ago

Yes, I actually realized exactly this issue with fetchzip this morning and transmitted an MWE to the security team showing that one can use the code of any package controlled by an adversary to inject code into any other package, exploiting the fact that fetchzip removes the package name & version from the attribute name, making the attack trivial to carry out: the difference between a malicious & an honest derivation is just one hash.

As a result, I submitted a CVE online.

Patching fetchzip as discussed in https://github.com/NixOS/rfcs/pull/171 is only a first step but does not stop the slightly more involved cache poisoning attacks. The only robust fix I can think of is what I am proposing in https://github.com/NixOS/ofborg/issues/68#issuecomment-2085327052

Merging NixOS/nixpkgs#49862, setting nameSourcesPrettily = true or nameSourcesPrettily = "full" by default, and banning of customly-named source derivations is a workaround that will work in the meantime.

That might be a first quick mitigation (actually not needed if we implement https://github.com/NixOS/ofborg/issues/68#issuecomment-2085327052), but it is not sufficient. Nobody would notice a misnamed derivation, for instance in a long automatically generated file.

oxij commented 6 months ago

The only robust fix I can think of is what I am proposing in https://github.com/NixOS/ofborg/issues/68#issuecomment-2085327052

While your algorithm will work if you trust Hydra to do the right thing and check all fixed-output outputs, the problem is that ATM Hydra won't do it because Nix does not do it.

I.e. you are trying to prevent poisoning of your local /nix/store with an unrelated output substituted from a binary cache, but, as I pointed out above, the attacker can trivially poison your (and Hydra's) /nix/store directly too by reusing old hashes.

Merging https://github.com/NixOS/nixpkgs/pull/49862, setting nameSourcesPrettily = true or nameSourcesPrettily = "full" by default

is one work-around for this.

As hinted in https://github.com/NixOS/rfcs/pull/171#issuecomment-2082993487, making nix-store -r --check semantics slightly more convenient, and then making OfBorg always run nix-store -r --check for all affected fixed-output derivations, thus limiting the set of attackers who could poison caches and /nix/stores to those who have direct commit access to Nixpkgs, is another work-around for this.

But the actual solution is to never reuse outputs between derivations without checking them first. I.e., Nix should remember a mapping of outPath -> set<drvPath> for each valid outPath in /nix/store (ATM Nix only stores one drvPath for each valid outPath, aka nix-store -q --deriver), and nix-store -r should force a re-build (i.e. --check) of each given derivation, even if the corresponding outPath already exists and is valid in /nix/store, in cases when that outPath does not yet have that drvPath in its set<drvPath>.

In other words, /nix/store path validity should be changed from being a flag ("is valid") to a predicate ("is valid as an output for this drvPath").

Similarly, substitutors should provide their set<drvPath> for each requested outPath to the client Nix'es (somehow) and a client Nix should check those before accepting a substitution.

Obviously, without any optimizations that would be incredibly inconvenient, as, among other things, not only changes to URLs and changes between fetch* functions would always cause a rebuild, but also any unrelated changes to stdenv.mkDerivation would cause all derivations using any of the fetch* functions to be rebuilt too.

One possible way to minimize rebuilds in this case is to implement a tiny mkDerivationLite variant of stdenv.mkDerivation and use that in all fetch* functions. Sure, updating curl or git will still cause a lot of rebuilds, but for the paranoid this could be a good thing: after all, what if the attacker poisoned one of those somehow?

Another way to reduce rebuilds with drvPath-predicated-validity is to add a new outputPathToken parameter to derivation which would declare to Nix that the outPath of this derivation is valid for any other derivation that maps to the same outPath and uses the same outputPathToken, i.e. it would declare an equivalence class of valid drvPaths for each outPath (i.e., in the above, set<drvPath> becomes set<drvPath or outputPathToken>).
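A compact sketch of that validity predicate (names hypothetical, heavily simplified):

# outPath -> set of drvPaths (or outputPathTokens) it has been checked against.
valid_for: dict[str, set[str]] = {}

def may_reuse(out_path: str, drv_path: str, token: str | None = None) -> bool:
    # Validity becomes a predicate over (outPath, drvPath) rather than a flag:
    # reuse is fine only for a deriver, or a declared equivalence class,
    # that has already been verified for this output.
    derivers = valid_for.get(out_path, set())
    return drv_path in derivers or (token is not None and token in derivers)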

Then, nix.conf could provide an option to ignore all outputPathTokens, for the most paranoid.

tobiasBora commented 6 months ago

I have to admit I don't yet fully grasp all of your proposal, but here are a few comments after a first reading:

While your algorithm will work if you trust Hydra to do the right thing and check all fixed-output outputs, the problem is that ATM Hydra won't do it because Nix does not do it.

Well, I precisely want Nix to implement it. All other solutions would either not work or require the user to do massive downloads (if the data doesn't exist, you have no other way than regenerating it yourself) to check all FODs, which can lead to many issues. Not only would the user need to download the (transitive) sources of all packages they want to install, even if they only care about the final binary, but if the source is no longer available at its current location then the user would not even be able to install the binary (and the security team mentioned to me that they don't want to go down this path, as it used to be this way and created many issues).

I.e. you are trying to prevent poisoning of your local /nix/store with an unrelated output substituted from a binary cache, but, as I pointed out above, the attacker can trivially poison your (and Hydra's) /nix/store directly too by reusing old hashes.

If one changes nix the way I propose, I don't see how anyone could poison Hydra's cache (of course if we trust Hydra itself… but one anyway already needs to trust Hydra for many other reasons). The new derivation -> hash map is precisely used to say "Hydra checked that this FOD derivation evaluates to hash", i.e. "no poisoning has been done on this derivation". In particular, old hashes would NOT be part of this trusted map until Hydra checks that they are indeed correct.

setting nameSourcesPrettily = true or nameSourcesPrettily = "full" by default is one work-around for this.

I don't see it as a true work-around. This makes a very particular attack (downgrading) harder, but there are many other ways to pollute a cache, for instance by adding (e.g. a few months in advance, to be sure it is in Hydra's cache) a malicious dependency in some automatically generated file with the name & version of the program to attack.

Nix should remember a mapping of outPath -> set<drvPath> for each valid outPath in /nix/store

I think this is more or less what I proposed, just with the map the other way around, as I was proposing drvPath -> outPath instead; & I also propose to let cache.nixos.org/… share this map.

One possible way to minimize rebuilds […]

What do you call a rebuild here? Is it in the sense that:

  • the FOD output should be re-checked by Hydra, to complete the map drvPath -> outPath (which indeed should be done by hydra if the fetch* function change, and I guess having a lite version of mkDerivation makes sense if these often change)
  • the FOD output should be re-checked by the end user: this should NOT be the case, at least if hydra already did it before and if we implement the sharing of drvPath -> outPath I was proposing above
  • the whole world should be rebuilt: this should NOT be the case, once the user is assured that the hash is correct they can use the old derivation as before

oxij commented 6 months ago

(of course if we trust Hydra itself… but one anyway already needs to trust Hydra for many other reasons)

What reasons? If I'm not using the Hydra binary cache and I'm not using the default bootstrap tarballs, what is there to trust? Tarballs downloaded from the Hydra tarball cache get hashed and checked against hashes in their derivations locally, as they should.

I don't see it as a true work-around. This makes a very particular attack (downgrading) harder, but there are many other ways to pollute a cache, for instance by adding (e.g. a few months in advance, to be sure it is in Hydra's cache) a malicious dependency in some automatically generated file with the name & version of the program to attack.

Putting the whole revision or hash into the source derivation name and banning of customly-named fetch* derivations will make that attack impossible.

But even without the first part (and with a <hash>-<name>-<version or short revision>-source scheme), sure, in theory, the attacker could still poison it by pre-fetch*ing an evil source in an autogenerated file with the same derivation name, if the attacker manages to guess a version number (or the prefix of the revision) with a CVE they could attack in advance.

So... almost never.

And then, users like me, who do not build anything from those huge autogenerated files and are not using the Hydra binary cache, will catch those mis-hashes immediately.

I've been running with https://github.com/NixOS/nixpkgs/pull/49862 and nameSourcesPrettily = "full" enabled since 2018 (essentially; the implementation details changed over time, but the behavior did not). Do I use a small subset of Nixpkgs? Yes. Do source URLs still fail to fetch sometimes? Yes. But usually things are pretty easy to fix, and older source versions can usually be found on archive servers without changes to any hashes.

But now, remembering the uncountable number of times I had to fix fixed-output hashes locally to make my systems build, I'm feeling kind of paranoid. Were all those outputs mis-hashed in honest error or were some of them attempts to poison something?

Some of those mis-hashes persisted for months in master while I was too annoyed with the PR process, thinking them honest errors that would be fixed eventually by people with direct commit access.

Was something poisoned already?

... implementation ...

I think this is more or less what I proposed, just with the map the other way around, as I was proposing drvPath -> outPath instead; & I also propose to let cache.nixos.org/… share this map.

Yes, algorithmically, we are proposing the same thing, but I'm essentially proposing we reuse the existing Nix derivers mechanism for this. And while you are talking about how to make Hydra publish its derivers and make Nix use them, I'm pointing out that Hydra publishing its derivers is not enough: if Hydra does not have the output in its cache yet, then the user will have to --fallback build locally, meaning local Nix will have to do the check too, which means checking that the source derivation is in the checked derivers set of that output, which would frequently fail, which would mean frequent mass rebuilds. So I'm proposing a way to minimize those, and outputPathToken is a clever and simple solution, IMHO.

Precise formats of outputPathToken for different fetch*ers and their security implications could and should be discussed.

One possible way to minimize rebuilds […]

What do you call a rebuild here? Is it in the sense that:

  • the FOD output should be re-checked by Hydra, to complete the map drvPath -> outPath (which indeed should be done by hydra if the fetch* function change, and I guess having a lite version of mkDerivation makes sense if these often change)

I.e., in the terminology I use in my proposal, you are stating that Nix should ask its substitutors for their derivers sets for the outPath it tries to substitute, and then check that the source derivation is in that set. To which I agree.

  • the FOD output should be re-checked by the end user: this should NOT be the case, at least if hydra already did it before and if we implement the sharing of drvPath -> outPath I was proposing above

Sure, if substitutors are enabled, then Nix should only do "member in a set" check, not a full derivation rebuild (i.e. nix-store -r --check, which is what I call a "rebuild", because nix build --rebuild does it).

  • the whole world should be rebuilt: this should NOT be the case, once the user is assured that the hash is correct they can use the old derivation as before.

But if the user does not use Hydra, and curl changed, and the user is feeling sufficiently paranoid, I would think having a way to re-check everything would be a good thing.

tobiasBora commented 6 months ago

What reasons?

Anyone relying on cache.nixos.org (which is likely 99% of users) needs to trust Hydra, as it can always inject malicious code inside the final binary… Like Debian users need to trust Debian maintainers. And I don't see how checking the hash can help here: programs are not FODs and their hash may vary across builds in general, so until we have succinct verifiable proofs of honest compilation (which, let's be honest, is never going to happen) or at least 100% reproducible builds, we need to trust Hydra.

Anyway, the attack I describe would also apply to people NOT using hydra, so building everything from source will not help since it will pick the wrong source.

If the attacker manages to guess a version number (or the prefix of the revision) with a CVE they could attack in advance. So... almost never.

I think you misunderstand the attack I have in mind. What I describe allows any adversary to inject arbitrarily malicious code (irrespective of whether the actual program has a CVE or not) into a future release of any program, assuming they can simply guess the name of the release… which is trivial (these usually simply increase one by one, and are often pre-released in an alpha form like 3.42-alpha).

But now, remembering the uncountable number of times I had to fix fixed-output hashes locally

Is it really an experience we want for nix end users?

Was something poisoned already?

Hard to say. I tried to do a quick check in Nixpkgs by looking at duplicated hashes (but it is possible to apply a different attack so that this does not appear that clearly). The number of duplicated hashes is quite large, so I used a script to help me discard the good-looking ones, but most of them seem quite honest. Only two physics programs (in pkgs/by-name) were sharing the same hash while having very different names… but I had no time to investigate further.

which would mean frequent mass rebuilds

I don't see why we would get more rebuilds than now. If a program is in the cache, Hydra already downloaded its source, so the user can rely on Hydra's map. If Hydra has not built the package yet… the user needs to rebuild everything anyway. And if the package is a "legacy package", in the sense that it was built by Hydra before we even started to add this option, what I propose (maybe what you propose as well, I'm not sure I understand all the details) allows the user to only download & check the source while using the binary compiled by Hydra (which saves most of the time-consuming part).

Precise formats of outputPathToken for different fetch*ers and their security implications could and should be discussed.

I'm not sure I understand all the details of your proposal here (maybe because I lack some knowledge of nix's internals) but I think I get the big picture. I think creating equivalence classes of derivations (which, I think, could be a nice concept in nix in general) could be a nice way to limit rebuilds in case of curl updates… but your implementation with outputPathToken raises a number of questions, including in terms of security:

So, based on the first point, since it seems hard to only allow specific fetchers to create a given outputPathToken (unless you know some special tricks?), and since a new nix concept must be introduced anyway, we can maybe deal with this by creating a new kind of derivation that specifies what must be downloaded rather than how to obtain it, as is the case now. For instance, one could create a derivation like:

Derive({
  "kind": "fetchurl",
  "url": "myurl",
  "out": "/nix/store/foo.tar.xz",
  "sha256": "somehash"
})

and nix should automatically understand that when receiving this derivation, it should download the url, using any curl version it wants, as long as the hash matches at the end. If we don't want to hardcode all of them in nix itself (let's try to keep nix small), we can let nix take as input a special nix file trusted-fetchers.nix containing all the "trusted fetchers". This could be a map like kind -> fetcher, such that when nix encounters a custom derivation it is not able to evaluate, it would resort to using the fetcher in the map. This means that we only need to check trusted-fetchers.nix instead of nixpkgs in its totality.

More specifically, a fetcher could be a function taking as input the above derivation, and outputting a bit Ok/Not ok (and possibly an error message) and a folder, such that the folder is kept in out by nix iff the bit is true (intuitively, this bit means "I checked that the hash is correct").

This idea of equivalence of fetchers can certainly also be used to solve other issues with non-deterministic processes. For instance, for now we have no solution to avoid hash shift with leaveDotGit = true; (https://github.com/NixOS/nixpkgs/issues/8567) and certainly many other package-manager things. One option here would be to create a derivation:

Derive({
  "kind": "fetchgit",
  "url": "git repo",
  "leaveDotGit": true,
  "out": "/nix/store/foo.tar.xz",
  "sha256": "somehash"
})

where the hash only corresponds to the "stable" part of the package, i.e. what is outside .git. The fetcher would then run git clone to clone the repository, compute the hash of the "stable" part, and output "OK" if this hash matches.
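A hedged sketch of what such a fetcher's script might do (tool invocations illustrative only; a real implementation would also have to normalize timestamps and the like):

# Clone with history kept, but hash only the stable worktree, not .git:
git clone "$url" "$tmpOut"
git -C "$tmpOut" checkout "$rev"
tar --exclude-vcs --sort=name -C "$tmpOut" -cf - . | sha256sum > "$tmpOutSha"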

But if the user does not use Hydra, and curl changed, and the user is feeling sufficiently paranoid, I would think having a way to re-check everything would be a good thing.

Well, checking the source, I agree it's nice to have… but no need to recompile programs.

oxij commented 6 months ago

Well, checking the source, I agree it's nice to have… but no need to recompile programs.

Most of my mentions of "rebuild" above imply the context of "rebuild the source (<name>.src) derivation". Package derivations (<name>) should not be rebuilt, of course.

What reasons?

Anyone relying on cache.nixos.org (which is likely 99% of users) needs to trust Hydra... ... Anyway, the attack I describe would also apply to people NOT using hydra, so building everything from source will not help since it will pick the wrong source.

My point exactly. Some users have personal and/or shared build machines, and /nix/stores on those can be poisoned too. So the solution can not be "check derivers on Hydra and/or other configured binary caches", it has to be a general thing.

If the attacker manages to guess a version number (or the prefix of the revision) with a CVE they could attack in advance. So... almost never.

I think you misunderstand the attack I have in mind. ...

I'm referencing the exploit described in https://github.com/NixOS/nix/issues/969#issuecomment-2096302974 there.

But now, remembering the uncountable number of times I had to fix fixed-output hashes locally

Is it really an experience we want for nix end users?

Well, those hashes are factually wrong, either in honest error, or in malice. That's just the reality of the situation. If Hydra or OfBorg verified them all, then those packages would have been fixed in master instead of in my local branches.

which would mean frequent mass rebuilds

I don't see why we would get more rebuilds than now.

In the context I discuss there, say curl changes, now all <name>.src derivations that use fetchurl have to be rebuilt.

Precise formats of outputPathToken for different fetch*ers and their security implications could and should be discussed.

... trusted fetchers ...

Hmm, that's a good point. Specifically, yes, outputPathTokens is probably not a good solution when the attacker can also modify the fetch* derivations themselves, so it would at the very least still leave OfBorg vulnerable.

Derive([("out", "/nix/store/foo.tar.xz", "sha256", "somehash")], {
  "kind": "fetchurl",
  "url": "myurl",
})

Making full-featured curl into a Nix builtin is a possibility (as opposed to fetchurlBoot that exists now), but what about derivations fetched with curl and then unpacked with zip (like fetchFromGitHub)? Should zip be a builtin too? What about those fetched with git?

Even if it's just curl and zip, you'd also have to make Nix depend on SSL CAs to verify HTTPS certs, otherwise the hashes could be poisoned on your developer machine by MITMing your Nix HTTP traffic, and the ca-certs derivation is also non-trivial.

Also, at the moment, Nix built with all the optional things disabled is pretty small and can be used for a minimal self-bootstrap; requiring all those things would make that harder.

So: Is it possible to do? Yes. Is it a desirable solution? Probably not.

Was something poisoned already?

Hard to say. I tried to do a quick check in Nixpkgs by looking at duplicated hashes (but it is possible to apply a different attack so that this does not appear that clearly). The number of duplicated hashes is quite large, so I used a script to help me discard the good-looking ones, but most of them seem quite honest. Only two physics programs (in pkgs/by-name) were sharing the same hash while having very different names… but I had no time to investigate further.

How are you doing this, exactly? Are you parsing Nix files by hand?

nix-env -f ./default.nix -qa --argstr system "x86_64-linux" --attr-path --drv-path --out-path gives the output that has the right data (after all, we need to check that different --attr-paths don't reuse or downgrade --out-paths), but it only evaluates package derivations, not source derivations.

If it did, running that on successive revisions of Nixpkgs, parsing the outputs, building a DB from the results, and then printing --out-path reuses and downgrades would be a good first step. (Expensive to do, though. Especially since I don't have enough memory to evaluate all of Nixpkgs in a single run of nix-env -qa already, and evaluating a subset is kinda useless, given that any of the autogenerated package sets can be poisoned.)

I guess the simplest way to do this, cheaper than doing nix-env -qaP followed by nix-instantiate -A on all --attr-paths, then nix-store -q --requisites, followed by parsing the .drv files manually (which would be crazy expensive to do and would also torture your SSD), would be to implement a nix-env -q --all option, so that nix-env -f ./default.nix -q --all --argstr system "x86_64-linux" --attr-path --drv-path --out-path would print that data for all derivations Nix encounters while doing evaluations, not only the ones --available from the current Nix expression.

My ability to patch Nix sources is fairly superficial ATM and I don't really care for this much, given that https://github.com/NixOS/nixpkgs/pull/49862 with a small custom-name-warning change on top basically solves the problem for me (given that I don't use Hydra). But it would be nice if someone could implement that nix-env -q --all and then possibly somebody else would share all those evaluations on a bunch of revisions of master, so that we could check if Hydra and all users using it were PWNed yet.

nixos-discourse commented 5 months ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixpkgs-supply-chain-security-project/34345/19

tobiasBora commented 5 months ago

I'm not sure why my last answer has not been posted. In the meantime the CVE got published at https://nvd.nist.gov/vuln/detail/CVE-2024-36050

Package derivations (<name>) should not be rebuilt, of course.

Good, just wanted to be sure we are on the same page.

So the solution can not be "check derivers on Hydra and/or other configured binary caches", it has to be a general thing.

Definitely agree. My point was just to try to reduce the amount of downloaded material from the user's perspective by using the cache when available; otherwise users (especially those with little bandwidth) would suffer a lot, and if the source has moved they would get a 404 error even if it is still cached.

I'm referencing the exploit described in #969 (comment) there.

I show a much more generic attack here: https://github.com/leo-colisson/CVE_NixOs_cache_spoil/ with some variants using fetchurl instead of fetchzip, etc. For instance, this:

{ pkgs ? import <nixpkgs> {} }:
pkgs.callPackage ({stdenv, fetchzip}:
  let iconpackage = stdenv.mkDerivation rec {
        pname = "iconpackage";
        version = "42.0";
        src = fetchzip {
          url = "http://localhost:8042/iconpackage-${version}.tar.gz"; # <-- this is controlled by the adversary
          sha256 = "sha256-kACAk1+Se9vaJN8FkqLRJsOI7szD9zw015nCxxT54bs=";
        };
        buildPhase = ":";
        installPhase = ''
          mkdir -p $out/share/icons/hicolor/64x64/apps/
          mv myicon.png $out/share/icons/hicolor/64x64/apps/
        '';     
      };
      honestpackage = stdenv.mkDerivation rec {
        pname = "honestpackage";
        version = "1.0";
        src = fetchzip {
          url = "http://localhost:8042/honestpackage-${version}.tar.gz"; # <-- this is NOT controlled by the adversary
          sha256 = "sha256-kACAk1+Se9vaJN8FkqLRJsOI7szD9zw015nCxxT54bs=";
        };
        buildInputs = [ iconpackage ];
        buildPhase = ":";
        installPhase = ''
          mkdir -p $out/bin
          mv honestpackage.sh $out/bin
        '';
      };
  in honestpackage
) {}

Would result in:

$ nix-build
$ ./result/bin/honestpackage
I am malicious >:-|

In the context I discuss there, say curl changes, now all <name>.src derivations that use fetchurl have to be rebuilt.

Yeah, I see now; hence my proposal to have trusted fetchers.

Specifically, yes, outputPathTokens is probably not a good solution when the attacker can also modify the fetch* derivations themselves

Well, they can always do that, for instance by implementing their own FOD derivation, without even relying on a pre-existing fetch* at all.

Making full-featured curl into a Nix builtin is a possibility (as opposed to fetchurlBoot that exists now), but what about derivations fetched with curl and then unpacked with zip […] Is it possible to do? Yes. Is it a desirable solution? Probably not.

That's precisely why I don't want to include this in Nix directly, but rather provide nix with a trusted-fetchers.nix file, via a command line like $ nix --trusted-fetchers trusted-fetchers.nix or simply by picking this file up automatically by default. One needs to think carefully about this file, but after a quick thought, it could look like:

{pkgs, ...}: {
  "fetchurl" = {
    args = [ "url" ];
    # Fetch the URL, then record the hash of the result for nix to compare
    # against the hash declared in the derivation.
    script = ''
      ${pkgs.curl}/bin/curl "$url" -o "$tmpOut"
      ${pkgs.coreutils}/bin/sha256sum "$tmpOut" > "$tmpOutSha"
    '';
  };
}

This way, when a user calls fetchurl { url = "https://foo.com"; hash = "somehash"; } it would create a derivation like:

Derivation({
  "trustedFetcher": "fetchurl",
  "url": "https://foo.com",
  "hash": "somehash"
})

(note that it is agnostic of curl's version), and to execute it, nix would pick the appropriate fetcher from trusted-fetchers.nix (unless the output is already present in the cache), run the corresponding script, and compare at the end the content of $tmpOutSha with the hash in the derivation. If they match, it then derives the final $out from the hash of the derivation and copies $tmpOut to this path.

How are you doing this, exactly? Are you parsing Nix files by hand?

Right now this is done in a very dirty way, via a simple grep (this needs ripgrep installed):

import os
import re

# Run from a nixpkgs checkout. The pipeline extracts all `hash = "..."`
# lines, counts duplicates, and keeps only hashes that occur more than once.
for line in os.popen("rg -o 'hash = \".*\"' --no-filename | sort | uniq -c | sort -hr | rg -v \"1 hash\" | rg -o '\".*\"'").readlines():
    print(f"==={line}")
    # Show some context around every occurrence of the duplicated hash.
    output = os.popen(f"rg -F '{line.strip()}' -C 5").read()
    # Heuristic: if the same owner or url appears twice, the duplicate is
    # probably the same source referenced from two places, so skip it.
    res = re.findall(r'owner ?= ?"[^"]*"', output)
    if res and len(res) - len(set(res)) > 0:
        print("The owner appears twice, sounds good enough here")
        continue
    res = re.findall(r'url ?= ?"[^"]*"', output)
    if res and len(res) - len(set(res)) > 0:
        print("The url appears twice, sounds good enough here")
        continue
    print(output)
    print("===================================================================\n\n\n")

You can see that the number of duplicated hashes is quite large (155), so I try to discard some of them automatically, for instance if I see the same url twice around the matching lines, but this is very dirty, and I don't consider it any kind of good security measure. Even with this sorting there are quite a lot of entries to read manually, so it would be better to actually extract all such fetch* calls and try to run them (without using the local cache of course; not sure how to do that) and see if the hash is actually correct…