Open cyounkins opened 2 years ago
Duh, must have been fun to debug :)
I propose validating that the extra 8 bits in SRI SHA256 hashes are unset. If any are set, throw "error: invalid SRI hash".
I’m not sure we can blindly reject them since there might be (legitimate) hashes of that form in the wild. But we could at least print a warning for these. That should be enough to detect the problem without breaking (too badly) existing Nix expressions
I don't think there can be legitimate hashes of that form, since they're just not valid SHA-256 (having more than 256 bits).
I examined nixpkgs by putting SRIs through nix to-base16
and then back through nix to-sri
. In fish:
grep -roPh '(sha256-[^ ]{42,46})(?=[\'"])' * > sha256.txt
for origb64 in (cat sha256.txt); set b16 (nix to-base16 $origb64 | cut -d' ' -f2); set b64 (nix to-sri --type sha256 $b16); if test $origb64 != $b64; echo $origb64; echo $b64; end; end
Full output: https://gist.github.com/cyounkins/cb2ec9307a4d2ef821ac4d7d9f7607f6
Many were missing the padding:
sha256-CIE8vumQPGK+TFAncmpBijANpFALLTadOvkob0gVzro
sha256-CIE8vumQPGK+TFAncmpBijANpFALLTadOvkob0gVzro=
Some had extra padding:
sha256-lePhDRHI++9zs54bTt2/Lu6ZQ7egjJCWb752aI0s7Mw==
sha256-lePhDRHI++9zs54bTt2/Lu6ZQ7egjJCWb752aI0s7Mw=
Two had extra non-padding characters:
sha256-Sec2DSnWYal6wzYzP9W+DDuTKHsFHWdRYyMzliMU5bU=A
sha256-Sec2DSnWYal6wzYzP9W+DDuTKHsFHWdRYyMzliMU5bU=
sha256-BO0OBnszKA6EPTa6dqGrmYfVv2sbbQhg8Lf64uVfCqA=V
sha256-BO0OBnszKA6EPTa6dqGrmYfVv2sbbQhg8Lf64uVfCqA=
Two had trailing bits set:
sha256-MAnMc4KzC551JInrRcfKED4nz04FO0GyyyuDVRmnYTa=
sha256-MAnMc4KzC551JInrRcfKED4nz04FO0GyyyuDVRmnYTY=
sha256-fq7IA5osMKsLx1jTA1iHZ2k972v0myJIWiwAvy4TbLN=
sha256-fq7IA5osMKsLx1jTA1iHZ2k972v0myJIWiwAvy4TbLM=
I can canonicalize these in nixpkgs. I'm afraid someone else will need to modify nix
to be more strict with inputs.
I was confused why there appeared to be only 2 "free" bits when the last encoding block is 16 bits going into a 24 bit block. The answer is in the RFC:
When fewer than 24 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 6-bit groups.
This pads the last 16 bits to 18 bits (three 6-bit groups). It is those two bits that can be set or unset.
And all sha256 digests should end in =
:
The final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one "=" padding character.
Not very important but if we want to adhere to the standard, SRI has case-INsensitive matching for the algorithm whereas parseHashType
appears to be case-sensitive. https://www.w3.org/TR/CSP2/#source-list-valid-hashes
Because we see hashes that are missing the padding or have extra characters, there is something wrong with the length checking done here: https://github.com/NixOS/nix/blame/646af7325d93f98802b989f8a8e008a25f7a4788/src/libutil/hash.cc#L251
Since the poorly-conforming hashes in nixpkgs found above aren't doing active damage, I'll wait to canonicalize them until we start warning about their use.
slightly related: https://eprint.iacr.org/2022/361.pdf
I’m not sure we can blindly reject them since there might be (legitimate) hashes of that form in the wild. But we could at least print a warning for these. That should be enough to detect the problem without breaking (too badly) existing Nix expressions
FWIW, regular expressions can recognise non-canonical base64 encodings. That might be a low-effort way to search for them “in the wild.”
I agree with @cyounkins that in principle, this makes the security story around nix (and its tooling) more complicated if hashes do not use a canonical representation. In practice, I expect breaking those would be fine:
Let's warn and then after a while considering racheting further. Even if the current bug does not cause vulnerabilities it is very unsavory and will create a bad impression to a security-contious audience.
In the Tvix project, we are encountering more than we would like to, such non-canonical and wrong (in our humble opinion) hashes with excessive padding.
Latest example in date: https://github.com/NixOS/nixpkgs/pull/282191
SRI SHA256 hashes are encoded in base64, which is 6 bits per character in 4-character blocks. 256 / 24 = ~10.7 and 256 - (10 * 24) = 16, so the last block of 24 bits and 4 characters will only be encoding 16 bits. The extra 8 bits are ignored in Nix.
There is no longer a 1-1 correspondence between the SRI SHA256 hash and the actual contents that is being hashed. Here I use the conversion tool but this happens internally in nix:
When upgrading packages, it is sometimes suggested to change the hash of a resource, run a build, and get the correct hash from the resulting error. This is how I discovered this bug - changing the resource hash continued to use the old resource!
I propose validating that the extra 8 bits in SRI SHA256 hashes are unset. If any are set, throw "error: invalid SRI hash".