NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1
11.54k stars 1.45k forks source link

SRI SHA256 hashes are not 1 to 1 with hexadecimal versions #6414

Open cyounkins opened 2 years ago

cyounkins commented 2 years ago

SRI SHA256 hashes are encoded in base64, which is 6 bits per character in 4-character blocks. 256 / 24 = ~10.7 and 256 - (10 * 24) = 16, so the last block of 24 bits and 4 characters will only be encoding 16 bits. The extra 8 bits are ignored in Nix.

There is no longer a 1-1 correspondence between the SRI SHA256 hash and the actual contents that is being hashed. Here I use the conversion tool but this happens internally in nix:

$  nix --version
nix (Nix) 2.3.16
# Unchanged in 2.5.1

$ nix to-sri --type sha256 0000000000000000000000000000000000000000000000000000000000000000
sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=

$ nix to-base16 sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
0000000000000000000000000000000000000000000000000000000000000000

$ nix to-base16 sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB=
0000000000000000000000000000000000000000000000000000000000000000

$ nix to-base16 sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC=
0000000000000000000000000000000000000000000000000000000000000000

$ nix to-base16 sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD=
0000000000000000000000000000000000000000000000000000000000000000

$ nix to-base16 sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAE=
0000000000000000000000000000000000000000000000000000000000000001

When upgrading packages, it is sometimes suggested to change the hash of a resource, run a build, and get the correct hash from the resulting error. This is how I discovered this bug - changing the resource hash continued to use the old resource!

I propose validating that the extra 8 bits in SRI SHA256 hashes are unset. If any are set, throw "error: invalid SRI hash".

thufschmitt commented 2 years ago

Duh, must have been fun to debug :)

I propose validating that the extra 8 bits in SRI SHA256 hashes are unset. If any are set, throw "error: invalid SRI hash".

I’m not sure we can blindly reject them since there might be (legitimate) hashes of that form in the wild. But we could at least print a warning for these. That should be enough to detect the problem without breaking (too badly) existing Nix expressions

edolstra commented 2 years ago

I don't think there can be legitimate hashes of that form, since they're just not valid SHA-256 (having more than 256 bits).

cyounkins commented 2 years ago

I examined nixpkgs by putting SRIs through nix to-base16 and then back through nix to-sri. In fish:

grep -roPh '(sha256-[^ ]{42,46})(?=[\'"])' * > sha256.txt
for origb64 in (cat sha256.txt); set b16 (nix to-base16 $origb64 | cut -d' ' -f2); set b64 (nix to-sri --type sha256 $b16); if test $origb64 != $b64; echo $origb64; echo $b64; end; end

Full output: https://gist.github.com/cyounkins/cb2ec9307a4d2ef821ac4d7d9f7607f6

Many were missing the padding:

sha256-CIE8vumQPGK+TFAncmpBijANpFALLTadOvkob0gVzro
sha256-CIE8vumQPGK+TFAncmpBijANpFALLTadOvkob0gVzro=

Some had extra padding:

sha256-lePhDRHI++9zs54bTt2/Lu6ZQ7egjJCWb752aI0s7Mw==
sha256-lePhDRHI++9zs54bTt2/Lu6ZQ7egjJCWb752aI0s7Mw=

Two had extra non-padding characters:

sha256-Sec2DSnWYal6wzYzP9W+DDuTKHsFHWdRYyMzliMU5bU=A
sha256-Sec2DSnWYal6wzYzP9W+DDuTKHsFHWdRYyMzliMU5bU=
sha256-BO0OBnszKA6EPTa6dqGrmYfVv2sbbQhg8Lf64uVfCqA=V
sha256-BO0OBnszKA6EPTa6dqGrmYfVv2sbbQhg8Lf64uVfCqA=

Two had trailing bits set:

sha256-MAnMc4KzC551JInrRcfKED4nz04FO0GyyyuDVRmnYTa=
sha256-MAnMc4KzC551JInrRcfKED4nz04FO0GyyyuDVRmnYTY=
sha256-fq7IA5osMKsLx1jTA1iHZ2k972v0myJIWiwAvy4TbLN=
sha256-fq7IA5osMKsLx1jTA1iHZ2k972v0myJIWiwAvy4TbLM=

I can canonicalize these in nixpkgs. I'm afraid someone else will need to modify nix to be more strict with inputs.

cyounkins commented 2 years ago

I was confused why there appeared to be only 2 "free" bits when the last encoding block is 16 bits going into a 24 bit block. The answer is in the RFC:

When fewer than 24 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 6-bit groups.

This pads the last 16 bits to 18 bits (three 6-bit groups). It is those two bits that can be set or unset.

And all sha256 digests should end in =:

The final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one "=" padding character.

Not very important but if we want to adhere to the standard, SRI has case-INsensitive matching for the algorithm whereas parseHashType appears to be case-sensitive. https://www.w3.org/TR/CSP2/#source-list-valid-hashes

Because we see hashes that are missing the padding or have extra characters, there is something wrong with the length checking done here: https://github.com/NixOS/nix/blame/646af7325d93f98802b989f8a8e008a25f7a4788/src/libutil/hash.cc#L251

Since the poorly-conforming hashes in nixpkgs found above aren't doing active damage, I'll wait to canonicalize them until we start warning about their use.

wamserma commented 2 years ago

slightly related: https://eprint.iacr.org/2022/361.pdf

nbraud commented 10 months ago

I’m not sure we can blindly reject them since there might be (legitimate) hashes of that form in the wild. But we could at least print a warning for these. That should be enough to detect the problem without breaking (too badly) existing Nix expressions

FWIW, regular expressions can recognise non-canonical base64 encodings. That might be a low-effort way to search for them “in the wild.”

I agree with @cyounkins that in principle, this makes the security story around nix (and its tooling) more complicated if hashes do not use a canonical representation. In practice, I expect breaking those would be fine:

Ericson2314 commented 10 months ago

Let's warn and then after a while considering racheting further. Even if the current bug does not cause vulnerabilities it is very unsavory and will create a bad impression to a security-contious audience.

RaitoBezarius commented 5 months ago

In the Tvix project, we are encountering more than we would like to, such non-canonical and wrong (in our humble opinion) hashes with excessive padding.

Latest example in date: https://github.com/NixOS/nixpkgs/pull/282191