Gabriella439 / nix-diff

Explain why two Nix derivations differ
BSD 3-Clause "New" or "Revised" License

Incorrect assumption: Files to diff are not actually encoded according to the current locale #49

Closed: sternenseemann closed this issue 2 years ago

sternenseemann commented 2 years ago

Data.Text.IO.readFile uses the system's locale (which will usually be UTF-8). Its documentation states:

The functions in this module obey the runtime system's locale, character set encoding, and line ending conversion settings.

The files being diffed can be any files imported into the Nix store by virtue of being referenced as a Nix path in a derivation. One example that many people will likely run into is the gzipped patch file pkgs/development/libraries/glibc/2.33-master.patch.gz from nixpkgs. Any diff involving such a file causes nix-diff to fail with an exception.

A simple fix would probably be to catch the exception while reading leftText and rightText and, if one is thrown, re-read them as ByteStrings. Then it would be possible to simply report whether the binary files differ.
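A minimal sketch of this fallback, for illustration only. FileContents and readForDiff are hypothetical names, not nix-diff's actual code, and the sketch assumes that a locale decoding failure surfaces as an IOException from Data.Text.IO.readFile (GHC's handle decoders report it as an "invalid byte sequence" error).

```haskell
import Control.Exception (IOException, try)
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.IO as T

-- | Hypothetical result type (not part of nix-diff): either both files
-- decoded as text, or at least one did not and we only know whether
-- the raw bytes differ.
data FileContents
  = TextContents T.Text T.Text
  | BinaryContents Bool -- True when the raw bytes differ
  deriving (Show, Eq)

-- | Hypothetical helper: read both files with the locale-aware
-- Data.Text.IO.readFile; if that throws, fall back to reading raw
-- ByteStrings and only compare them for equality.
readForDiff :: FilePath -> FilePath -> IO FileContents
readForDiff leftPath rightPath = do
  -- Data.Text.IO.readFile reads the whole file strictly, so a decoding
  -- error is raised here, inside `try`, rather than later on.
  result <- try ((,) <$> T.readFile leftPath <*> T.readFile rightPath)
              :: IO (Either IOException (T.Text, T.Text))
  case result of
    Right (leftText, rightText) -> return (TextContents leftText rightText)
    Left _ -> do
      -- The exception is not necessarily a decoding error (e.g. a missing
      -- file); in that case B.readFile below raises the error again.
      leftBytes  <- B.readFile leftPath
      rightBytes <- B.readFile rightPath
      return (BinaryContents (leftBytes /= rightBytes))
```

For a file like the gzipped glibc patch, this would produce BinaryContents True or BinaryContents False instead of an exception. Which branch is taken still depends on the current locale, which is the same fragility this issue describes.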


Much trickier to fix is the incorrect assumption that derivation files are encoded according to the current locale. This is not true; in fact, derivations are not guaranteed to be validly encoded at all, since Nix strings can contain arbitrary byte sequences. Because nix-derivation assumes that a derivation can (as an intermediate form) be represented as Text, nix-diff also has to assume some Unicode-compatible encoding, which won't work in all cases:

$ printf '\x51\xf9\xe2\x31' > bytes
$ nix-instantiate -E 'with import <nixpkgs> {}; stdenv.mkDerivation { name = "bar"; foo = builtins.readFile ./bytes; }'
/nix/store/j95dilq8rnf3vv79vh70ic3blhr3zn6l-bar.drv
$ ghci
GHCi, version 8.10.7: https://www.haskell.org/ghc/  :? for help
Prelude> import qualified Data.Text.Encoding as E
Prelude E> import qualified Data.ByteString as B
Prelude E B> E.decodeUtf8 <$> B.readFile "/nix/store/j95dilq8rnf3vv79vh70ic3blhr3zn6l-bar.drv" 
"*** Exception: Cannot decode byte '\xf9': Data.Text.Internal.Encoding.decodeUtf8: Invalid UTF-8 stream

This is probably not a huge concern for real-world use cases and not necessarily an issue fixable in nix-diff, but something to be aware of.

Profpatsch commented 2 years ago

This is probably not a huge concern for real-world use cases and not necessarily an issue fixable in nix-diff, but something to be aware of.

I wouldn’t say so, as even a simple

builtins.readFile ./foo

will inline the file into the drv, and if the file contains any bytes that are not valid UTF-8, nix-diff currently fails.

I think the easiest fix here is to decode with Data.Text.Encoding.decodeUtf8With Data.Text.Encoding.Error.lenientDecode and use the resulting Text. This replaces invalid UTF-8 sequences with the Unicode replacement character (U+FFFD), so we can then run the nix-derivation parser on the result.
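A minimal sketch contrasting strict and lenient decoding on the bytes written by the printf in the reproduction above. decodeStrict, decodeLenient and readDrvLenient are hypothetical helper names, not nix-diff's actual code; only decodeUtf8', decodeUtf8With and lenientDecode come from the text package.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- | Strict decoding: returns Nothing on invalid UTF-8 instead of throwing
-- (the exception in the GHCi session comes from the throwing decodeUtf8).
decodeStrict :: B.ByteString -> Maybe T.Text
decodeStrict bytes = either (const Nothing) Just (decodeUtf8' bytes)

-- | Lenient decoding, as proposed: invalid sequences become U+FFFD, so the
-- result can always be handed to a Text-based parser.
decodeLenient :: B.ByteString -> T.Text
decodeLenient = decodeUtf8With lenientDecode

-- | Hypothetical helper: read a .drv file without depending on the locale.
readDrvLenient :: FilePath -> IO T.Text
readDrvLenient path = decodeLenient <$> B.readFile path

-- | The bytes from `printf '\x51\xf9\xe2\x31' > bytes` above.
exampleBytes :: B.ByteString
exampleBytes = B.pack [0x51, 0xf9, 0xe2, 0x31]
```

Here decodeStrict exampleBytes is Nothing, while decodeLenient exampleBytes still starts with "Q" and ends with "1", with replacement characters in between. One trade-off worth noting: the replacement is lossy, so two inputs that differ only in their invalid bytes (say 0xf9 versus 0xfa) decode to the same Text.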

Profpatsch commented 2 years ago

I’m gonna prepare a PR.