NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Non-ASCII filenames on Darwin lead to different hash #847

Open johbo opened 8 years ago

johbo commented 8 years ago

I get different hashes on Darwin if non-ASCII filenames are included.

This is a way to reproduce the problem:

mkdir reproduce
touch reproduce/décembre
nix-hash reproduce

I see this result on Darwin:

$ nix-hash reproduce
ae4076b53a2de6a8c26c1139d603dde1

And this result on NixOS:

$ nix-hash reproduce
ebf3715949c8cc6c0b03b1320544d17b

My assumption is that this difference also explains why I got a different hash for Pelican on Darwin than on NixOS. I tracked that difference down to a file called décembre inside the Pelican source tarball.

I guess that the filename we get back needs special treatment on Darwin so that hashing is consistent. I am willing to try things out if someone can give me a hint about where to start in the codebase.
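For illustration, here is a minimal Python sketch of one plausible cause. It assumes (this is not confirmed in this report) that the Darwin filesystem hands the name back in decomposed form, while NixOS returns the precomposed bytes that were written. The helper `name_digest` is hypothetical and hashes only the name itself, not the directory serialization that nix-hash actually uses.

```python
import hashlib
import unicodedata

# The same visible name written in two Unicode normalization forms.
precomposed = "d\u00e9cembre"   # NFC: e-acute as the single code point U+00E9
decomposed = "de\u0301cembre"   # NFD: plain "e" followed by combining acute U+0301


def name_digest(name: str) -> str:
    """MD5 over the UTF-8 bytes of a name (illustrative only, not the nix-hash algorithm)."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Both strings render identically but are different byte sequences.
    print(precomposed.encode("utf-8"))   # b'd\xc3\xa9cembre'
    print(decomposed.encode("utf-8"))    # b'de\xcc\x81cembre'
    # Any hash that covers the raw name bytes therefore differs.
    print(name_digest(precomposed) == name_digest(decomposed))  # False
    # Normalizing the decomposed form recovers the precomposed one.
    print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

If the two platforms really do return different byte sequences for the same name, any hash that includes directory entry names would diverge in this way.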

Pointers:

domenkozar commented 8 years ago

Can you give a way to reproduce? What derivation to build?

johbo commented 8 years ago

I just checked: the Pelican sources don't seem to have this issue anymore. I'll try to create a small derivation to reproduce the issue.

Ericson2314 commented 8 years ago

Did you edit the OP with the minimal example? If so, note that edits don't send notifications.

copumpkin commented 8 years ago

@johbo any luck with the repro? There's already a known issue with the default case-insensitive HFS+ filesystem on Darwin: any fixed-output (FO) derivation containing files whose names differ only in case will lose the "overlapping" files and then hash to something different.
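A rough way to check a source tree for that kind of overlap is sketched below in Python. The function `find_case_collisions` is a hypothetical helper, and `str.casefold` is only an approximation of how a case-insensitive filesystem compares names.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Group paths that become identical once case is folded.

    Returns a dict mapping the folded path to the original spellings,
    restricted to groups with more than one member.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return {key: names for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Made-up listing, as it might appear inside a tarball.
    listing = ["README", "Readme", "src/main.c", "src/Main.c", "LICENSE"]
    for folded, names in find_case_collisions(listing).items():
        print(folded, "<-", names)
```

Any group this reports would collapse to a single file when unpacked on a case-insensitive volume, so the unpacked tree (and its hash) would differ from the one on a case-sensitive filesystem.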

johbo commented 6 years ago

I got back to it. Here is how I tried to reproduce it; maybe that helps to decide whether the problem is inside Nix at all or whether the issue sits somewhere else.

I've put sources into this repository: https://github.com/johbo/reproduce-nix-unicode-darwin

The basic idea is to use fetchzip to get the sources from a repository:

  tarball = pkgs.fetchzip {
    url = "https://github.com/johbo/reproduce-nix-unicode-darwin/archive/9c7029ef3b9301c9faf55659ea281332f5f6a281.tar.gz";
    sha256 = "1h7z2wax8ywhp0zr08qm78573rcd6nq3y8scl5pbv3lhpilf44sr";
  };

The repository contains the file décembre, which is expected to trigger the issue. The same filename also appears in the Pelican repository.

I've built things in the following way both on Darwin and on NixOS:

nix-build -A tarball

Last test was with these versions:

copumpkin commented 6 years ago

One thing I recall from screwing around on Darwin is that HFS+ always stores some normalized form (can't remember the details) of Unicode characters, so if you enter your diacritics as precomposed characters they might get switched to the decomposed forms with combining characters. Or something like that. We probably just need the hash function to be explicit about what it wants.
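As a sketch of what "explicit" could mean, the Python below normalizes every name to one fixed form before hashing. This is a toy stand-in, not Nix's serialization: `canonical_name` and `listing_digest` are hypothetical helpers, and the choice of NFC and SHA-256 is an assumption made only for the example.

```python
import hashlib
import unicodedata


def canonical_name(name: str, form: str = "NFC") -> str:
    """Return the name in one fixed Unicode normalization form."""
    return unicodedata.normalize(form, name)


def listing_digest(names, form: str = "NFC") -> str:
    """SHA-256 over sorted, normalized, length-prefixed names (toy example)."""
    h = hashlib.sha256()
    for name in sorted(canonical_name(n, form) for n in names):
        data = name.encode("utf-8")
        # Length prefix keeps entry boundaries unambiguous.
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.hexdigest()


if __name__ == "__main__":
    precomposed_listing = ["d\u00e9cembre", "README"]   # e-acute as one code point
    decomposed_listing = ["de\u0301cembre", "README"]   # "e" plus combining acute
    # With explicit normalization both listings hash identically.
    print(listing_digest(precomposed_listing) == listing_digest(decomposed_listing))  # True
```

The trade-off is that a tree that deliberately contains both spellings as separate files, which a byte-preserving filesystem allows, would no longer be distinguishable by name, so a real fix would need to decide how to treat that case.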

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

stale[bot] commented 2 years ago

I closed this issue due to inactivity.