NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
18.08k stars 14.13k forks source link

`fetchgit` with `leaveDotGit = true` is still not completely deterministic #8567

Open wizeman opened 9 years ago

wizeman commented 9 years ago

The hash of the cargo git repository returned by fetchgit has changed twice already, even though the git revision didn't change (see de322b48b7006cf72d6adc37a833f2e045950536 and now #8566).

See also: #4752 and #4767

cc @bjornfor and @madjar (due to #4767).

bjornfor commented 9 years ago

How sad. So it may change even with the same git version?

It'd be interesting to know exactly what in .git/ that changes. But if there is non-determinism for single-threaded git repack of the repo, then I don't know what to do.

Btw, here is the script I used to "prove" that git clones could be deterministic (if anyone wants to play around with it):

https://gist.github.com/bjornfor/bb96b09d4bd1a488cd01

It has a copy of the "make_deterministic_repo()" function from nixpkgs and it creates three repos in current dir with the same basename as the script itself.

UPDATE: I mean "make_deterministic_repo" (not ...clone())

madjar commented 9 years ago

I feared the day would happen.

As @bjornfor, it would be great if you could get your hands on two versions of the same derivation with different hashes. This might be hard, because sometimes it depends on the upstream (I first had problems when new commits where pushed to the master branch). Maybe a new branch is added, or something like that?

wizeman commented 9 years ago

Ok, so I tarred up my local copy of the cargo src tree in my /nix/store (which is supposed to have the old sha256 hash), and then used nix-prefetch-git with --leave-dotGit and --fetch-submodules to clone a new tree to see how they differ. I have verified that nix-prefetch-git returned the same new sha256 hash as mentioned in issue #8566.

This is the resulting diff: https://gist.github.com/wizeman/291be066b61e591051af

You can find the tarballs with full contents here: https://dl.dropboxusercontent.com/u/731705/cargo-old.tar.bz2 https://dl.dropboxusercontent.com/u/731705/cargo-new.tar.bz2

wizeman commented 9 years ago

The most obvious differences are inside the fetched submodule in src/rust-installer, specifically:

  1. In the old clone, there is a .git/shallow file (which contains a hash). This file doesn't exist in the new clone.
  2. In the new clone, there is a new .git/refs/remotes directory, which itself contains an empty origin subdirectory. The old clone doesn't have these directories.

It seems that there are also differences in the pack files, but I haven't examined these.

wizeman commented 9 years ago

Regarding point 1, it seems that the previous fetchgit did a shallow clone of the src/rust-installer submodule, because git log only shows one commit (the last one). However, now it seems to be doing a non-shallow clone of that submodule, because I can see the commit history (but only up to the same commit as in the previous fetchgit).

madjar commented 9 years ago

Do you have a way to get the git versions used for both clones? If they differ, a change in the behavior might explain the difference.

jagajaga commented 9 years ago

Do you use --deepClone?

wizeman commented 9 years ago

Neither the cargo derivation nor my nix-prefetch-git command used the deepClone option, they only used leaveDotGit and the option to fetch submodules.

Regarding git, I was using version 2.4.1 when I ran the nix-prefetch-git command. However, I don't know with which git version my pre-existing cargo source was downloaded with. Apparently, the build logs of fetchgit derivations don't mention the git version...

bjornfor commented 9 years ago

I grabbed this diff from two repositories at work, cloned at different times (and probably with different git versions):

https://gist.github.com/bjornfor/5e5ad0a03a52cbc6f70b

andrewrk commented 9 years ago

what's the use case for leaveDotGit = true? It seems one could clone the repository with a normal git clone, then rm -rf the .git folder, and viola, you have a deterministic fetch. What am I missing?

ghost commented 9 years ago

@andrewrk AFAIK, some packages use a build system which requires the .git folder to be present. leaveDotGit clones the repository with the .git folder intact.

aflatter commented 9 years ago

@andrewrk e.g. some ruby gems use git ls-files in their gemspec to define the list of all files in the package.

nbp commented 9 years ago

What is causing such non-determinism? Is that we might be downloading separated commit, then later, as the repository is optimized, we might download packed files?

I think this is doable if we do not clone all the branches/tag, but only the one that we care about, and run

git repack -dnA
git gc --prune=now

Otherwise, we would have to white-list the tags / branches that we are interested in.

nbp commented 9 years ago

Also the solution above was suggested by @bjornfor before.

Regarding point 1, it seems that the previous fetchgit did a shallow clone of the src/rust-installer submodule, because git log only shows one commit (the last one). However, now it seems to be doing a non-shallow clone of that submodule, because I can see the commit history (but only up to the same commit as in the previous fetchgit).

Yes, this is an optimization I made a while ago, because otherwise we were pulling full repositories all the time, which is extremely inefficient. I think we can get rid of this limitation when we are downloading a copy of the history, by creating a branch, and making a local non-shared --depth=1 clone of the submodule and replacing the .git, by the one of the cloned version.

wizeman commented 9 years ago

It seems that leaveDotGit still continues to cause hash mismatches for the cargo package, making it very hard to maintain.

See: de322b48b7006cf72d6adc37a833f2e045950536, #8566 and #8862.

andrewrk commented 9 years ago

Here's some brainstorming:

Modify leaveDotGit to remove certain problematic files that seem to be nondeterministic (as seen by some of those diffs posted above). The specific files that are removed are likely to not be read by the upstream package build process, and if they are, the build process will fail due to ENOENT.

wizeman commented 9 years ago

FWIW, as a workaround I've disabled leaveDotGit for cargo, as it seemed to be causing more problems than it was solving.

anderspapitto commented 9 years ago

@andrewrk another approach would be to "normalize" the git repo by

rm -rf .git
git init
git add .
git commit -am "nix build commit"

which will leave all functionality intact that cares about the actual files (like git ls-files). Build processes that break because they expect certain things in the git history itself should be much rarer.

bendlas commented 9 years ago

I've attempted to make --leave-dotGit more deterministic, by unpacking those packs into the plain, uncompressed object db format: https://github.com/bendlas/nixpkgs/commit/4b9c24a5d33407f88457d7e125ca78cbefa30afa

Unfortunately, this leads to massive growth in git repos (x 20), but since --leave-dotGit should be used mostly for build-time only sources, this might be tolerable, if it buys us a deterministic --leave-dotGit.

edolstra commented 9 years ago

IMHO, we should remove leaveDotGit. That flag is just a mistake. As far a I know, the contents of .git do not have a canonical representation guaranteed to be stable across Git versions, so attempts to make this "deterministic" are doomed to fail. (Note it's not just a matter of determinism, but also of Git not changing. We don't want the hash to change even when the git dependency to fetchgit is upgraded.)

For packages that really need .git, the solution is to just tar the entire tree and put it some place where it can be downloaded using fetchurl.

vcunat commented 9 years ago

It's certainly best to avoid dealing with .git at all, and only a few packages use leaveDotGit.

In any case, packing isn't meant to be fully deterministic. Efficiency and settings of that can differ in different git versions. @bendlas: I don't see why first pack everything and then unpack it.

bendlas commented 9 years ago

@edolstra I'm a big fan of removing leaveDotGit, as soon as all packages that use it are updated. However, I also think that until then, unpacking the objects is a good stop-gap. The basic git object-db format is pretty stable, so determinism would only depend on the gc working consistently.

@vcunat I wasn't sure if git gc would do the same thing without a preceding repack. Also, repacking makes sure that we only deal with pack files, i.e. can mkdir a fresh object dir and unpack into that.

Another thought: Would it be worth it to have a canonical, deterministic representation of versioned repositories in nix? Not git specific, but with a git dir conjurable from such a representation (as well as other formats)?

nbp commented 9 years ago

I guess the main reason of having the leaveDotGit option around is to be able to answer a few git commands. The most common is likely to be git describe, but I would not bet on the fact that this is the only one.

What I can suggests would be to provide a placeholder .git directory in the downloaded sources which serves as a cache of expected commands. Then we provide a wrapped git command which check if this .git folder is a repository or a cache and issue an error if there is no cache hit for the given command.

Also, note that Hydra (used to?) relies on leaveDotGit to do things such as merging top-git branches. So we might be able to remove it from the nix expression, but we will have to keep it in the script.

abbradar commented 9 years ago

I remember this issue with a curse each time I fix a checksum for bundler_HEAD or cabal2nix (although we don't use deepClone for the latter now, yay!) My two cents: I've made a simple wrapper around git which prints every command to stderr.

dezgeg commented 9 years ago

In case we want to go with the git wrapper approach, I opened #9830 which should help with the ruby parts at least.

bjornfor commented 9 years ago

While searching online for "deterministic git", I found this interesting thread (with links): http://permalink.gmane.org/gmane.comp.version-control.git/270502

pmahoney commented 9 years ago

Proposal: somewhat similar to https://github.com/NixOS/nixpkgs/pull/9830 but take it a step further:

Add an attribute to fetchgit, gitCommands, an array of git commandlines, e.g. [ "ls-files -z" ]. Before fetchgit removes .git, it runs all git commands, saving the output in .git-commands/${command}. It also arranges for a fake git command be added to the build PATH so that when package builds run git, they are intercepted, and the cached output is printed out, or a useful error message is given.

Thoughts? I probably won't be able to implement a prototype any time soon, but may give it a shot.

bjornfor commented 9 years ago

I like it!

wizeman commented 9 years ago

I like it too! :+1:

pmahoney commented 9 years ago

This is more or less what I had in mind: https://github.com/NixOS/nixpkgs/pull/10652

domenkozar commented 9 years ago

Maybe https://packages.debian.org/sid/strip-nondeterminism supports making git deterministic

wizeman commented 9 years ago

@domenkozar Last time I looked (last week?), it didn't...

adevress commented 8 years ago

Hi Guys,

we use leaveDotGit quite extensively where I work, mainly due to a big software stack that requires .git for a lot of reason ( versionning in binary, git subproject & git externals ). Even if it has determinism problems, it would be an issue for us if it is removed.

What about a solution of this kind:

This is rather cheating and impure, but the chance of getting a build output change due only to some ".git" modifications is almost null, and it concerns a minor number of packages.

zimbatm commented 8 years ago

It could be fun if nix derivations could accept a different checksumming method. Even if the folder checksum differs I trust git to always give me the same checked-out content. Having to provide both the git ref and the folder checksum feels redundant.

adevress commented 8 years ago

@zimbatm I agree, it would be a better solution but it implies to modify Nix.

zimbatm commented 8 years ago

Actually I didn't think this trough. It would work but not with leaveDotGit since the checksum of the result would still be changing.

domenkozar commented 8 years ago

There is one use case where @edolstra solution doesn't help. Not sure if it's possible to solve it idealistically, but nevertheless I'd like to write it down.

At https://github.com/snabbco/snabb, we run a bunch of tests against different hardware requirements (mostly networking cards). I'd like all commits to be tested on Hydra. If someone does nix-build release.nix at any commit, they'll get test results back (all logs are in $out of the test derivation). They can not run these tests locally, because hardware.

However, since .git changes through time, going back a few commits won't result into the same hash. We use git extensively when generating the manual.

Profpatsch commented 6 years ago

triage: what’s the status of this?

dezgeg commented 6 years ago

Still happened to me this month.

alyssais commented 6 years ago

fwiw, git ls-files doesn't work anyway, because fetchgit removes the index, meaning that from Git's point of view there are no tracked files (even though they all still exist in the working tree).

alyssais commented 6 years ago

To actually build a Ruby gem from Git that used git ls-files, I had to add an intermediary derivation that recreated the index:

bundlerApp {
  pname = "<gem>";
  gemdir = ./.;

  gemConfig.<gem> = oldAttrs: {
    src = stdenv.mkDerivation {
      name = "<gem>-git";
      src = fetchgit {
        leaveDotGit = true;
        rev = "…";
        sha256 = "…";
        url = "…";
      };
      installPhase = ''
        ${git}/bin/git clone . "$out"
      '';
    };
  };

  ruby = ruby;
  exes = [ … ];
}
stale[bot] commented 4 years ago

Thank you for your contributions.

This has been automatically marked as stale because it has had no activity for 180 days.

If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.

Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse.
  3. Ask on the #nixos channel on irc.freenode.net.
wizeman commented 4 years ago

On the current 20.03 branch there seems to be only 4 packages using leaveDotGit (5 in current master).

Perhaps these can also be removed with a bit more effort? Then perhaps we could remove the flag altogether...

nixos-discourse commented 4 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/keep-git-folder-in-when-fetching-a-git-repo/8590/6

stale[bot] commented 3 years ago

I marked this as stale due to inactivity. → More info

06kellyjac commented 3 years ago

still an issue

yajo commented 2 years ago

What if one were able to configure the way hash is computed for .git?

For example, to get the sha256 of a repo with leaveDotGit=true:

  1. Compute the hash of the whole directory, excluding .git.
  2. Get the checked-out commit (in git terms, that's what makes a repo "reproducible").
  3. Combine both previous points to get the final hash.

So, as long as the directory contents match point 1, and git repo is checked out at the commit in point 2, we consider the build reproduced.

Obviously it's not 100% pure, but it would be OK for most projects that rely on the availability of the .git folder. A new option could be added to enable this behaviour. Something like fakeDotGitHash=true.

charlesbaynham commented 2 years ago

On the current 20.03 branch there seems to be only 4 packages using leaveDotGit (5 in current master).

Perhaps these can also be removed with a bit more effort? Then perhaps we could remove the flag altogether...

Please don't remove it! There are packages using it other than those in nixpkgs.

LunNova commented 2 years ago

How does this approach look as a workaround? https://github.com/LunNova/nixpkgs/commit/73f2df24eb2dcf6dc41c2416b730f1ccbaddaafc

Atemu commented 2 years ago

The current state of the issue as I see it

Problems/Use-cases

Solutions

Discussion

A custom fake git binary can handle "simple" cases and is easy to set up but might need info to be present at eval time.

Recording + replaying git info is not trivial but could be presented with a simple UI and be very effective.

Making our own git repo might sound like a good idea at first but it falsifies some information (i.e. where HEAD is) and probably won't be deterministic either. A format change due to a git update could easily cause differences. It could be possible to produce a deterministic tree by stripping down a git repo to the simplest form (i.e. unpack all objects) but that still sounds fragile and wasteful space-wise.

Excluding .git from the hash and "trusting" it to be deterministic for any (reasonable) interacton isn't a terrible idea IMO but would need to add complexity and special cases to Nix itself which I don't think we should do.

Closing thoughts

Record+replay can be complex implementation-wise but it's the most promising solution IMO. If we can manage to patch libgit it would solve all known use-cases of leaveDotGit I think. Finding all the git info the build process needs is also trivial: Simply use a wrapper script that records all commands into a file.

Manually implemented fake git commands are also a good solution but since we need record+replay anyways, that would likely be simpler to use and wouldn't have the potential for mismatches.

@pmahoney already implemented a solution to record+replay in https://github.com/NixOS/nixpkgs/pull/10652. It was closed because the PR doesn't address libgit which I think is a bit short-sighted as it does solve all other cases.
I think we should polish up that implementation and use it for cases it can cover. I'd leave fooling libgit to a separate issue.