Open jakubgs opened 3 years ago
I think that's because .git needs to be included in case of a deepClone, and so it cannot be guaranteed that it is reproducible. If this is the case, such an option should not exist in a Nixpkgs builder and should, in my opinion, be removed.
Another option would be to remove the .git/objects/info/packs and .git/objects/pack files here:
https://github.com/NixOS/nixpkgs/blob/d74573d8ae6f02ef0ac1299c862d1b762ba0aad1/pkgs/build-support/fetchgit/nix-prefetch-git#L244-L247
I marked this as stale due to inactivity.
This is still an issue and not stale.
I've been working on some reproducible builds in Debian which use submodules, and have based that code off of the prefetch routine referenced here. The following script successfully creates a usable and deterministic clone for a repo with submodules of any depth and reference location in my tests - feel free to test and adapt as needed.
#!/usr/bin/env bash
set -euf -o pipefail

#
# Enumerates the various nondeterministic data objects in a git repository that
# should be deleted.
#
declare -ra NONDETERMINISTIC_DATA=(
  "logs"
  "hooks"
  "index"
  "FETCH_HEAD"
  "ORIG_HEAD"
  "config"
  "refs/remotes/origin/HEAD"
)

#
# Cleans the submodules in the given directory, removing nondeterministic data.
# @param MODULES_DIR The submodule directory to clean
#
function clean_submodules() {
  local -r MODULES_DIR="${1}"
  for module_dir in $(find "${MODULES_DIR}" -mindepth 1 -maxdepth 1 -type d)
  do
    for d in "${NONDETERMINISTIC_DATA[@]}"; do
      rm -rf "${module_dir}/${d}"
    done
    # Clean nested submodules
    if [[ -d "${module_dir}/modules" ]]; then
      clean_submodules "${module_dir}/modules"
    fi
  done
}

#
# Cleans the given git repository, removing nondeterministic data and repacking
# its objects.
# @param REPO The local git repository to clean.
#
function clean_repo() {
  local -r REPO="${1}"
  cd "${REPO}"
  # Clean submodules, if any
  if [[ -d .git/modules ]]; then
    clean_submodules .git/modules
  fi
  # Remove files that contain timestamps or other nondeterministic properties
  for d in "${NONDETERMINISTIC_DATA[@]}"; do
    rm -rf ".git/${d}"
  done
  # Remove remote branches
  git branch -r | while read -r branch; do
    git branch -rD "${branch}" >&2
  done
  # Remove unreachable tags
  local -r MAYBE_TAG="$(git tag --points-at HEAD)"
  git tag --contains HEAD | while read -r tag; do
    if [[ "${tag}" != "${MAYBE_TAG}" ]]; then
      git tag -d "${tag}" >&2
    fi
  done
  # Do a full repack. Must run single-threaded or determinism is lost.
  git -c pack.threads=1 repack -A -d -f
  rm -f .git/config
  # Garbage collect unreferenced objects
  git gc --prune=all --keep-largest-pack
}

#
# The main entry point of the program.
# @param REPO The local git repository to clean.
#
function main() {
  local -r REPO="${1}"
  # Run in subshell to not touch working directory
  ( clean_repo "${REPO}" )
}

main "${@}"
I elected not to delete .git/objects/info/packs and .git/objects/pack for the sake of backwards compatibility, and it appears to work fine without it.
Any news on it? It would be great to have reproducible submodule fetch. If backward compatibility is an issue, we could add an option "enforceReproducibility", or "removeDotGit" to force it to be reproducible.
@Nihlus when in the build process do you call that script?
@kjeremy Hey, sorry about the long wait - things have been busy here and I kept forgetting to check.
Essentially, I call it right after the clone from a specific tag and branch, and then I pack the results into a deterministic tarball. It boils down to this (assuming UPSTREAM_BUNDLE is a git bundle made from a detached head checkout with submodules updated and initialized):
$(UPSTREAM_TARBALL): $(UPSTREAM_BUNDLE)
	git clone ${UPSTREAM_BUNDLE} -b ${PACKAGE_VERSION} --single-branch ${TMP_CLONE}
	make-deterministic.sh ${TMP_CLONE}
	tar -C ${TMP_CLONE}/../ --sort=name --mtime=@${SOURCE_DATE_EPOCH} --owner=0 --group=0 --numeric-owner --pax-option=exthdr.name=%d/PaxHeader/%f,delete=atime,delete=ctime -cf - ${BUILD_NAME} | gzip -n > ${UPSTREAM_TARBALL}
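For anyone who wants to try the packing step on its own, here is a minimal sketch of it as a bash function. The function name deterministic_tarball and its four arguments are made up for illustration and are not part of the Makefile above. It assumes GNU tar 1.28 or newer (for --sort=name) and gzip, and it spells out --format=posix so the pax options apply.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Pack PARENT/NAME into OUT so that repeated runs give byte-identical output.
# EPOCH is the fixed timestamp (seconds since 1970) stamped on every entry.
# Hypothetical helper; assumes GNU tar >= 1.28 and gzip are installed.
deterministic_tarball() {
  local -r parent="${1}" name="${2}" out="${3}" epoch="${4}"
  # --sort=name       fixes the order of archive members
  # --mtime           replaces every file's mtime with the fixed epoch
  # --owner/--group   drop the local user and group from the headers
  # --pax-option      drops atime/ctime and the PID-based extended header name
  # gzip -n           keeps the original name and timestamp out of the gzip header
  tar -C "${parent}" \
    --sort=name \
    --mtime="@${epoch}" \
    --owner=0 --group=0 --numeric-owner \
    --format=posix \
    --pax-option=exthdr.name=%d/PaxHeader/%f,delete=atime,delete=ctime \
    -cf - "${name}" | gzip -n > "${out}"
}
```

Calling it twice on the same tree with the same epoch, for example deterministic_tarball /tmp/clone-parent my-build out.tar.gz "${SOURCE_DATE_EPOCH}", should give two files that cmp reports as identical. The paths here are placeholders.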
If anyone is using leaveDotGit or deepClone to avoid the "Server does not allow request for unadvertised object" error, you can maintain determinism by adding a postFetch that deletes the .git folder:
src = fetchgit {
  ...
  fetchSubmodules = true;
  leaveDotGit = true;
  postFetch = ''
    rm -rf $out/.git
  '';
}
This also breaks flakes which use submodules in one of their inputs
Description
When I try to use pkgs.fetchgit with fetchSubmodules = true and deepClone = true on a repo that has a submodule that references the same submodule as the root repo, I get a hash mismatch error and different hashes for different nix-build runs:
Reproduction
You can reproduce this behavior by building the derivation in this debug repo: https://github.com/status-im/nix-fetchgit-debug
Additional context
This seems to be caused by differences in Git Packfiles when comparing results of the derivation built multiple times:
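A comparison of this kind can be sketched as below. This is only a rough illustration, not the exact commands used for this report: the function names tree_manifest and diff_trees are hypothetical, and it assumes GNU find, sort, xargs, and sha256sum are available.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print "<sha256>  <path>" for every regular file under DIR, sorted by path,
# so that two trees can be compared line by line.
tree_manifest() {
  local -r dir="${1}"
  ( cd "${dir}" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum )
}

# Print the manifest lines that differ between two trees.
# Prints nothing when the trees have identical file contents.
diff_trees() {
  # diff exits with 1 when the inputs differ; only treat other codes as errors.
  diff <(tree_manifest "${1}") <(tree_manifest "${2}") || [[ $? -eq 1 ]]
}
```

Given two copies of the fetched source from separate builds (say ./build-a and ./build-b, both placeholder paths), diff_trees ./build-a ./build-b lists only the files whose contents differ, which makes it easy to see whether the differences are limited to files under .git/objects/pack.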
The cause of this seems to be the fact that both the root repo and the nim-waku submodule reference the same nim-faststreams repo. Also, both this repo and nim-waku use the same nim-faststreams commit: 5df69fc6.
Metadata
Pings
@Mic92 @bhipple @jtojnar @MarcWeber @bjornfor