NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
18.06k stars 14.1k forks source link

Tracking: deprecate sha256 attribute in fetchers in favor of hash = "<SRI hash>" #325892

Open Aleksanaa opened 4 months ago

Aleksanaa commented 4 months ago

When we did not support SRI hash, we wrote a lot of sha256 = "...", and some of new PRs are still written with this attribute. However, when using an empty string to obtain the correct hash from the error, the SRI hash is obtained, which causes some confusion.

Let's move on from this old attribute. I don't expect to remove it within a certain period of time, but we can throw a warning to prevent this type of writing from continuing to appear in nixpkgs.

I did an experiment last time with cargoHash in https://github.com/NixOS/nixpkgs/pull/323983. I wrote the following script for this:

#!/usr/bin/env bash

process_line() {
    local filename=${1%:}
    if [[ $4 =~ \"(.*)\"\; ]]; then
      local sha256="${BASH_REMATCH[1]}"
    fi

    [[ -z $sha256 ]] && return 0

    local hash=$(nix hash to-sri --type sha256 $sha256)

    echo "Processing: $filename"
    echo "  $sha256 => $hash"

    sed -i "s|sha256 = \"$sha256\"|hash = \"$hash\"|" $filename
}

# split output by line
grep -r 'sha256 = ' . | while IFS= read -r line; do
    # split them further by space
    read -r -a parts <<< "$line"
    process_line "${parts[@]}"
done

We can deprecate each fetcher's sha256 separately, instead of the entire hash, to avoid the burden of review:

I collapsed the check list because it's not feasible to deprecate one by one - [ ] buildBazelPackage #342037 - [ ] fetch-scm - [ ] fetchCrate - [ ] fetchDebianPatch - [ ] fetchDockerConfig - [ ] fetchDockerLayer - [ ] fetchFirefoxAddon - [ ] fetchFrom9Front - [ ] fetchFromBitbucket https://github.com/NixOS/nixpkgs/pull/326028 - [ ] fetchFromGitHub - [ ] fetchFromGitLab - [ ] fetchFromGitea - [ ] fetchFromGithub - [ ] fetchFromGitiles - [ ] fetchFromRepoOrCz - [ ] fetchFromSavannah - [ ] fetchFromSourcehut - [ ] fetchHex - [ ] fetchMavenArtifact - [ ] fetchNextcloudApp - [ ] fetchNpmDeps - [ ] fetchNuGet - [ ] fetchPypi - [ ] fetchPypiLegacy - [ ] fetchRepoProject #342031 - [ ] fetchTarball - [ ] fetchYarnDeps - [ ] fetchbower - [ ] fetchbzr - [ ] fetchcvs - [ ] fetchdarcs - [ ] fetchdocker - [ ] fetchegg - [ ] fetchfossil - [ ] fetchgit - [ ] fetchgitLocal - [ ] fetchgx - [ ] fetchhg - [ ] fetchipfs - [ ] fetchit - [ ] fetchmail - [ ] fetchmail_7 - [ ] fetchmtn - [ ] fetchpatch - [ ] fetchpatch2 - [ ] fetchpijul - [ ] fetchs3 - [ ] fetchsvn - [ ] fetchsvnrevision - [ ] fetchsvnssh - [x] fetchtorrent (Looks like there is no support for sha256 attribute?) - [ ] fetchurl - [ ] fetchutils - [ ] fetchzip
eclairevoyant commented 4 months ago

the SRI hash is obtained, which causes some confusion.

You can still use the SRI sha256 hash in the sha256 attr. (Not disagreeing with the deprecation though.)

emilazy commented 4 months ago

(Taking the opportunity to plug the related lib.fakeHash, which maybe only I use, because it lets you avoid remembering how many As you need.)

Aleksanaa commented 4 months ago

You can still use the SRI sha256 hash in the sha256 attr.

Yes, it's like sha256 = "sha256-...", if after some day we decide to use sha512 by default, it will become sha256 = "sha512-"

eclairevoyant commented 4 months ago

@emilazy I usually just use "" honestly, as Aleksanaa mentioned.

Anyway, discussing strategy: should we remove the usages first for a given fetcher, before we add the warnings?

And your example bash script, it may work safely for something like cargoHash as you did in https://github.com/NixOS/nixpkgs/pull/323983, because we know it will only affect the rust builder/fetcher. But sometimes people use custom builders and scripts (especially update scripts) that may break with this. I will try to come up with a more robust script that only touches one fetcher at a time.

Aleksanaa commented 4 months ago

Anyway, discussing strategy: should we remove the usages first for a given fetcher, before we add the warnings?

Yes, otherwise ofBorg will fail?

I will try to come up with a more robust script that only touches one fetcher at a time.

I'm adding one more grep like grep -rl "fetchFromBitbucket" . | xargs grep -r 'sha256 = ' | while IFS= read -r line; do and it turned out well. But update script is indeed a problem and it is not within the scope of our scan.

emilazy commented 4 months ago

(Thank you, I had no idea "" works! I could swear it didn’t in the past. Sorry for the tangent.)

eclairevoyant commented 4 months ago

Yes, otherwise ofBorg will fail?

I didn't know borgo fails on warnings. But anyway I meant, keeping the 0-rebuild treewide in a separate PR would make it easier to avoid merge conflicts and mixing of logical changes.

I'm adding one more grep like grep -rl "fetchFromBitbucket" . | xargs grep -r 'sha256 = ' | while IFS= read -r line; do

Some packages have multiple fetcher or people doing some custom constructs in the same file, which is why I'll try to avoid grep in this way if possible.

pyrox0 commented 3 months ago

There are many fetchers that are based on others(fetchFromOrCz is based on fetchzip, fetchFrom9Front is based on either fetchurl or fetchgit, etc). Will the sha256 attribute have to be deprecated in those fetchers first, or in the underlying ones?

Aleksanaa commented 3 months ago

So I wrote this script:

my script ```bash #!/usr/bin/env bash EDITED_LOG=./edited-log PROBLEM_LOG=./problem-log SEARCH_CACHE=./search-cache mkfifo nix_repl_in mkfifo nix_repl_out trap "rm -f nix_repl_in nix_repl_out" EXIT # if replacing more files we can remove 2>/dev/null nix repl < nix_repl_in > nix_repl_out 2>/dev/null & NIX_REPL_PID=$! sleep 2 exec 13>nix_repl_in exec 14< nix_repl_out extract_output() { output=$(echo $1 | ansi2txt) if [[ $output == "\"\"" ]]; then return 1 elif [[ $output =~ ^\"([^\"]+)\"$ ]]; then echo "${BASH_REMATCH[1]}" elif [[ $output == "null" ]]; then echo "null" else return 1 fi } send_repl() { local cmd=$1 echo "$cmd" >&13 echo "$cmd" > ./repl_in while IFS= read -r -d $'\n' line <&14; do if [[ -z "$line" ]]; then return 1 else echo "$(extract_output $line)" echo "$line" > ./repl_out fi done } replace_hash() { local attr_name=$1 local hashAlgo=$(send_repl "${attr_name}.src.outputHashAlgo") || return 1 # Only to limit scope here, can be adjusted local hashHomepage=$(send_repl "${attr_name}.src.meta.homepage") || return 1 if [[ "$hashAlgo" == "sha256" ]] && [[ "$hashHomepage" == "https://gitlab.com/"* ]]; then local hash=$(send_repl "${attr_name}.src.outputHash") || return 1 # echo "$attr_name hash" if [[ "$hash" == "sha256"* ]]; then local position=$(send_repl "${attr_name}.meta.position") || return 1 [[ $position == "null" ]] && return 1 local position_file=${position%%:*} grep -q "sha256 = \"$hash\"" $position_file if [ $? -eq 0 ]; then local outPath=$(send_repl "${attr_name}.outPath") || return 1 echo "editing: ${attr_name}:${position_file}" sed -i "s|sha256 = \"$hash\"|hash = \"$hash\"|" $position_file echo "${attr_name}:${position_file}:${outPath}" >> $EDITED_LOG fi fi fi } verify_hash() { local attr_name=$1 local position_file=$2 local outPath=$3 local new_outPath=$(send_repl "${attr_name}.outPath") if [ $? -ne 0 ] || [[ "$outPath" != "$new_outPath" ]]; then echo "problem: ${attr_name}:${position_file}" # echo "'$outPath' != '$new_outPath'" echo "${attr_name}:${position_file}:${outPath}" >> $PROBLEM_LOG fi } [[ -f "$SEARCH_CACHE" ]] || nix search . ^ --json > $SEARCH_CACHE pkg_list=($(cat $SEARCH_CACHE | jq --raw-output 'keys_unsorted | @sh')) send_repl ":l ./." sleep 1 echo "starting to replace" for pkg_raw in ${pkg_list[@]}; do pkg=${pkg_raw#*.*.} pkg=${pkg%\'} replace_hash $pkg done send_repl ":r" sleep 1 echo "starting to examine" while IFS= read -r line; do IFS=':' read -r -a parts <<< "$line" verify_hash ${parts[@]} done < "$EDITED_LOG" ```

It depends on jq and ansi2txt. Currently, because it is single-threaded logic, a treewide operation takes about 10 minutes. Feel free to modify it according to your ideas.

pyrox0 commented 3 months ago

Just noticed in the list, but fetchit is not a fetcher, it's a podman tool. Should probably be removed.

Aleksanaa commented 3 months ago

I tried GitHub (only editing sha256 = "sha256-), and there are 3000+ direct editions, including 1000+ objections (we can modify the check to revert changes directly, using sed or git, depending on the speed). We can still edit 2000+ files painlessly after all. Might be related to https://github.com/NixOS/nixpkgs/issues/301014

Aleksanaa commented 3 months ago

Just noticed in the list, but fetchit is not a fetcher, it's a podman tool. Should probably be removed.

I've given up that list, because we cannot actually determine which fetcher it is (except for those that only serve specific sites). And since they are derivatives of fetchurl or fetchzip, it is not convenient to deprecate hash for a single fetcher

nbraud commented 1 month ago

You can still use the SRI sha256 hash in the sha256 attr.

Yes, it's like sha256 = "sha256-...", if after some day we decide to use sha512 by default, it will become sha256 = "sha512-"

Would that even work?

pkgs.stdenv.mkDerivation {
    name = "hash-algo-mismatch";
    outputHash = "sha512-z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXcg/SpIdNs6c5H0NE8XYXysP+DGNKHfuwvY7kxvUdBeoGlODJ6+SfaPg==";
    outputHashAlgo = "sha256";
}

errors out in builtins.derivationStrict with

error: hash 'sha512-z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXcg/SpIdNs6c5H0NE8XYXysP+DGNKHfuwvY7kxvUdBeoGlODJ6+SfaPg==' should have type 'sha256'

nbraud commented 1 month ago

There are many fetchers that are based on others(fetchFromOrCz is based on fetchzip, fetchFrom9Front is based on either fetchurl or fetchgit, etc). Will the sha256 attribute have to be deprecated in those fetchers first, or in the underlying ones?

I wrote some normalization helpers in #342072 which convert { hash, sha256, sha512, ... } into { outputHash, outputHashAlgo }. In that approach, the underlying fetchers (fetchurl, fetchgit, etc.) need to be updated first.

PS: To be clear, the prefered way for callers to specify the hash would still be the hash attribute, in SRI format; { outputHash, outputHashAlgo } is only used internally to represent the potentially-legacy arguments in a uniform way.

nbraud commented 1 month ago

342072 could probably use another reviewer, now that it's ~finalized, before it's merged and I go wrap almost-all fetchers with it.