NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
17.3k stars 13.54k forks source link

Tracking: deprecate sha256 attribute in fetchers in favor of hash = "<SRI hash>" #325892

Open Aleksanaa opened 2 months ago

Aleksanaa commented 2 months ago

When we did not support SRI hash, we wrote a lot of sha256 = "...", and some of new PRs are still written with this attribute. However, when using an empty string to obtain the correct hash from the error, the SRI hash is obtained, which causes some confusion.

Let's move on from this old attribute. I don't expect to remove it within a certain period of time, but we can throw a warning to prevent this type of writing from continuing to appear in nixpkgs.

I did an experiment last time with cargoHash in https://github.com/NixOS/nixpkgs/pull/323983. I wrote the following script for this:

#!/usr/bin/env bash

process_line() {
    local filename=${1%:}
    if [[ $4 =~ \"(.*)\"\; ]]; then
      local sha256="${BASH_REMATCH[1]}"
    fi

    [[ -z $sha256 ]] && return 0

    local hash=$(nix hash to-sri --type sha256 $sha256)

    echo "Processing: $filename"
    echo "  $sha256 => $hash"

    sed -i "s|sha256 = \"$sha256\"|hash = \"$hash\"|" $filename
}

# split output by line
grep -r 'sha256 = ' . | while IFS= read -r line; do
    # split them further by space
    read -r -a parts <<< "$line"
    process_line "${parts[@]}"
done

We can deprecate each fetcher's sha256 separately, instead of the entire hash, to avoid the burden of review:

I collapsed the check list because it's not feasible to deprecate one by one - [ ] fetch-scm - [ ] fetchCrate - [ ] fetchDebianPatch - [ ] fetchDockerConfig - [ ] fetchDockerLayer - [ ] fetchFirefoxAddon - [ ] fetchFrom9Front - [ ] fetchFromBitbucket https://github.com/NixOS/nixpkgs/pull/326028 - [ ] fetchFromGitHub - [ ] fetchFromGitLab - [ ] fetchFromGitea - [ ] fetchFromGithub - [ ] fetchFromGitiles - [ ] fetchFromRepoOrCz - [ ] fetchFromSavannah - [ ] fetchFromSourcehut - [ ] fetchHex - [ ] fetchMavenArtifact - [ ] fetchNextcloudApp - [ ] fetchNpmDeps - [ ] fetchNuGet - [ ] fetchPypi - [ ] fetchPypiLegacy - [ ] fetchRepoProject - [ ] fetchTarball - [ ] fetchYarnDeps - [ ] fetchbower - [ ] fetchbzr - [ ] fetchcvs - [ ] fetchdarcs - [ ] fetchdocker - [ ] fetchegg - [ ] fetchfossil - [ ] fetchgit - [ ] fetchgitLocal - [ ] fetchgx - [ ] fetchhg - [ ] fetchipfs - [ ] fetchit - [ ] fetchmail - [ ] fetchmail_7 - [ ] fetchmtn - [ ] fetchpatch - [ ] fetchpatch2 - [ ] fetchpijul - [ ] fetchs3 - [ ] fetchsvn - [ ] fetchsvnrevision - [ ] fetchsvnssh - [x] fetchtorrent (Looks like there is no support for sha256 attribute?) - [ ] fetchurl - [ ] fetchutils - [ ] fetchzip
eclairevoyant commented 2 months ago

the SRI hash is obtained, which causes some confusion.

You can still use the SRI sha256 hash in the sha256 attr. (Not disagreeing with the deprecation though.)

emilazy commented 2 months ago

(Taking the opportunity to plug the related lib.fakeHash, which maybe only I use, because it lets you avoid remembering how many As you need.)

Aleksanaa commented 2 months ago

You can still use the SRI sha256 hash in the sha256 attr.

Yes, it's like sha256 = "sha256-...", if after some day we decide to use sha512 by default, it will become sha256 = "sha512-"

eclairevoyant commented 2 months ago

@emilazy I usually just use "" honestly, as Aleksanaa mentioned.

Anyway, discussing strategy: should we remove the usages first for a given fetcher, before we add the warnings?

And your example bash script, it may work safely for something like cargoHash as you did in https://github.com/NixOS/nixpkgs/pull/323983, because we know it will only affect the rust builder/fetcher. But sometimes people use custom builders and scripts (especially update scripts) that may break with this. I will try to come up with a more robust script that only touches one fetcher at a time.

Aleksanaa commented 2 months ago

Anyway, discussing strategy: should we remove the usages first for a given fetcher, before we add the warnings?

Yes, otherwise ofBorg will fail?

I will try to come up with a more robust script that only touches one fetcher at a time.

I'm adding one more grep like grep -rl "fetchFromBitbucket" . | xargs grep -r 'sha256 = ' | while IFS= read -r line; do and it turned out well. But update script is indeed a problem and it is not within the scope of our scan.

emilazy commented 2 months ago

(Thank you, I had no idea "" works! I could swear it didn’t in the past. Sorry for the tangent.)

eclairevoyant commented 2 months ago

Yes, otherwise ofBorg will fail?

I didn't know borgo fails on warnings. But anyway I meant, keeping the 0-rebuild treewide in a separate PR would make it easier to avoid merge conflicts and mixing of logical changes.

I'm adding one more grep like grep -rl "fetchFromBitbucket" . | xargs grep -r 'sha256 = ' | while IFS= read -r line; do

Some packages have multiple fetcher or people doing some custom constructs in the same file, which is why I'll try to avoid grep in this way if possible.

pyrox0 commented 1 month ago

There are many fetchers that are based on others(fetchFromOrCz is based on fetchzip, fetchFrom9Front is based on either fetchurl or fetchgit, etc). Will the sha256 attribute have to be deprecated in those fetchers first, or in the underlying ones?

Aleksanaa commented 1 month ago

So I wrote this script:

my script ```bash #!/usr/bin/env bash EDITED_LOG=./edited-log PROBLEM_LOG=./problem-log SEARCH_CACHE=./search-cache mkfifo nix_repl_in mkfifo nix_repl_out trap "rm -f nix_repl_in nix_repl_out" EXIT # if replacing more files we can remove 2>/dev/null nix repl < nix_repl_in > nix_repl_out 2>/dev/null & NIX_REPL_PID=$! sleep 2 exec 13>nix_repl_in exec 14< nix_repl_out extract_output() { output=$(echo $1 | ansi2txt) if [[ $output == "\"\"" ]]; then return 1 elif [[ $output =~ ^\"([^\"]+)\"$ ]]; then echo "${BASH_REMATCH[1]}" elif [[ $output == "null" ]]; then echo "null" else return 1 fi } send_repl() { local cmd=$1 echo "$cmd" >&13 echo "$cmd" > ./repl_in while IFS= read -r -d $'\n' line <&14; do if [[ -z "$line" ]]; then return 1 else echo "$(extract_output $line)" echo "$line" > ./repl_out fi done } replace_hash() { local attr_name=$1 local hashAlgo=$(send_repl "${attr_name}.src.outputHashAlgo") || return 1 # Only to limit scope here, can be adjusted local hashHomepage=$(send_repl "${attr_name}.src.meta.homepage") || return 1 if [[ "$hashAlgo" == "sha256" ]] && [[ "$hashHomepage" == "https://gitlab.com/"* ]]; then local hash=$(send_repl "${attr_name}.src.outputHash") || return 1 # echo "$attr_name hash" if [[ "$hash" == "sha256"* ]]; then local position=$(send_repl "${attr_name}.meta.position") || return 1 [[ $position == "null" ]] && return 1 local position_file=${position%%:*} grep -q "sha256 = \"$hash\"" $position_file if [ $? -eq 0 ]; then local outPath=$(send_repl "${attr_name}.outPath") || return 1 echo "editing: ${attr_name}:${position_file}" sed -i "s|sha256 = \"$hash\"|hash = \"$hash\"|" $position_file echo "${attr_name}:${position_file}:${outPath}" >> $EDITED_LOG fi fi fi } verify_hash() { local attr_name=$1 local position_file=$2 local outPath=$3 local new_outPath=$(send_repl "${attr_name}.outPath") if [ $? -ne 0 ] || [[ "$outPath" != "$new_outPath" ]]; then echo "problem: ${attr_name}:${position_file}" # echo "'$outPath' != '$new_outPath'" echo "${attr_name}:${position_file}:${outPath}" >> $PROBLEM_LOG fi } [[ -f "$SEARCH_CACHE" ]] || nix search . ^ --json > $SEARCH_CACHE pkg_list=($(cat $SEARCH_CACHE | jq --raw-output 'keys_unsorted | @sh')) send_repl ":l ./." sleep 1 echo "starting to replace" for pkg_raw in ${pkg_list[@]}; do pkg=${pkg_raw#*.*.} pkg=${pkg%\'} replace_hash $pkg done send_repl ":r" sleep 1 echo "starting to examine" while IFS= read -r line; do IFS=':' read -r -a parts <<< "$line" verify_hash ${parts[@]} done < "$EDITED_LOG" ```

It depends on jq and ansi2txt. Currently, because it is single-threaded logic, a treewide operation takes about 10 minutes. Feel free to modify it according to your ideas.

pyrox0 commented 1 month ago

Just noticed in the list, but fetchit is not a fetcher, it's a podman tool. Should probably be removed.

Aleksanaa commented 1 month ago

I tried GitHub (only editing sha256 = "sha256-), and there are 3000+ direct editions, including 1000+ objections (we can modify the check to revert changes directly, using sed or git, depending on the speed). We can still edit 2000+ files painlessly after all. Might be related to https://github.com/NixOS/nixpkgs/issues/301014

Aleksanaa commented 1 month ago

Just noticed in the list, but fetchit is not a fetcher, it's a podman tool. Should probably be removed.

I've given up that list, because we cannot actually determine which fetcher it is (except for those that only serve specific sites). And since they are derivatives of fetchurl or fetchzip, it is not convenient to deprecate hash for a single fetcher