NixOS / nixpkgs-vet

Tool to vet (check) Nixpkgs, including its pkgs/by-name directory
MIT License
29 stars 7 forks source link

Error messages should be better #6

Open infinisil opened 9 months ago

infinisil commented 9 months ago

It should be better explained what people can do when certain CI errors occur. I'm currently manually writing comments like this:

And I shouldn't have to if CI errors were better and ideally linked to more thorough documentation :)

justinas commented 9 months ago

TBH the errors I encountered in this were not hard to comprehend or anything. It was just not straightforward to me how to handle the transition to by-name for packages with multiple versions (pkg_1, pkg_2, etc.), and this was left undecided in RFD 140 itself.

Your comment is super helpful though 🙌

infinisil commented 8 months ago

Just opened https://github.com/NixOS/nixpkgs/pull/292214 to document recommendations for adding new package versions

And there's https://github.com/NixOS/nixpkgs/pull/290743 to improve other error messages

willbush commented 6 months ago

I was a little curious to do some analysis on failed runs of the check-by-name.yml workflow in the nixpkgs repository.

Here's a way to download the logs of the failed runs using Github cli:

run from the nixpkgs/.github/workflows master branch:

gh run list --workflow=check-by-name.yml --status=failure --json databaseId --limit 1000 |
  jq -r '.[] | .databaseId' |
  xargs -P 4 -I{} sh -c '
    echo "Processing run ID {}..."
    gh run view {} --log-failed > "run_{}.log"
  '

Note it's important to be aware of the rate limit of the Github API, you can check it by running:

❯ gh api rate_limit | jq '.rate'
{
  "limit": 5000,
  "used": 1,
  "remaining": 4999,
  "reset": 1713159188
}

My initial testing was with -P 16 and I hit the rate limit quickly.

I changed it to -P 4 and it uses up about 3k requests:

❯ gh api rate_limit | jq '.rate'
{
  "limit": 5000,
  "used": 3013,
  "remaining": 1987,
  "reset": 1713159188
}

Results

I moved the logs to a separate directory and removed the empty logs:

❯ mkdir -p ~/check-by-name-failed
❯ mv run_* ~/check-by-name-failed
❯ cd ~/check-by-name-failed
❯ ls *.log -l | wc -l
1000
❯ find . -name "*.log" -type f -empty -delete
❯ ls *.log -l | wc -l
984

nix eval errors

❯ mkdir -p ~/processed
❯ rg -l "nix-instantiate" | wc -l
319
❯ rg -l "nix-instantiate" | xargs -I {} mv {} ~/processed
❯ rg -l "Nix evaluation failed for some package" | wc -l
38
❯ rg -l "Nix evaluation failed for some package" | xargs -I {} mv {} ~/processed

Seems to be 357 logs that relate to nix eval errors which makes sense to me. Just wondering what GH action caught this before check-by-name?

Actually, I might have screwed up somewhere? After processing more logs I found more nix eval errors. So perhaps it's even more.

new top-level packages

❯ rg -l "top-level package" | wc -l
314
❯ rg -l "top-level package" | xargs -I {} mv {} ~/processed

Github error limit

❯ rg -l "Not retrying anymore, probably GitHub is having internal issues" | wc -l
132
❯ rg -l "Not retrying anymore, probably GitHub is having internal issues" | xargs -I {} mv {} ~/processed

missing package.nix file

❯ rg -l "Missing required \"package.nix\" file" | wc -l
41
❯ rg -l "Missing required \"package.nix\" file" | xargs -I {} mv {} ~/processed

must be defined like

❯ rg -l "must be defined like" | wc -l
44
❯ rg -l "must be defined like" | xargs -I {} mv {} ~/processed

example of multiple similar errors in a single file:

- Because pkgs/by-name/mi/microchip-xc-dsc exists, the attribute `pkgs.microchip-xc-dsc` must be defined like

    microchip-xc-dsc = callPackage ./pkgs/by-name/mi/microchip-xc-dsc/package.nix { /* ... */ };

  However, in this PR, the second argument is empty. See the definition in pkgs/top-level/all-packages.nix:33308:

    microchip-xc-dsc = callPackage ../by-name/mi/microchip-xc-dsc/package.nix { };

  Such a definition is provided automatically and therefore not necessary. Please remove it.

- Because pkgs/by-name/mi/microchip-xc16 exists, the attribute `pkgs.microchip-xc16` must be defined like

    microchip-xc16 = callPackage ./pkgs/by-name/mi/microchip-xc16/package.nix { /* ... */ };

  However, in this PR, the second argument is empty. See the definition in pkgs/top-level/all-packages.nix:33304:

    microchip-xc16 = callPackage ../by-name/mi/microchip-xc16/package.nix { };

  Such a definition is provided automatically and therefore not necessary. Please remove it.

- Because pkgs/by-name/mi/microchip-xc32 exists, the attribute `pkgs.microchip-xc32` must be defined like

    microchip-xc32 = callPackage ./pkgs/by-name/mi/microchip-xc32/package.nix { /* ... */ };

  However, in this PR, the second argument is empty. See the definition in pkgs/top-level/all-packages.nix:33306:

    microchip-xc32 = callPackage ../by-name/mi/microchip-xc32/package.nix { };

  Such a definition is provided automatically and therefore not necessary. Please remove it.

- Because pkgs/by-name/mi/microchip-xc8 exists, the attribute `pkgs.microchip-xc8` must be defined like

    microchip-xc8 = callPackage ./pkgs/by-name/mi/microchip-xc8/package.nix { /* ... */ };

  However, in this PR, the second argument is empty. See the definition in pkgs/top-level/all-packages.nix:33302:

    microchip-xc8 = callPackage ../by-name/mi/microchip-xc8/package.nix { };

  Such a definition is provided automatically and therefore not necessary. Please remove it.

This PR introduces additional instances of discouraged patterns as listed above. Merging is discouraged but would not break the base branch.
To run locally: ./maintainers/scripts/check-by-name.sh master https://github.com/NixOS/nixpkgs.git

rnix errors

❯ rg -l rnix | wc -l
24
❯ rg -l rnix | xargs -I {} mv {} ~/processed

Note: I did not finish processing and categorizing all log files. The above are the largest categories.

philiptaron commented 6 months ago

This is a great data driven approach to improving the errors. Thanks for documenting your process, @willbush!