NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

ZERO Hydra Failures 20.09 #97479

Closed jonringer closed 4 years ago

jonringer commented 4 years ago

UPDATE after initial 20.09 Release: Fixing broken packages will always be possible throughout the lifetime of the 20.09 release. However, you may need to remove the broken = true; attr on the package. Otherwise please follow normal back-port conventions. :)

Old Post: Jobsets:

Mission

Every time we branch off a release we stabilize the release branch. Our goal here is to get as few jobs as possible failing on the release-20.09 jobsets. I'd like to emphasize that, while it's great to focus on zero as our goal, it's essential that all deliverables that worked in the previous release also work here.

How many failing jobs are there?

At the opening of this issue we have the main x86_64-linux jobset at 1153 failing jobs, x86_64-darwin at >7130, and aarch64-linux at 7573+.

Previous releases first evals

19.09 had 1654 failing jobs, 20.03 had 1204, and 20.09 had 1153. So we're actually getting better at maintaining a more stable "unstable" channel.

However, our darwin story isn't as good (we need more darwin reviewers/contributors): 20.03 had 1384 failing jobs, while 20.09 had >7130.
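
To put the trend in numbers, the relative change between first evaluations can be computed directly from the counts quoted above. A minimal sketch using awk; the helper name `pct_change` is made up for illustration:

```shell
# pct_change OLD NEW: percentage change from OLD to NEW, one decimal place.
# (Hypothetical helper; the inputs are the first-eval counts quoted above.)
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

pct_change 1654 1204   # x86_64-linux, 19.09 -> 20.03
pct_change 1204 1153   # x86_64-linux, 20.03 -> 20.09
```

On x86_64-linux this works out to a drop of roughly 27% followed by a further drop of about 4%. The 20.09 darwin figure is only a lower bound (>7130), so any percentage derived from it understates the increase.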

How to help (textual)

  1. Select an evaluation of the release-20.09 jobset by #id

  2. Find a failed job ❌️

  3. Troubleshoot why it's failing and fix it

  4. Create a Pull Request with the fix targeting master, and wait for it to be merged. Generally the job fails on master as well; you can verify that on Hydra (example URL: https://hydra.nixos.org/job/nixpkgs/trunk/bash.x86_64-linux). That means most PRs should target the master branch. However, if your PR causes around 500+ rebuilds, it's preferred to target staging to avoid compute and storage churn.

Always reference this issue in the body of your PR:

ZHF: #97479
  5. After the master PR has been merged, cherry-pick its commit(s) from the master branch into a backport PR. See CONTRIBUTING.md for exact details on how that should be performed. An example can be found here: Master PR and 20.09 PR

Please ping @NixOS/nixos-release-managers on the PR. If you're unable to because you're not a member of the NixOS org please ping @jonringer and @worldofpeace (the same people in the team).
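
The cherry-pick step above can be sketched as a small shell helper. This is only an illustration: the remote name `upstream`, the branch name `backport-<commit>`, and the `DRY_RUN` switch are assumptions, and CONTRIBUTING.md remains the authoritative description of the backport procedure. `git cherry-pick -x` records the original commit hash in the new commit message.

```shell
# run CMD...: execute the command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# backport COMMIT [RELEASE_BRANCH]: cherry-pick a commit that was merged
# into master onto a fresh branch based on the release branch.
# (Hypothetical helper; remote and branch names are assumptions.)
backport() {
  commit="$1"
  release="${2:-release-20.09}"
  run git fetch upstream master "$release" &&
    run git checkout -b "backport-$commit" "upstream/$release" &&
    run git cherry-pick -x "$commit"
}
```

Running `DRY_RUN=1 backport <commit>` prints the three git commands without touching the repository. After a real run you would still push the branch and open the PR against release-20.09 yourself, referencing this issue in the body.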

How to help video tutorial

@jonringer has made a video on YouTube showing what fixing something for ZHF looks like: https://www.youtube.com/watch?v=4Zb3GpIc6vk

New to nixpkgs?

@jonringer created some videos to help get people started with nixpkgs:

Also be sure to check out other resources at: https://github.com/nix-community/awesome-nix

Packages that don't get fixed

The remaining packages will be marked as broken before the release (on the failing platforms). You can do this like:

meta = {
  # ref to issue/explanation
  # `true` is for everything
  broken = stdenv.isDarwin;
};

These are the utility flags used to test the type of platform.

Closing

This is a great way to help NixOS, and fixes like these were among my earliest contributions. Let's go ✌️

✨️ @worldofpeace and @jonringer

cc @NixOS/nixpkgs-committers @NixOS/nixpkgs-maintainers

Related Issues

nixos-discourse commented 4 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-20-09-zero-hydra-failures/8928/1

risicle commented 4 years ago

It can also be useful to refer to https://hydra.nixos.org/eval/1611944, a master evaluation taken around the time of the branchoff, which will have a much longer history you can use to dig back to a package's last successful build and get an indication of the changes which might have caused it to break.

jonringer commented 4 years ago

Also, people can use @samueldr's nix-review-tools to create a report which will show which packages are causing the most failures.

I usually do something like:

mkdir cache
cd cache
../eval-report $evaluation_id > index.html
xdg-open index.html

Mathnerd314 commented 4 years ago

The nearest NixOS:trunk-combined Hydra evaluation is 1611942 at commit 6152513, 6 commits after the last shared commit 53ce0bf.

vcunat commented 4 years ago

However, our darwin story isn't as good [...]

It's blown up by ghc timing out. That has been occasionally happening and it should go away after some restart(s).

jonringer commented 4 years ago

@vcunat should we pass "big-parallel" for the ghc build so that it has more cores?

fgaz commented 4 years ago

Since last time this was useful to many, here's one way to build all packages which have you as maintainer:

* ~apply #97514~ Pull from master so you have the latest build.nix mentioned below

* run `nix-build maintainers/scripts/build.nix --argstr maintainer fgaz` (adjust the maintainer name to yourself)

immae commented 4 years ago

We cannot ping @NixOS/nixos-release-managers as required in the issue if we’re not part of the organization, is it possible to fix that?

uri-canva commented 4 years ago

I know there's a more limited number of people who use nix on macOS, so I'd like to focus my efforts there. Is there a way of getting a list of jobs that are succeeding on nixpkgs/staging-20.09 and failing on nixpkgs/nixpkgs-20.09-darwin?

Ma27 commented 4 years ago

@uri-canva that is possible by opening an evaluation of nixpkgs/nixpkgs-20.09-darwin, then hitting the Compare to button and selecting nixpkgs/staging-20.09. All "Newly failing" builds are fine on staging-20.09, but fail on nixpkgs-20.09-darwin.

However it should be sufficient to filter for jobs matching x86_64-darwin in Search jobs by name, right?
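
The same comparison can be reproduced offline if you have the job names from both evaluations as plain text. A minimal sketch, assuming two hypothetical files with one job name per line (how you export them from Hydra is up to you); it is a plain set intersection, not a Hydra feature:

```shell
# newly_failing FAILING_LIST SUCCEEDING_LIST
# Prints the jobs listed in FAILING_LIST that also appear in SUCCEEDING_LIST,
# i.e. jobs that fail in one jobset but succeed in the other.
# -F treats names as fixed strings, -x requires whole-line matches.
newly_failing() {
  grep -Fxf "$2" "$1" | sort -u
}

# Example (file names are placeholders):
#   newly_failing darwin-failing.txt staging-succeeding.txt
```

Matching whole lines as fixed strings keeps the dots in attribute names from being interpreted as regular-expression wildcards.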

uri-canva commented 4 years ago

Yes, but that would also show derivations that are failing for the same reason on linux and darwin, and would be fixed by fixing the linux derivations anyway.

makefu commented 4 years ago

Is there any way to get a list of my (currently broken) derivations like in some previous ZHF sprints?

fgaz commented 4 years ago

@makefu you could try to build them with the command i posted in this thread

fgaz commented 4 years ago

As an aside, I wish we could opt-in to failed build notification emails from hydra. I didn't know some of my packages were broken :-/

makefu commented 4 years ago

@makefu you could try to build them with the command i posted in this thread

I actually tried that, but it looks like the process fails before the build even starts. I am now in the process of trying to fix all evaluation errors.

~/nixpkgs git reset --hard upstream/master                                  
HEAD is now at 5d131d33268 Merge pull request #97540 from danieldk/fix-clpeak
~/nixpkgs nix-build maintainers/scripts/build.nix --argstr maintainer makefu --show-trace --keep-going
error: while evaluating the attribute 'drvPath' at /home/makefu/nixpkgs/lib/customisation.nix:163:7:
while evaluating the attribute 'buildInputs' of the derivation 'python2.7-aresponses-2.0.0' at /home/makefu/nixpkgs/pkgs/development/interpreters/python/mk-python-derivation.nix:108:5:
while evaluating 'getOutput' at /home/makefu/nixpkgs/lib/attrsets.nix:464:23, called from undefined position:
while evaluating anonymous function at /home/makefu/nixpkgs/pkgs/stdenv/generic/make-derivation.nix:143:17, called from undefined position:
while evaluating 'callPackageWith' at /home/makefu/nixpkgs/lib/customisation.nix:117:35, called from /home/makefu/nixpkgs/pkgs/top-level/python-packages.nix:5446:20:
while evaluating 'makeOverridable' at /home/makefu/nixpkgs/lib/customisation.nix:67:24, called from /home/makefu/nixpkgs/lib/customisation.nix:121:8:
while evaluating anonymous function at /home/makefu/nixpkgs/pkgs/development/python-modules/pytest-asyncio/default.nix:1:1, called from /home/makefu/nixpkgs/lib/customisation.nix:69:16:
while evaluating 'makeOverridablePythonPackage' at /home/makefu/nixpkgs/pkgs/top-level/python-packages.nix:36:37, called from /home/makefu/nixpkgs/pkgs/development/python-modules/pytest-asyncio/default.nix:2:1:
while evaluating 'makeOverridable' at /home/makefu/nixpkgs/lib/customisation.nix:67:24, called from /home/makefu/nixpkgs/pkgs/top-level/python-packages.nix:38:12:
while evaluating anonymous function at /home/makefu/nixpkgs/pkgs/development/interpreters/python/mk-python-derivation.nix:31:1, called from /home/makefu/nixpkgs/lib/customisation.nix:69:16:
pytest-asyncio-0.14.0 not supported for interpreter python2.7

I'll be back once I've finished that and have maybe encountered some actual build errors.

EDIT: is it preferable to pool all the py3k-related evaluation fixes (`disabled = !isPy3k;`) into one PR, or should I create a PR for each package?

bbigras commented 4 years ago

Since last time this was useful to many, here's one way to build all packages which have you as maintainer:

* ~apply #97514~ Pull from master so you have the latest build.nix mentioned below

* run `nix-build maintainers/scripts/build.nix --argstr maintainer fgaz` (adjust the maintainer name to yourself)

@fgaz won't this build on master? Don't we want to test the nixos-20.09 branch?

fgaz commented 4 years ago

@bbigras well we're trying to fix both! But yes, I see what you mean, and I guess that pr needs a backport. I'll open another pr. edit: done & merged

jonringer commented 4 years ago

@makefu I got a fix for you #97571

I ran into the same issue with python packages where unsupported interpreters throw an error

domenkozar commented 4 years ago

I've tried fixing springLobby but it fails to find libcurl. Did something change about that around 15th August?

EDIT: seems like a CMake bump

endgame commented 4 years ago

Thank you for clearly describing the contrib process for ZHF. I drive-by fixed a package I happened to use, but wouldn't have bothered contributing to ZHF if it wasn't so easy.

treed commented 4 years ago

I found a few cases in perlPackages builds on Darwin where there's an issue with ld which is fixed by adding "export LD=$CC", which several packages appear to have done already. Some predicate on i686 or Darwin, and some do it unconditionally.

Is it worth opening an issue and trying to find a more general fix for all perlPackages, or should I just do one-off PRs for each one I find like this?

Or maybe do them both so there's at least a fix for 20.09?

jonringer commented 4 years ago

@volth, do you know the best way forward for @treed?

knedlsepp commented 4 years ago

  • nix-build maintainers/scripts/build.nix --argstr maintainer fgaz

Is there a way to get past evaluation errors if there are some? (I get an error: bcrypt-3.2.0 not supported for interpreter python2.7)

loewenheim commented 4 years ago

What is to be done if a package is broken in release-20.09, but works in master? My concrete case is kmime; it was broken by commit ce4eb0b79b3b8e830d40345fb6457fac6ca9a9ec and still is in release-20.09, but is fine in master.

bbigras commented 4 years ago

What is to be done if a package is broken in release-20.09, but works in master? My concrete case is kmime; it was broken by commit ce4eb0b and still is in release-20.09, but is fine in master.

A backport I would guess.

doronbehar commented 4 years ago

I'd run `nix show-derivation -f . kmime > kmime.$(git describe HEAD).json` once with master checked out and once with the release branch checked out, and then diff the two outputs.
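
Spelled out, that could look like the following sketch. The function names are made up, `nix show-derivation` needs a Nix version that ships that subcommand, and the branch names in the usage comment assume your local checkouts are called `master` and `release-20.09`:

```shell
# dump_drv ATTR: write the derivation JSON of ATTR for the current checkout
# to ATTR.<git describe>.json and print the resulting file name.
dump_drv() {
  out="$1.$(git describe HEAD).json"
  nix show-derivation -f . "$1" > "$out" && echo "$out"
}

# compare_drvs A B: unified diff of two dumps; exit status 1 means they differ.
compare_drvs() {
  diff -u "$1" "$2"
}

# Usage, from the root of a nixpkgs checkout:
#   git checkout master        && a=$(dump_drv kmime)
#   git checkout release-20.09 && b=$(dump_drv kmime)
#   compare_drvs "$a" "$b"
```

Differences in inputs or build flags between the two dumps are a reasonable starting point for finding the commit that needs backporting.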

fgaz commented 4 years ago

Is there a way to get past evaluation errors if there are some?

@knedlsepp either #97571 or #97647

ghost commented 4 years ago

What is to be done if a package is broken in release-20.09, but works in master? My concrete case is kmime; it was broken by commit ce4eb0b and still is in release-20.09, but is fine in master.

@loewenheim see comments in #97242 for why kmime is failing

risicle commented 4 years ago

@thoughtpolice in January you said (in https://github.com/NixOS/nixpkgs/pull/77985#pullrequestreview-344972610):

I would also be fine with dropping FoundationDB 5.x builds, too, since they're pretty heavy to build and I imagine most people are running 6.x at this point.

Do you think now might be the time to do that?

risicle commented 4 years ago

pythonPackages.gipc looks like it'll be stuck waiting on https://github.com/jgehrcke/gipc/issues/103

(good thing I tried enabling the tests to discover that)

mvnetbiz commented 4 years ago

I was thinking about going through and disabling all the failing python27Packages that have explicitly dropped python 2 upstream, but I see this https://github.com/NixOS/nixpkgs/pull/92099. Is there any point in bothering? @jonringer

jonringer commented 4 years ago

I was thinking about going through and disabling all the failing python27Packages that have explicitly dropped python 2 upstream, but I see this #92099. Is there any point in bothering? @jonringer

I've been somewhat doing that while reviewing other packages https://github.com/NixOS/nixpkgs/pulls?q=is%3Apr+author%3Ajonringer+disable+is%3Aclosed+python

The discussion you linked to is a bit more involved.

Not recursing into the attr set is the "easiest" solution, but will not be perfect for users that still need some python2 packages.

tricktron commented 4 years ago

I am a bit late and don't know if this is the right place as I am new to ZHF, but I had a look at some darwin packages with the help of nixpkgs-review-tools:

| Name | Count | Builds Locally | Builds on Hydra | Action Needed |
| --- | --- | --- | --- | --- |
| python38-curio-1.2 | 112 | :heavy_check_mark: | :heavy_check_mark: | :white_check_mark: #98927 |
| qtbase-5.14.2 | 95 | :x: | :x: Cannot find feature sdk | @eqyiel https://github.com/eqyiel/nixpkgs/commit/50c2b5fd030459ff9508f65e9ffdebad0de36a63 has a WIP that removes the usr/bin/xcodebuild impurities. Unfortunately it needs at least the 10.13 sdk. See https://github.com/NixOS/nixpkgs/issues/95199 for more details. @matthewbauer Would now be a good time to upgrade the sdk? Probably that is too much work but is it possible to have multiple apple sdk versions? |
| qtbase-5.15.0 | 47 | :x: | use of undeclared identifier | not looked at maybe the 5.14.2 actions fix this too |
| python3.7-notebook-6.1.3 | 40 | :heavy_check_mark: | :heavy_check_mark: | :white_check_mark: #98621 |
| python3.8-notebook-6.1.3 | 38 | :heavy_check_mark: | :heavy_check_mark: | :white_check_mark: #98621 |
| python3.8-fsspec-0.7.4 | 27 | :heavy_check_mark: | :x: def test_modified fails | I disabled the one failing test in this backport #98987 for review |
| python3.8-fs-2.4.11 | 18 | :x: | :x: resource tracker cannot free resource | Seems to be a multiprocessing issue with python 3.8.5 on macosx as it works with python37. I opened an issue here https://github.com/PyFilesystem/pyfilesystem2/issues/430 and disabled the tests in this pr https://github.com/NixOS/nixpkgs/pull/98619 which is open for review too. |

risicle commented 4 years ago

@tricktron re curio, I think it would be perfectly acceptable to disable these two tests on darwin given the impact. It's clearly not an issue with the library.

Also, notebook backport has been approved.

risicle commented 4 years ago

I also keep finding someone has got ahead of me discovering upstream issues (frequently @jonringer) so I think it might be useful to link to upstream blocker issues here to save people digging down to the same issue.

(there's also a chance the author seeing the issue referenced might get spurred into action...)

r-burns commented 4 years ago

Samba build is fixed on darwin in 4.13+ by https://gitlab.com/samba-team/devel/samba/-/commit/847208cd8ac68c4c7d1dae63767820db1c69292b which has also been backported in 4.12.6. I also backported the patch to apply cleanly to the current 4.12.5 to avoid rebuilds, but would it be preferable to just update to 4.12.6? (New to ZHF so I don't know what impact that would have)

tricktron commented 4 years ago

@tricktron re curio, I think it would be perfectly acceptable to disable these two tests on darwin given the impact. It's clearly not an issue with the library.

@risicle Ok pr opened here https://github.com/NixOS/nixpkgs/pull/98875

Also, notebook backport has been approved.

Thanks

risicle commented 4 years ago

@r-burns I'd personally go for the bump.

risicle commented 4 years ago

~git-annex-adapter is stuck waiting on https://github.com/alpernebbi/git-annex-adapter/issues/13 and https://github.com/alpernebbi/git-annex-adapter/issues/14~

~(after getting past the immediate trivial build-stopper by passing cacert to checkInputs)~

I can't tell my branches apart.

risicle commented 4 years ago

/me coughs in the direction of #98688

nixos-discourse commented 4 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/is-there-a-feasible-way-to-debug-test-hydra-darwin-failures-quickly/9201/1

risicle commented 4 years ago

PRs still needing review in this thread:

vcunat EDIT: all done now

* #100049
* #99673
* #99655
* #99642
* #99586
* #99500
* #99383
* #99252
* #99076
* #99052
* #99040
* #98580
* #98381

jonringer commented 4 years ago

yea, sorry, a mixture of burnout and ck3 has led me to neglect this

nixos-discourse commented 4 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-weekly-08-nixos-weekly/9431/1

domenkozar commented 4 years ago

@jonringer went through all of the open ones and merged most of them :)

jonringer commented 4 years ago

Many others also helped, I'll do a proper thanks in the release post :)

bcdarwin commented 4 years ago

The x86_64 build of itk-5.1.1 is broken on Hydra with an "Illegal instruction" error, but building locally succeeds (and Hydra builds master), so I'm not exactly sure how to bisect/troubleshoot this.

KamilaBorowska commented 4 years ago

The x86_64 build of itk-5.1.1 is broken on Hydra with an "Illegal instruction" error, but building locally succeeds (and Hydra builds master), so I'm not exactly sure how to bisect/troubleshoot this.

"Illegal instruction" means a program uses an instruction that is unsupported by the CPU it is running on. Hydra builders have different CPUs, so certain instructions will be supported only by some machines.

This happens because ITK is built with -march=corei7 (https://github.com/InsightSoftwareConsortium/ITK/blob/4b48c9025f66d179d7b134999e2398f5924093b4/CMake/ITKSetStandardCompilerFlags.cmake#L245). -march=corei7 means that the program is compiled only for CPUs that support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2.

I believe we may want to remove -march=corei7 (because it's not correct for NixOS, currently the distribution doesn't assume any particular CPU) and -mtune=native (because reproducibility).
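
To check whether a given Linux machine would hit this, you can compare the instruction sets named above against the CPU flags the kernel reports. A rough sketch, assuming a Linux-style `/proc/cpuinfo` (where SSE3 is reported as `pni`); the function name is made up:

```shell
# missing_cpu_flags [CPUINFO_FILE]
# Prints each of the instruction-set flags named above that the first CPU
# in CPUINFO_FILE (default: /proc/cpuinfo) does not advertise.
missing_cpu_flags() {
  cpuinfo="${1:-/proc/cpuinfo}"
  flags=$(grep -m1 '^flags' "$cpuinfo")
  for flag in mmx sse sse2 pni ssse3 sse4_1 sse4_2; do
    # -w matches whole words only, so "sse" does not match "sse2" or "ssse3".
    printf '%s\n' "$flags" | grep -qw "$flag" || echo "$flag"
  done
}

# Example: missing_cpu_flags            (inspect the local machine)
```

An empty result means the machine advertises every flag in that list; any name printed is a candidate explanation for an "Illegal instruction" on that builder.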

risicle commented 4 years ago

What's the plan for theano? @twhitehead had several suggestions in https://github.com/NixOS/nixpkgs/pull/99516#issuecomment-703289554 but which do we want to take forward for release-20.09?

risicle commented 4 years ago

Hmm... I quite need #99587 merged to be able to backport it to 20.09, but its dependencies are now broken on master.