NixOS / nixpkgs


ZERO Hydra Failures 21.11 #144627

Closed: tomberek closed this issue 2 years ago

tomberek commented 3 years ago

Mission

Every time we branch off a release, we stabilize the release branch. Our goal here is to have as few jobs as possible failing on the trunk/master jobsets. I'd like to highlight that, while it's great to focus on zero as our goal, it's essential that all deliverables that worked in the previous release work here as well.

Please note the changes included in RFC 85.

Most significantly, branch off will occur on 2021 Nov 19; prior to that date, ZHF will be conducted on master; after that date, ZHF will be conducted on the release channel using a backport workflow similar to previous ZHFs.

Jobsets

nixos:release-21.11 Jobset
nixpkgs:nixpkgs-21.11-darwin Jobset

How many failing jobs are there?

At the opening of this issue we have

Thanks to nix-review-tools we know which dependencies are causing the most jobs to fail in these individual jobsets:

Previous releases first evals

20.09 had 1153 failing jobs
21.05 had 789 failing jobs

How to help (textual)

  1. Select an evaluation of the trunk jobset.

  2. Find a failed job ❌️. You can use the filter field to scope packages to your platform, or search for packages that are relevant to you. Note: you can filter by architecture by typing it into the filter field, e.g.: https://hydra.nixos.org/eval/1719540?filter=x86_64-linux&compare=1719463&full=#tabs-still-fail

  3. Search to see whether a PR is already open for the package. If there is one, please help review it.

  4. If there is no open PR, troubleshoot why it's failing and fix it.

  5. Create a Pull Request with the fix targeting master, wait for it to be merged. If your PR causes around 500+ rebuilds, it's preferred to target staging to avoid compute and storage churn.

  6. (after 2021 Nov 19) Please follow backporting steps and target the release-21.11 branch if the original PR landed in master or staging-21.11 if the PR landed in staging. Be sure to do git cherry-pick -x <rev> on the commits that landed in unstable. @jonringer created a video covering the backport process.
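
The filtered evaluation link from step 2 can also be assembled programmatically. The following is a minimal sketch, not part of Hydra or nixpkgs: the helper name `eval_url` is hypothetical, and the query parameters (`filter`, `compare`, `full`) and the `tabs-still-fail` fragment are simply copied from the example URL in step 2.

```python
from urllib.parse import urlencode

HYDRA = "https://hydra.nixos.org"


def eval_url(eval_id, filter_text, compare_id, tab="tabs-still-fail"):
    """Build a Hydra evaluation URL that pre-filters the job list.

    Hypothetical helper: it only mirrors the shape of the example URL
    in step 2 and does not talk to Hydra.
    """
    query = urlencode({"filter": filter_text, "compare": compare_id, "full": ""})
    return f"{HYDRA}/eval/{eval_id}?{query}#{tab}"


# Reproduces the example link from step 2.
print(eval_url(1719540, "x86_64-linux", 1719463))
```

Swapping the filter text for another platform string, or for a package name, gives the corresponding filtered view.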

Always reference this issue in the body of your PR:

ZHF: #144627

Please ping @NixOS/nixos-release-managers on the PR. If you're unable to because you're not a member of the NixOS org, please ping @jonringer, @tomberek, @nrdxp.

How can I easily check packages that I maintain?

You're able to check failing packages that you maintain by running:

# from root of nixpkgs
nix-build maintainers/scripts/build.nix --argstr maintainer <name>

New to nixpkgs?

Packages that don't get fixed

The remaining packages will be marked as broken before the release (on the failing platforms). You can do this like:

meta = {
  # ref to issue/explanation
  # `true` is for everything
  broken = stdenv.isDarwin; 
};

Closing

This is a great way to help NixOS, and it is a great time for new contributors to start their nixpkgs adventure. :partying_face:

cc @NixOS/nixpkgs-committers @NixOS/nixpkgs-maintainers @NixOS/release-engineers

Related Issues

nixos-discourse commented 3 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-21-11-zero-hydra-failures/15904/1

nixinator commented 3 years ago

It would be nice to get @ryantm to run this report, so we can find out which packages/dependencies are causing the most failures.

https://discourse.nixos.org/t/finding-most-depended-upon-packages-that-fail-to-build-in-hydra/10090

EDIT: I missed it, been a long day in the data mines.

nrdxp commented 3 years ago

@nixinator, @tomberek already ran this report during a meeting we just had and posted it under his personal domain. Not sure if he posted it up above and you just missed it or if he forgot to post it though.

Edit: It is posted above.

fabianhjr commented 3 years ago

Create a Pull Request with the fix targeting master, wait for it to be merged. If your PR causes around 500+ rebuilds, it's preferred to target staging to avoid compute and storage churn.

Hey, there has been some discussion about non-breaking (particularly unbreaking/fixing) commits going directly to staging-next rather than staging. I'm mentioning it because any unbreaking changes targeting staging might miss the branch-off on the 19th of November.

E.g., #137501

legendofmiracles commented 3 years ago

Can this issue be pinned?

GuillaumeDesforges commented 3 years ago

Can we search jobs based on package maintainers? I'd like to quickly check the ones I am maintainer of.

dramaturg commented 3 years ago

Can we search jobs based on package maintainers? I'd like to quickly check the ones I am maintainer of.

This snippet might be what you are looking for: nix-shell maintainers/scripts/check-hydra-by-maintainer.nix --argstr maintainer dramaturg

There is also another script to run a build of all your packages: nix-build maintainers/scripts/build.nix --argstr maintainer dramaturg

Ma27 commented 3 years ago

nix-shell maintainers/scripts/check-hydra-by-maintainer.nix --argstr maintainer ma27

Also there's a pending PR to directly integrate this into Hydra: https://github.com/NixOS/hydra/pull/743

06kellyjac commented 3 years ago

@dramaturg nix-shell maintainers/scripts/check-hydra-by-maintainer.nix --argstr maintainer dramaturg is very, very nice, but does it only check x86_64-linux, or does it only check the host you're on (which in my case is also x86_64-linux)?

dramaturg commented 3 years ago

@dramaturg nix-shell maintainers/scripts/check-hydra-by-maintainer.nix --argstr maintainer dramaturg is very, very nice, but does it only check x86_64-linux, or does it only check the host you're on (which in my case is also x86_64-linux)?

True. I just found out you can pass the architecture to hydra-check - so if you take the nix-shell command printed by the script and add an argument you can check other archs:

$ nix-shell -p hydra-check --run "hydra-check --arch=aarch64-linux openssl openttd"
Build Status for nixpkgs.openssl.aarch64-linux on unstable
✔ openssl-1.0.2q from 2018-12-22 - https://hydra.nixos.org/build/86211048
Build Status for nixpkgs.openttd.aarch64-linux on unstable
✔ openttd-1.8.0 from 2018-12-22 - https://hydra.nixos.org/build/86224141

(Note the quotes. Alternatively, amend the script to include that argument.)
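
To check several architectures in one go, the command line above can be generated rather than typed by hand. This is only a sketch: the helper `hydra_check_command` is hypothetical, it merely builds the command string with correct quoting and does not run anything, and the `--arch` flag is taken from the invocation shown above.

```python
import shlex


def hydra_check_command(arch, packages):
    """Return the nix-shell command line that runs hydra-check for one arch.

    Hypothetical helper: it only formats the command; running it is left
    to the caller.
    """
    inner = shlex.join(["hydra-check", f"--arch={arch}", *packages])
    # shlex.join quotes the inner command because it contains spaces.
    return shlex.join(["nix-shell", "-p", "hydra-check", "--run", inner])


for arch in ("x86_64-linux", "aarch64-linux"):
    print(hydra_check_command(arch, ["openssl", "openttd"]))
```

The generated strings use single quotes around the `--run` argument, which is equivalent in a POSIX shell to the double-quoted form shown above.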

Synthetica9 commented 3 years ago

nix-build maintainers/scripts/build.nix --argstr maintainer synthetica --argstr system "aarch64-linux"

Seems to work.

06kellyjac commented 3 years ago

I've gone into maintainers/scripts/check-hydra-by-maintainer.nix and added --arch=aarch64-linux which works

However, I get a bunch of "⚠ This job is not a member of the latest evaluation of its jobset. This means it was removed or had an evaluation error." warnings, while manually checking https://hydra.nixos.org/eval/1719259?filter=afetch&compare=1719196&full=#tabs-still-succeed mostly looks fine. I guess we can discuss this further in another issue at some point.

asymmetric commented 3 years ago

@tomberek can you add to the "guide" that it's possible to filter jobs by arch by typing the arch in the "search field" in an eval page?

I had some more feedback on the previous ZHF effort here.

newAM commented 3 years ago

For the future, is there a way to sign up for emails when a package I maintain fails to build in hydra?

jonringer commented 3 years ago

@newAM there was many years ago, but it was disabled because of a high noise to signal ratio

Anton-Latukha commented 3 years ago

For the future, is there a way to sign up for emails when a package I maintain fails to build in hydra?

It should not be that hard to program.

Hydra has threads/package statuses. They can even just be scraped/parsed from the HTML.
It has an API, whose description is scattered across the documentation, and there is a Hydra-API library in Haskell.
Hydra has a plugin system. It is mainly in Perl, but relatively readable.
In the past, I know there was somewhere an explicit, separate source for the API.
Something that posts the status of the build: this thing.

It also has Prometheus integration, so the Hydra API is compatible with the Prometheus API, and one can also try to retrieve the info that way.

If a web hook for retrieving the status exists (and its format can be found), or is provided and documented, it would be easy to set up even IFTTT for free. It has webhook support, so you could automate any action on the returned status through IFTTT: email, IM, or an Android notification, whatever one wants.

tomberek commented 3 years ago

Let's take a look at the newly failing jobs:

x86_64-linux seems to need ceph work
x86_64-darwin needs arrow-cpp
aarch64-linux needs ... perhaps a restart on the jobs

https://tomberek.info/eval_reports/1719835.html

domenkozar commented 3 years ago

I've started a call for darwin maintainers to also help out with ZHF.

sternenseemann commented 3 years ago

Quick note regarding Haskell: If you end up looking at any Haskell packages, please remember to target your PRs against haskell-updates.

kira-bruneau commented 3 years ago

Is there a good way to reproduce the environment Hydra uses for macOS builds? I've frequently been seeing different errors from Hydra, and cases where I can build the package locally but Hydra fails (e.g. python3Packages.pygls, texlab, minetest, ...).

I have sandboxing enabled, but that's probably not enough?

toonn commented 3 years ago

@kira-bruneau, IME, especially with Python, that usually comes down to the Hydra builders using HFS+, which is case-insensitive and uses Unicode normalization form D (NFD). So anything where Unicode strings are compared to file names can be tedious.
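
As an illustration of why such comparisons go wrong, here is a minimal sketch of how a file name written in precomposed form differs from the decomposed (NFD) form a file system might hand back, and one way a test could compare names robustly. The helper names and the file name are made up for the example, and nothing here inspects the actual Hydra builders.

```python
import unicodedata


def canonical(name: str) -> str:
    """Normalize to NFC and case-fold, so form and case differences vanish."""
    return unicodedata.normalize("NFC", name).casefold()


def same_file_name(a: str, b: str) -> bool:
    """Compare two file names the way a case-insensitive, normalizing FS might."""
    return canonical(a) == canonical(b)


expected = "caf\u00e9.txt"                        # precomposed "é" (one code point)
on_disk = unicodedata.normalize("NFD", expected)  # "e" followed by a combining accent

print(expected == on_disk)                # naive comparison
print(same_file_name(expected, on_disk))  # comparison after normalization
```

A test that compares the strings naively passes on a file system that stores names unchanged and fails on one that returns them decomposed, which matches the "works locally, fails on Hydra" pattern described above.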

tobim commented 3 years ago

... x86_64-darwin needs arrow-cpp

I believe this has been fixed as part of https://github.com/NixOS/nixpkgs/pull/144610.

balsoft commented 2 years ago

https://hydra.nixos.org/build/158386594

Seems like this failure is strange, since we could build it locally.

#zhfmsk

nixinator commented 2 years ago

https://hydra.nixos.org/build/158386594

Seems like this failure is strange, since we could build it locally.

#zhfmsk

Is it an out-of-memory condition on the builder (OOM killer)?

balsoft commented 2 years ago

https://hydra.nixos.org/build/158397595

This seems like a really strange failure: the job failed, but the build result is available in the cache.

nix build /nix/store/zc3v69cd7383gw6dciv16farq5mr9nd4-rygel-0.40.2

succeeds.

#zhfmsk

P.S. @jonringer just restarted the job and it's immediately successful!

risicle commented 2 years ago

@tomberek requesting a new evals report (preferably from a completed eval)

jonringer commented 2 years ago

@tomberek requesting a new evals report (preferably from a completed eval)

Hydra has been lacking resources; not many x86_64-linux builds have been completed in the last 24 hours. The situation should improve soon.

jonringer commented 2 years ago

For next ZHF, we should probably make a nixos/eval-reports repo, which has some CI around generating the reports each time a new eval gets started

wamserma commented 2 years ago

For next ZHF, we should probably make a nixos/eval-reports repo, which has some CI around generating the reports each time a new eval gets started

Don't know how expensive that is, but it might be useful for keeping things green 'round the year (so we can get into some sort of permanent ZHF state).

smancill commented 2 years ago

I created #146494 and #146667 to fix these darwin builds on Hydra, where the tests cannot write to /tmp:

But now I found that a few days before they were working (the logical conclusion is that /tmp was writable)

So what is the environment on Hydra? I cannot find any information, or I don't know where to search. Is /tmp writable and there was a configuration error, or is it not supposed to be writable?

mweinelt commented 2 years ago

@tomberek requesting a new evals report (preferably from a completed eval)

Latest somewhat complete eval: https://shells.darmstadt.ccc.de/~hexa/1723396.html

smancill commented 2 years ago

@tomberek requesting a new evals report (preferably from a completed eval)

Latest somewhat complete eval: https://shells.darmstadt.ccc.de/~hexa/1723396.html

proj, which is listed as the second most complicated dependency, is fixed on darwin now: https://hydra.nixos.org/build/158869394

Artturin commented 2 years ago

https://github.com/NixOS/nixpkgs/issues/146719

tomberek commented 2 years ago

Next eval: https://tomberek.info/eval_reports/1724022.html

cab404 commented 2 years ago

flexget, osmscout-server and luabind just need a restart, they build fine.

cab404 commented 2 years ago

same with soi and osrm-backend (basically all the deps with failed source)

dasJ commented 2 years ago

@cab404 Restarted them all, soi on aarch64-linux was successful this time

risicle commented 2 years ago

I think the same might be true for apacheHttpdPackages.php on darwin x86_64 and aarch64 linux. I can't reproduce the failures we're seeing there.

dasJ commented 2 years ago

Happily restarted them, one of them (don't remember which one) finished successfully

risicle commented 2 years ago

Another one for the list: gdc on darwin. Successful for me on macos 10.15.

smancill commented 2 years ago

Sorry for the late set of PRs with darwin fixes, but I wanted to see whether they could make it before the release branch is created. I will stop now.

risicle commented 2 years ago

Don't stop, we can always backport.

Artturin commented 2 years ago

@smancill if possible could you submit some of the fixes to the packages upstream repo?

With the gh CLI (https://cli.github.com/) it's fast to fork and clone a repo: gh repo fork https://github.com/Nefelim4ag/Ananicy

smancill commented 2 years ago

@smancill if possible could you submit some of the fixes to the packages upstream repo?

I sent one, I may send a couple more.

Here is the list of packages in my PR that required to be patched in order to work on darwin, if somebody else wants to help:

roberth commented 2 years ago

Some help with NixOps 1.7 Python 2 dependencies will be very much appreciated: #147034

It doesn't have to be pretty, imho, because it would be reasonable to drop NixOps 1.7 in favor of NixOps 2 (on Python 3) in the next release.

tomberek commented 2 years ago

The branch off just occurred a few moments ago. This means that any new fixes that land between now and the actual release should follow the usual backport procedure going forward. New fixes should go to master and backported to release-21.11.
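
The branch-selection rule from step 6 of the issue description can be written down as a small lookup. This is a sketch under assumptions, not an official tool: the function name `backport_plan` and the revision used below are placeholders, and only the branch names and the `git cherry-pick -x` command come from the issue text.

```python
# Where a fix landed on unstable -> which 21.11 branch the backport targets
# (taken from step 6 of the issue description).
BACKPORT_TARGET = {
    "master": "release-21.11",
    "staging": "staging-21.11",
}


def backport_plan(landed_in, revs):
    """Return the target branch and the cherry-pick commands for a backport.

    Hypothetical helper: it only formats commands, it does not run git.
    """
    target = BACKPORT_TARGET[landed_in]
    commands = [f"git cherry-pick -x {rev}" for rev in revs]
    return target, commands


# "abc1234" is a placeholder revision, not a real nixpkgs commit.
target, commands = backport_plan("master", ["abc1234"])
print(target)
print("\n".join(commands))
```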

smancill commented 2 years ago

The link to the backporting steps in this issue description is broken, because CONTRIBUTING.md was moved to the top-level directory.

jonringer commented 2 years ago

Yep, I think you're right

EDIT: fixed original post :). Thanks @smancill

mweinelt commented 2 years ago

Some help with NixOps 1.7 Python 2 dependencies will be very much appreciated: #147034

It doesn't have to be pretty, imho, because it would be reasonable to drop NixOps 1.7 in favor of NixOps 2 (on Python 3) in the next release.

#147055

thiagokokada commented 2 years ago

The branch off just occurred a few moments ago. This means that any new fixes that land between now and the actual release should follow the usual backport procedure going forward. New fixes should go to master and backported to release-21.11.

Ouch. I may have backported some PRs to staging-21.11 instead.

Can we remove the backport staging-21.11 label? It is confusing.