tomberek closed this issue 2 years ago.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/nixos-21-11-zero-hydra-failures/15904/1
It would be nice to get @ryantm to run this report, so we can find out the packages/dependencies that are causing the most failures.
https://discourse.nixos.org/t/finding-most-depended-upon-packages-that-fail-to-build-in-hydra/10090
EDIT: I missed it, been a long day in the data mines.
@nixinator, @tomberek already ran this report during a meeting we just had and posted it under his personal domain. Not sure if he posted it up above and you just missed it or if he forgot to post it though.
It is posted above:
Create a Pull Request with the fix targeting master, wait for it to be merged. If your PR causes around 500+ rebuilds, it's preferred to target staging to avoid compute and storage churn.
Hey, there has been some discussion about non-breaking (particularly unbreaking/fixing) commits going directly to staging-next rather than staging. Mentioning this since any unbreaking changes targeting staging might miss the branch-off on the 19th of November. E.g. #137501
Can this issue be pinned?
Can we search jobs based on package maintainers? I'd like to quickly check the ones I am maintainer of.
This snippet might be what you are looking for:
nix-shell maintainers/scripts/check-hydra-by-maintainer.nix --argstr maintainer dramaturg
There is also another script to run a build of all your packages:
nix-build maintainers/scripts/build.nix --argstr maintainer dramaturg
nix-shell maintainers/scripts/check-hydra-by-maintainer.nix --argstr maintainer ma27
Also there's a pending PR to directly integrate this into Hydra: https://github.com/NixOS/hydra/pull/743
@dramaturg nix-shell maintainers/scripts/check-hydra-by-maintainer.nix --argstr maintainer dramaturg is very, very nice, but does it only check x86_64_linux, or does it only check the host you're on (which in my case is also x86_64_linux)?
True. I just found out you can pass the architecture to hydra-check, so if you take the nix-shell command printed by the script and add an argument, you can check other archs:
$ nix-shell -p hydra-check --run "hydra-check --arch=aarch64-linux openssl openttd"
Build Status for nixpkgs.openssl.aarch64-linux on unstable
✔ openssl-1.0.2q from 2018-12-22 - https://hydra.nixos.org/build/86211048
Build Status for nixpkgs.openttd.aarch64-linux on unstable
✔ openttd-1.8.0 from 2018-12-22 - https://hydra.nixos.org/build/86224141
(note the quotes, or amend the script to include that argument)
nix-build maintainers/scripts/build.nix --argstr maintainer synthetica --argstr system "aarch64-linux"
Seems to work.
I've gone into maintainers/scripts/check-hydra-by-maintainer.nix and added --arch=aarch64-linux, which works.
Although I get a bunch of "⚠ This job is not a member of the latest evaluation of its jobset. This means it was removed or had an evaluation error." warnings, manually checking https://hydra.nixos.org/eval/1719259?filter=afetch&compare=1719196&full=#tabs-still-succeed mostly looks fine. I guess we can discuss this further in another issue at some point.
@tomberek can you add to the "guide" that it's possible to filter jobs by arch by typing the arch in the "search field" in an eval page?
I had some more feedback on the previous ZHF effort here.
For the future, is there a way to sign up for emails when a package I maintain fails to build in hydra?
@newAM there was one many years ago, but it was disabled because of a high noise-to-signal ratio.
It should not be that hard to program.
Hydra has per-package build statuses; these could even just be scraped/parsed from the HTML.
It also has an API, whose description is scattered across the documentation, and there is a Hydra-API library in Haskell.
Hydra has a plugin system as well, mainly in Perl but relatively readable; one of its plugins posts the status of a build, and in the past there was an explicit, separate description of the API in the source somewhere.
It also has Prometheus integration, so build information can be retrieved the Prometheus way too.
If a web hook for retrieving build status exists (and its form can be found), or is provided and documented, it would be easy to set up even IFTTT for free: it has webhook support, so you could automate any action on the returned status, whether email, IM, an Android notification, or whatever one wants.
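For what it's worth, here is a minimal sketch of that API route, assuming Hydra's JSON endpoint /job/&lt;project&gt;/&lt;jobset&gt;/&lt;job&gt;/latest returns a buildstatus field (0 meaning success); the job name, the curl/jq usage and the field name are assumptions to verify against hydra.nixos.org, not something confirmed in this thread:
# Hypothetical polling sketch, not a finished notifier: query the latest
# build of one job and print a one-line status (assumes curl and jq).
job="nixpkgs/trunk/hello.x86_64-linux"   # replace with a job you care about
status=$(curl -sH "Accept: application/json" \
  "https://hydra.nixos.org/job/$job/latest" | jq '.buildstatus')
if [ "$status" = "0" ]; then
  echo "OK: $job"
else
  echo "FAILING (buildstatus=$status): $job"   # plug mail/IFTTT/whatever in here
fi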
Let's take a look at the newly failing jobs:
x86_64-linux seems to need ceph work
x86_64-darwin needs arrow-cpp
aarch64-linux needs .... perhaps a restart on the jobs
I've started a call for darwin maintainers to also help out with ZHF.
Quick note regarding Haskell: If you end up looking at any Haskell packages, please remember to target your PRs against haskell-updates.
Is there a good way to reproduce the environment Hydra uses for macOS builds? I've been frequently seeing different errors from Hydra, and cases where I can build the package locally but Hydra fails (e.g. python3Packages.pygls, texlab, minetest, ...). I have sandboxing enabled, but that's probably not enough?
@kira-bruneau, IME, especially with Python that usually comes down to the Hydra builders using HFS+, which is case-insensitive and uses unicode normal form D (NFD). So anything where unicode strings are compared to file names can be tedious.
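To make the normalization pitfall concrete, here is a tiny illustration (python3 is used here only for demonstration; the actual file names involved in the Hydra failures will of course differ):
$ python3 -c 'import unicodedata as u; s = "é"; print(u.normalize("NFC", s) == u.normalize("NFD", s))'
False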
... x86_64-darwin needs arrow-cpp
I believe this has been fixed as part of https://github.com/NixOS/nixpkgs/pull/144610.
https://hydra.nixos.org/build/158386594
This failure seems strange, since we could build it locally.
Is it an out-of-memory condition on the builder (OOM killer) ...
https://hydra.nixos.org/build/158397595
This seems like a really strange failure: the job failed, but the build result is available in the cache and nix build /nix/store/zc3v69cd7383gw6dciv16farq5mr9nd4-rygel-0.40.2 succeeds.
P.S. @jonringer just restarted the job and it's immediately successful!
@tomberek requesting a new evals report (preferably from a completed eval)
Hydra has been lacking resources; not many x86_64-linux builds have been completed in the last 24 hours. The situation should improve soon.
For the next ZHF, we should probably make a nixos/eval-reports repo with some CI around generating the reports each time a new eval gets started.
Don't know how expensive that is, but it might be useful for keeping things green 'round the year (so we can get into some sort of permanent ZHF state).
I created #146494 and #146667 to fix these darwin builds on Hydra, where the tests cannot write to /tmp:
But now I found that a few days earlier they were working (the logical conclusion being that /tmp was writable).
Then, what's the environment on Hydra? I cannot find any information, or I don't know where to search. Is /tmp writable and there was a configuration error, or is it not supposed to be writable?
@tomberek requesting a new evals report (preferably from a completed eval)
Latest somewhat complete eval: https://shells.darmstadt.ccc.de/~hexa/1723396.html
proj, which is listed as the second most complicated dependency, is fixed on darwin now: https://hydra.nixos.org/build/158869394
flexget, osmscout-server and luabind just need a restart, they build fine.
same with soi and osrm-backend (basically all the deps with failed source)
@cab404 Restarted them all, soi on aarch64-linux was successful this time
I think the same might be true for apacheHttpdPackages.php on darwin x86_64 and aarch64 linux. I can't reproduce the failures we're seeing there.
Happily restarted them, one of them (don't remember which one) finished successfully
Another one for the list: gdc on darwin. Successful for me on macOS 10.15.
Sorry for the late set of PRs with darwin fixes, but I wanted to see if they could make it in before the release branch is created. I will stop now.
Don't stop, we can always backport.
@smancill if possible could you submit some of the fixes to the packages' upstream repos?
With the gh CLI it's fast to fork and clone a repo: gh repo fork https://github.com/Nefelim4ag/Ananicy
https://cli.github.com/
I sent one, I may send a couple more.
Here is the list of packages in my PR that needed to be patched in order to work on darwin, if somebody else wants to help:
configure.ac.
Some help with NixOps 1.7 Python 2 dependencies will be very much appreciated: #147034
It doesn't have to be pretty, imho, because it would be reasonable to drop NixOps 1.7 in favor of NixOps 2 (on Python 3) in the next release.
The branch off just occurred a few moments ago. This means that any new fixes that land between now and the actual release should follow the usual backport procedure going forward. New fixes should go to master and be backported to release-21.11.
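For anyone unsure about the mechanics, a rough sketch of that flow (the local branch name and fork remote are made up for illustration, and <rev> stands for the commit that already landed in master; cherry-pick -x records the original hash in the backported commit message):
git fetch origin
git checkout -b backport-my-fix origin/release-21.11
git cherry-pick -x <rev>
git push <your-fork-remote> backport-my-fix   # then open a PR targeting release-21.11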
The link to the backporting steps in this issue description is broken, because CONTRIBUTING.md was moved to the top-level directory.
Yep, I think you're right
EDIT: fixed original post :). Thanks @smancill
Ouch. I may have backported some PRs to staging-21.11 instead.
Can we remove the backport staging-21.11 label? It is confusing.
Mission
Every time we branch off a release we stabilize the release branch. Our goal here is to get as few jobs as possible failing on the trunk/master jobsets. I'd like to highlight that while it's great to focus on zero as our goal, it's essential that all deliverables that worked in the previous release also work here.
Please note the changes included in RFC 85.
Most significantly, branch off will occur on 2021 Nov 19; prior to that date, ZHF will be conducted on master; after that date, ZHF will be conducted on the release channel using a backport workflow similar to previous ZHFs.
Jobsets
nixos:release-21.11 Jobset
nixpkgs:nixpkgs-21.11-darwin Jobset
How many failing jobs are there?
At the opening of this issue we have:
x86_64-linux jobset at 653 failing jobs
x86_64-darwin at 1449
aarch64-linux at 782
Thanks to nix-review-tools we know which dependencies are causing the most jobs to fail in these individual jobsets:
Previous releases' first evals
20.09 had 1153 failing jobs
21.05 had 789 failing jobs
How to help (textual)
Select an evaluation of the trunk jobset
Find a failed job ❌️; you can use the filter field to scope packages to your platform, or search for packages that are relevant to you. Note: you can filter by architecture by typing it into the filter field, e.g. https://hydra.nixos.org/eval/1719540?filter=x86_64-linux&compare=1719463&full=#tabs-still-fail
Search to see whether a PR is already open for the package. If there is one, please help review it.
If there is no open PR, troubleshoot why it's failing and fix it.
Create a Pull Request with the fix targeting master and wait for it to be merged. If your PR causes around 500+ rebuilds, it's preferred to target staging to avoid compute and storage churn.
(after 2021 Nov 19) Please follow the backporting steps and target the release-21.11 branch if the original PR landed in master, or staging-21.11 if the PR landed in staging. Be sure to do git cherry-pick -x <rev> on the commits that landed in unstable. @jonringer created a video covering the backport process.
Always reference this issue in the body of your PR:
Please ping @NixOS/nixos-release-managers on the PR. If you're unable to because you're not a member of the NixOS org, please ping @jonringer, @tomberek, @nrdxp.
How can I easily check packages that I maintain?
You're able to check failing packages that you maintain by running:
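(The exact command is missing here; based on the scripts mentioned earlier in this thread, it is presumably one of the following, with your maintainer handle substituted:)
nix-shell maintainers/scripts/check-hydra-by-maintainer.nix --argstr maintainer <your-handle>
nix-build maintainers/scripts/build.nix --argstr maintainer <your-handle>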
New to nixpkgs?
Packages that don't get fixed
The remaining packages will be marked as broken before the release (on the failing platforms). You can do this like:
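(The snippet is missing here; purely as an illustration, marking a package broken on a failing platform usually amounts to something like the following in its derivation, with the condition adjusted to the platforms that actually fail:)
meta = {
  # assumption for illustration: the package fails only on darwin
  broken = stdenv.isDarwin;
};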
Closing
This is a great way to help NixOS, and it is a great time for new contributors to start their nixpkgs adventure. :partying_face:
cc @NixOS/nixpkgs-committers @NixOS/nixpkgs-maintainers @NixOS/release-engineers
Related Issues