Open lukego opened 2 years ago
It's very nice of you to set up this infrastructure for free
Could you add this branch? https://github.com/Uthar/nixpkgs/tree/fixes
I fixed a bunch of packages there, but haven't created a PR yet.
> It's very nice of you to set up this infrastructure for free
Thanks. I'm planning to reuse this infrastructure for multiple purposes, so starting with the Nix/Lisp packages is a good exercise to get it set up while also being useful to the Lisp community.
> Could you add this branch? https://github.com/Uthar/nixpkgs/tree/fixes
I have added the job:
https://hydra.nuddy.co/jobset/nixpkgs-lisp/sbclPackages-Uthar-fixes#tabs-evaluations
However you will need to clone the lispnix repo because I pointed that at your GitHub url: https://hydra.nuddy.co/jobset/nixpkgs-lisp/sbclPackages-Uthar-fixes#tabs-configuration
This way you can decide what Hydra will test by pushing changes to the sbclPackages.nix file on your lispnix branch. Hydra is polling the relevant Git repos every 5 minutes.
I think there is a better way to do all of this, for example using Nix Flakes to define the CI jobs, but one step at a time...
I'm having a lot of errors on my builds at the moment. I'm not sure if it's a setup issue in Hydra. I'll troubleshoot this tomorrow :)
(I'm still troubleshooting :))
enjoy :)
(I'm still troubleshooting :))
Clasp 2.0.0 was released yesterday. It can now save-lisp-and-die into an executable, which starts very fast! I packaged it here as github:uthar/dev#clasp, maybe it's nice to include it in the Hydra builds? I'll add it to the GitHub Actions in nix-cl
Great idea re: Clasp!
Sorry I've been distracted lately: traveling to NixCon last week and then doing other work this week. Quick update:
Hydra is running at http://ex43-0.nuddy.co/. I need to troubleshoot why the hydra.nuddy.co hostname isn't working: seems to be an SSL-related issue from when I recently moved the Hydra installation from EC2 to a Hetzner machine.
Hydra has three ways to define jobs and I've now cycled through all of them :)
@Uthar if you want to experiment with Hydra you can clone that jobsets repo ^ and I can have Hydra poll your fork too.
I'd like to have a Hydra job that keeps track of the exact status of every package for every Lisp. That is loosely state = <working,test-error,load-error,build-error,dependency-error>.
This is a bit tricky because Nix doesn't support first class errors and has a hard time representing e.g. a derivation that failed to build due to broken dependencies. But I was at NixCon last week and I took the chance to discuss this with Eelco Dolstra. He thought it sounded OK to run Nix recursively for this, i.e. an inner nix-build can fail but it is detected and tolerated by the outer nix-build that Hydra is running.
I have tried to implement that but so far recursive Nix doesn't work for me as it does for others: https://github.com/NixOS/nix/pull/3205#issuecomment-1290041171
I know that an alternative approach is to extend the Lisp code that calls out to nix-build, i.e. use nix-shell as the outer Nix layer instead of nix-build, but when I started doing that it felt a bit like I would end up reimplementing Hydra eventually so I'd like to try harder to keep it all within a Nix build.
I might be on totally the wrong track here though...
Ahh, I wish I had known, I would have gone to NixCon too (-:
@Uthar Rumour has it that FOSDEM 2023 will have a Nix devroom (hopefully.) I'm going: maybe I'll meet you there :-) https://fosdem.org/2023/
Where can I learn more about recursive nix builds? Maybe it would be useful for implementing checkPhase for Lisp packages, because their tests exist in separate packages. Or is there something like testInputs in mkDerivation?
I don't think recursive nix builds are really documented properly but you can see some of the machinery at e.g. https://git.alternativebit.fr/NinjaTrappeur/Nix/commit/c4d7c76b641d82b2696fef73ce0ac160043c18da?style=unified&whitespace=ignore-all
The idea is simple, i.e. you call nix-build from inside a nix build. The implementation is more tricky e.g. making sure the right store paths are transported in/out of the build sandbox. But it seems to have transitioned from "something that basically does not work" into "something that basically does actually work."
Here is the derivation that I have now: https://github.com/nuddyco/lispnix/blob/main/try-build.nix
This attempts to use an inner nix build to build one Lisp package from inside an outer lisp build that is making a list of packages that do/don't build. The recursive nix here is basically try-catch: the outer build can succeed (producing log files as output) even if the inner builds fail (because some package is broken or has broken dependencies.)
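The try-catch behaviour can also be pictured outside of Nix. The following is only an illustrative Python sketch, not the code in try-build.nix: it assumes nix-build is on PATH, and the names try_build, build_all, the default.nix expression file and the injectable run parameter are all hypothetical.

```python
import subprocess
from typing import Callable, NamedTuple


class BuildResult(NamedTuple):
    attr: str   # attribute that was built, e.g. a Lisp package name
    ok: bool    # True when the inner build exited with status 0
    log: str    # combined stdout/stderr of the inner build


def try_build(attr: str, expr: str = "default.nix",
              run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> BuildResult:
    """Run one inner build and record the outcome instead of raising."""
    proc = run(
        ["nix-build", expr, "-A", attr, "--no-out-link"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep the failure output with the log
        text=True,
    )
    return BuildResult(attr, proc.returncode == 0, proc.stdout)


def build_all(attrs, **kwargs):
    """The 'outer build': it completes whether or not inner builds fail."""
    return [try_build(attr, **kwargs) for attr in attrs]
```

A failed inner build simply shows up as a BuildResult with ok set to False and its log attached, which is the property the recursive nix approach is after.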
New QL release 2022-11-07
Would be cool to compare hydra results with last release
Hydra doesn't seem to be very happy at the moment. It's a bit temperamental.
The job queue was getting stuck almost immediately and I suspect it's this issue: https://github.com/NixOS/nix/issues/6981
I tried working around that by downgrading from master to release-22.05. Now the queue is running smoothly, but seemingly without the recursive nix stuff producing the right results (?).
I suspect that I should be deploying Hydra using Flakes and getting a copy of exactly the flake.lock file that pins everything to the right versions from somebody in the know... but the admin/setup/troubleshooting side of Hydra is not much fun so I will have to come back after doing some other things!
Here's the latest on my struggles with recursive nix: https://github.com/NixOS/nix/issues/7276. It seems to work except that we don't get all the log output that we need to diagnose build failures.
Maybe I have made peace with recursive nix now: https://github.com/NixOS/nix/issues/7276#issuecomment-1311570416
This is really brain-stretchy stuff so I'll need a short break before I try applying this to building a catalogue of working/broken Lisp packages :)
I'd like to start learning Hydra, but don't know where to start; I found the official documentation lacking. Do you recommend some resources? Does it make sense to go straight into declarative flakes, skipping the clicky click stuff?
Honestly my current opinion is that it's better to write and debug all the Nix code locally and then as a last step to migrate it to Hydra.
I will let you know when I have a good example.
Seems that some of my "Hydra problems" are really just recursive-nix problems: https://github.com/NixOS/nix/issues/7297
But for now I still think it is worth being patient and pushing ahead with recursive-nix for building software test reports e.g. keeping track of which builds fail and why. I don't really like the alternatives e.g. tracking it only in Hydra or with homebrew scripts running nix-shell.
I'm "flake-ifying" the Hydra setup today. I know I know I am late to the flakes party.
First I redeployed Hydra using the flake github:nixos/hydra instead of some random version from nixpkgs. I hope this means that I'm running all the exact same pinned software as the upstream Hydra and that this will help with reliability and troubleshooting.
I also tried defining a Hydra jobset using a flake. It seems to work well and is easy to set up. You just have to point Hydra at a repository containing a flake.nix that produces a hydraJobs attribute saying which derivations to build.
Here's an example Hydra evaluation of the sbcl packages on x86_64-linux and aarch64-linux: http://ex43-0.nuddy.co/eval/1429
This is from a fork of this nix-cl repo with two tweaks: adding hydraJobs = packages.sbcl.pkgs; and changing outputs to only build Linux-based platforms, because today I wasn't smart enough to see a simple way to filter out the non-working macOS derivations from hydraJobs otherwise (my brain is still processing flakes in general.)
@Uthar I also pointed Hydra at this upstream repo so if you want to try putting some derivations on the hydraJobs attribute then they should get built automatically over at http://ex43-0.nuddy.co/jobset/nix-cl/nix-cl. The page is showing an error now only because there is no hydraJobs attribute on the flake.
Hah, better sooner than later
Sounds really good with the Hydra stuff. I'm excited to try it out - there's a lot of work to do, 1000 failing packages, right?
> 1000 failing packages, right?
Sort of: that's combined for x86_64-linux and aarch64-linux. So maybe it's 500 packages each failing on both.
I'd like to generate a table / CSV file with columns:
package
arch
success?
missing_library (e.g. libcrypto.so based on reading logs)
full_logs_path_or_url
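To make the proposed columns concrete, here is a minimal Python sketch of writing such a table. It is not the eventual report generator: the helper names (missing_library, report_row, write_report) are hypothetical, and the regex assumes that a missing shared object appears in the log as a double-quoted file name such as "libcrypto.so", which would need checking against real logs.

```python
import csv
import io
import re

# Column names as listed above.
COLUMNS = ["package", "arch", "success?", "missing_library", "full_logs_path_or_url"]

# Assumption: the library shows up in the log as a quoted name,
# e.g. "libcrypto.so" or "libssl.so.1.1".
SHARED_OBJECT = re.compile(r'"(lib[\w.+-]+\.so(?:\.\d+)*)"')


def missing_library(log: str) -> str:
    """Return the first quoted shared-object name in the log, or ''."""
    match = SHARED_OBJECT.search(log)
    return match.group(1) if match else ""


def report_row(package, arch, success, log, log_url):
    """Build one row of the report for a single package build."""
    return {
        "package": package,
        "arch": arch,
        "success?": "true" if success else "false",
        # Only failed builds are scanned for a missing library.
        "missing_library": "" if success else missing_library(log),
        "full_logs_path_or_url": log_url,
    }


def write_report(rows, out):
    """Write the rows as CSV with a header line to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

Passing an io.StringIO() or an open file as out is enough to get the CSV text; where the logs come from (recursive nix, the Hydra API, or nix-shell) is left open here, as in the discussion.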
I'd imagined building this using recursive nix. That is, the outer-build runs one inner-build for each package and collects the results in a table. (I don't think you can do that with normal nix code because no try-catch mechanism on builds.) However this might be a dead-end because recursive nix has been reliably freezing my nix daemon (https://github.com/NixOS/nix/issues/7297).
So how do you think we should build a report like that? Seems like options include recursive nix (above), downloading results via the Hydra API, running tests in nix-shell instead of Hydra to collect logs, or...?
btw it would be interesting to know if you can reproduce the recursive nix problem cited above. Just in case there is something weird about my setup and it's not really a nix bug.
Ah makes sense, yes
I think we could search for BUILD FAILED near the end. Currently the builder catches and prints the build error like that: https://github.com/Uthar/nix-cl/blob/aee754f47672fe4ade6c9da1cbde2a31e3a08e0f/builder.lisp#L4-L12
We could collapse all the newlines in this message to make this easier to get - just read the last line
For the package, arch, success, we can just take them from the Nix build, but I'm not sure about the link to logs
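The "search for BUILD FAILED near the end" idea could look roughly like this in Python. This is a sketch under assumptions: it presumes the builder prints the marker and the error on one line (as suggested above by collapsing newlines), and the function name and the 50-line tail window are made up for illustration.

```python
def last_build_failure(log: str, marker: str = "BUILD FAILED", tail: int = 50):
    """Return the last line containing `marker` within the final `tail`
    lines of the log, or None if the marker does not appear there."""
    for line in reversed(log.splitlines()[-tail:]):
        if marker in line:
            return line.strip()
    return None
```

Returning None for logs without the marker keeps successful builds and builds that died for other reasons distinguishable from caught Lisp-level failures.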
Progress!
I now have a flake that builds all Lisp packages (currently for sbcl on x86_64-linux), collects the logs (including failures), and makes a table of results (currently raw CSV with limited details.) It's hooked up to Hydra.
This should be a good starting point. Going forward we need to test on more platforms, and extract more columns for the table, and present the results in more interesting ways.
Links:
I had to do quite some kludgery to trap failed builds and convert them into logs. Not the end of the world but I think in the future recursive nix could handle this much better.
Congrats (-: So only 236 failing systems.
We could test MacOS/Linux, GCC/Clang, JDK versions, ASDF versions, Glibc/Musl - it's quite the matrix.
Maybe also test if SBCL bootstrapped from each other implementation behaves the same.
Other things:
It would be so cool to manipulate the store and derivation in Common Lisp.
I've extended the flake a bit: here's a recent build.
This links two artifacts:
report.csv now contains 36K rows.
report.png is a first example R/ggplot2 plot from the data.
Currently it's testing this test matrix:
{ lisp = "sbcl"; system = "x86_64-linux"; }
{ lisp = "clasp"; system = "x86_64-linux"; }
{ lisp = "ccl"; system = "x86_64-linux"; }
{ lisp = "abcl"; system = "x86_64-linux"; }
{ lisp = "ecl"; system = "x86_64-linux"; }
{ lisp = "sbcl"; system = "aarch64-linux"; }
#{ lisp = "clasp"; system = "aarch64-linux"; }
#{ lisp = "ccl"; system = "aarch64-linux"; }
{ lisp = "abcl"; system = "aarch64-linux"; }
{ lisp = "ecl"; system = "aarch64-linux"; }
I disabled clasp and ccl on aarch64-linux due to some "unsupported platform" errors but I didn't dig too deep.
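For anyone consuming the results outside of Nix, the same matrix can be expressed as a cross product with an exclusion list. This Python sketch only mirrors the entries listed above; the names LISPS, SYSTEMS, DISABLED and build_matrix are illustrative and not taken from the flake.

```python
from itertools import product

LISPS = ["sbcl", "clasp", "ccl", "abcl", "ecl"]
SYSTEMS = ["x86_64-linux", "aarch64-linux"]

# Combinations commented out above because of "unsupported platform" errors.
DISABLED = {("clasp", "aarch64-linux"), ("ccl", "aarch64-linux")}


def build_matrix():
    """Every lisp/system pair except the disabled ones, grouped by system."""
    return [
        {"lisp": lisp, "system": system}
        for system, lisp in product(SYSTEMS, LISPS)
        if (lisp, system) not in DISABLED
    ]
```

Re-enabling a combination, or adding another implementation such as CLISP, is then a one-line change to the lists.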
Here's the very basic example pic:
Wow! that is super cool. I'll look into packaging Clasp for arm. I think CCL does not run on arm 64 bit. There's also CLISP we should test some day
I exposed the aggregated logs from all the builds now: http://hydra.nuddy.co/build/282896
So now we have ~5M lines of output from the builds to help understand why they don't all work :grin:
I'd like to categorize these errors and indicate them in the CSV data. Then we could e.g. detect when packages fail on different implementations for different reasons and so on. But what should the categories be?
Here's a quick regex hack look at common error messages:
zcat lisp-build-logs.txt.gz | ~/git/nix-cl-report/scanner.awk | sed 's/^OTHER: .*/(other)/' | sort | uniq -c | sort -nr
725 (other)
434 unable-to-load-any-of-the-alternatives
268 component-not-found
223 unable-to-load-foreign-library
157 subprocess
127 unable-to-open
98 filesystem-error-with-pathname
82 load-definition-for-system
70 error-opening
67 cant-create-directory
44 package-cant-be-found
41 variable-is-unbound
40 value-not-of-type
38 permission-denied
30 no-package-named
25 slot-is-unbound
18 package-does-not-exist
13 wrong-number-of-arguments
13 no-applicable-method
12 unrecognized-character-name
11 value-not-expected-type
11 lisp-does-not-support-weak-hash-tables
10 java-exception
10 couldnt-execute
...
The top few lines make me think there is a lot of low-hanging fruit in terms of missing dependencies. Have to think about an efficient way to fix all of those and keep them fixed.
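The tally above comes from scanner.awk, which isn't shown in this thread. As a rough stand-in, the same categorize-and-count step could be sketched in Python as below; the patterns are guesses based only on the category names, and PATTERNS, categorize and tally are hypothetical names.

```python
import re
from collections import Counter

# Guessed patterns: the real scanner.awk rules may differ.
PATTERNS = [
    ("unable-to-load-any-of-the-alternatives",
     re.compile(r"unable to load any of the alternatives", re.I)),
    ("unable-to-load-foreign-library",
     re.compile(r"unable to load foreign library", re.I)),
    ("component-not-found", re.compile(r"component .* not found", re.I)),
    ("permission-denied", re.compile(r"permission denied", re.I)),
]


def categorize(line: str) -> str:
    """Map one error line to the first matching category, else '(other)'."""
    for name, pattern in PATTERNS:
        if pattern.search(line):
            return name
    return "(other)"


def tally(error_lines):
    """Count categories, most frequent first (like sort | uniq -c | sort -nr)."""
    return Counter(categorize(line) for line in error_lines).most_common()
```

The category returned by categorize could go straight into an extra CSV column, which would make it possible to compare failure reasons for the same package across implementations.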
Good news maybe: I looked into why my "jumbo dependency" builds are failing and the main error reason is still foreign libraries:
So it looks like I need to debug the inject-jumbo-dependencies logic to eliminate those errors before we know how serious the remaining errors are.
JFYI: I added a Mac Mini M1 build slave to the Hydra now. It's a bare metal device sitting in my home office unlike the other machines that are all hosted at Hetzner.
This is experimental work-in-progress but I am running a Hydra (Nix CI) instance with builds of the Lisp packages: https://hydra.nuddy.co/
I started last month with one 16-core Ryzen3 CPU running Linux/x86-64 builds.
This week I added an 80-core Ampere ARM64 build machine and just this moment kicked off an experimental larger build of {sbcl,ecl,abcl} on {x86-64, i686, arm64} at https://github.com/NixOS/nixpkgs/pull/193754#issuecomment-1272877277.
The next machine I add will be a Mac M1. Then we should have quite good cross-platform test coverage for Lisp packages.
The tests are very basic right now, and only served up in the raw Hydra webUI, but the intention is to also generate some human-readable reports and e.g. to directly monitor some key projects like SBCL to test changes before they are released.
If anyone wants to collaborate on this, e.g. to have Hydra test some branches of their own for Lisp packages, just leave a comment and I will try to help. The big idea is just to efficiently find and fix problems in Lisp libraries (or their Nix packagings.)