NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

lisp-modules generation broke/flaky? #32344

Closed moredhel closed 3 years ago

moredhel commented 6 years ago

Issue description

Running nix-shell --run 'quicklisp-to-nix .' fails to complete. See the backtrace below.

I have run this on two machines. The second machine has been stuck on "Examining system clack-v1-compat" for the past 5 hours.

Steps to reproduce

git checkout 1f6a0e9b7607fecaf80d5504d3b59da29ebbdeb8
cd pkgs/development/lisp-modules/
nix-shell --run 'quicklisp-to-nix .'

Technical details

Examining system pgloader
WARNING: Unable to use cache for system pgloader.
Subprocess #<UIOP/LAUNCH-PROGRAM::PROCESS-INFO {1003AAC383}>
 with command (#P"/nix/store/6yx05zcgfh9qwls4fvg0k60h6l7j52ax-quicklisp-to-nix-system-info-1.0.0/bin/quicklisp-to-nix-system-info"
               "--cacheDir" "/run/user/1000/tmp.6FrUEkO1vA/" "pgloader")
 exited with error code 1
Fatal condition:
Subprocess #<UIOP/LAUNCH-PROGRAM::PROCESS-INFO {1003AB9853}>
 with command (#P"/nix/store/6yx05zcgfh9qwls4fvg0k60h6l7j52ax-quicklisp-to-nix-system-info-1.0.0/bin/quicklisp-to-nix-system-info" "pgloader")
 exited with error code 1
Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {1001ED0083}>
0: ((LAMBDA NIL :IN UIOP/IMAGE:PRINT-BACKTRACE))
1: ((FLET "THUNK" :IN UIOP/STREAM:CALL-WITH-SAFE-IO-SYNTAX))
2: (SB-IMPL::%WITH-STANDARD-IO-SYNTAX #<CLOSURE (FLET "THUNK" :IN UIOP/STREAM:CALL-WITH-SAFE-IO-SYNTAX) {7FFFF037766B}>)
3: (UIOP/STREAM:CALL-WITH-SAFE-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN UIOP/IMAGE:PRINT-BACKTRACE) {1003AC88EB}> :PACKAGE :CL)
4: (UIOP/IMAGE:PRINT-CONDITION-BACKTRACE #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {1003AC81D3}> :STREAM #<SB-SYS:FD-STREAM for "standard error" {1001ED82F3}> :COUNT NIL)
5: (UIOP/IMAGE:HANDLE-FATAL-CONDITION #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {1003AC81D3}>)
6: (SB-KERNEL::%SIGNAL #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {1003AC81D3}>)
7: (CERROR "IGNORE-ERROR-STATUS" UIOP/RUN-PROGRAM:SUBPROCESS-ERROR :COMMAND (#P"/nix/store/6yx05zcgfh9qwls4fvg0k60h6l7j52ax-quicklisp-to-nix-system-info-1.0.0/bin/quicklisp-to-nix-system-info" "pgloader") :CODE 1 :PROCESS #<UIOP/LAUNCH-PROGRAM::PROCESS-INFO {1003AB9853}>)
8: (UIOP/RUN-PROGRAM::%CHECK-RESULT 1 :COMMAND (#P"/nix/store/6yx05zcgfh9qwls4fvg0k60h6l7j52ax-quicklisp-to-nix-system-info-1.0.0/bin/quicklisp-to-nix-system-info" "pgloader") :PROCESS #<UIOP/LAUNCH-PROGRAM::PROCESS-INFO {1003AB9853}> :IGNORE-ERROR-STATUS NIL)
9: (UIOP/RUN-PROGRAM::%USE-LAUNCH-PROGRAM (#P"/nix/store/6yx05zcgfh9qwls4fvg0k60h6l7j52ax-quicklisp-to-nix-system-info-1.0.0/bin/quicklisp-to-nix-system-info" "pgloader") :OUTPUT :STRING)
10: (QL-TO-NIX::RAW-SYSTEM-INFO "pgloader")
11: (QL-TO-NIX::SYSTEM-DATA "pgloader")
12: (QL-TO-NIX::SYSTEMS-CLOSURE ("3bmd" "alexandria" "array-utils" "asdf-system-connections" "babel" "blackbird" "bordeaux-threads" "caveman" "cffi" "cffi-grovel" "chipz" "circular-streams" ...))
13: (QL-TO-NIX::QL-TO-NIX ".")
14: ((LABELS QL-TO-NIX::MAKE-GO :IN QL-TO-NIX::MAIN) #P"/run/user/1000/tmp.6FrUEkO1vA/")
15: ((LAMBDA (QL-TO-NIX::*CACHE-DIR*) :IN QL-TO-NIX::MAIN) #P"/run/user/1000/tmp.6FrUEkO1vA/")
16: (QL-TO-NIX-UTIL::CALL-WITH-TEMPORARY-DIRECTORY #<CLOSURE (LAMBDA (QL-TO-NIX::*CACHE-DIR*) :IN QL-TO-NIX::MAIN) {1001EDEDAB}>)
17: ((LAMBDA NIL :IN UIOP/IMAGE:RESTORE-IMAGE))
18: (UIOP/IMAGE:CALL-WITH-FATAL-CONDITION-HANDLER #<CLOSURE (LAMBDA NIL :IN UIOP/IMAGE:RESTORE-IMAGE) {1001EDD18B}>)
19: ((FLET "WITHOUT-INTERRUPTS-BODY-26" :IN SB-EXT:SAVE-LISP-AND-DIE))
20: ((LABELS SB-IMPL::RESTART-LISP :IN SB-EXT:SAVE-LISP-AND-DIE))
Above backtrace due to this condition:
Subprocess #<UIOP/LAUNCH-PROGRAM::PROCESS-INFO {1003AB9853}>
 with command (#P"/nix/store/6yx05zcgfh9qwls4fvg0k60h6l7j52ax-quicklisp-to-nix-system-info-1.0.0/bin/quicklisp-to-nix-system-info" "pgloader")
 exited with error code 1
Subprocess #<UIOP/LAUNCH-PROGRAM::PROCESS-INFO {1003AB9853}>
 with command (#P"/nix/store/6yx05zcgfh9qwls4fvg0k60h6l7j52ax-quicklisp-to-nix-system-info-1.0.0/bin/quicklisp-to-nix-system-info" "pgloader")
 exited with error code 1


7c6f434c commented 6 years ago

Hm. I am not even sure I can build quicklisp-to-nix at all after all the updates. I was putting off looking into a few things hoping that a new update of Quicklisp dist would come out first…

Also, @bradleyjensen

moredhel commented 6 years ago

I've currently worked around the problem by installing lispPackages.quicklisp, then doing a (ql:quickload :lucerne)/whatever. This is fine for development, but will be a pain when I want to deploy (via NixOps).

I haven't had the time to delve into the quicklisp-to-nix function yet, but will do so when I get closer to wanting to deploy. If there are any pointers, that would be super helpful, especially around the architecture of the solution. I will probably end up adding a high-level description of the process to the README.md too, for clarity's sake.

SquircleSpace commented 6 years ago

Most of the quicklisp-to-nix code has documentation, and there’s a readme that gives a high-level intro to the tool. The basic idea is that we install quicklisp in a temp directory and then install the target package in that temp quicklisp. While installing the package, we try to keep track of what dependencies are pulled in.

The important thing to note is that quicklisp-to-nix operates on lisp packages, not quicklisp packages. There are cyclic dependencies in quicklisp packages that disappear when you look only at the lisp package dependencies. So, we can generate multiple nix packages for one quicklisp package.
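
To illustrate the point about granularity, here is a minimal Python sketch (not part of quicklisp-to-nix). It shows how a dependency graph that is acyclic at the level of individual systems can become cyclic once systems are grouped by the release that ships them. The names `x-core`, `x-glue`, `y`, `rel-x` and `rel-y` are hypothetical and only serve the example.

```python
from typing import Dict, List, Set


def find_cycle(graph: Dict[str, Set[str]]) -> List[str]:
    """Return one dependency cycle as a list of nodes, or [] if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour: Dict[str, int] = {}
    stack: List[str] = []

    def visit(node: str) -> List[str]:
        colour[node] = GREY
        stack.append(node)
        for dep in sorted(graph.get(node, ())):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is still on the stack, so we closed a loop
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        colour[node] = BLACK
        return []

    for node in sorted(graph):
        if colour.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return []


def project_to_releases(system_deps: Dict[str, Set[str]],
                        release_of: Dict[str, str]) -> Dict[str, Set[str]]:
    """Collapse a system-level graph into a release-level graph.

    Dependencies between systems of the same release are dropped.
    """
    release_deps: Dict[str, Set[str]] = {}
    for system, deps in system_deps.items():
        src = release_of[system]
        release_deps.setdefault(src, set())
        for dep in deps:
            dst = release_of[dep]
            if dst != src:
                release_deps[src].add(dst)
    return release_deps


# Hypothetical data: release rel-x ships x-core and x-glue, release rel-y ships y.
system_deps = {"x-core": set(), "x-glue": {"x-core", "y"}, "y": {"x-core"}}
release_of = {"x-core": "rel-x", "x-glue": "rel-x", "y": "rel-y"}

print(find_cycle(system_deps))                                   # no cycle between systems
print(find_cycle(project_to_releases(system_deps, release_of)))  # cycle between releases
```

This is why generating one Nix package per system, rather than one per release, can sidestep some cycles.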

I should have time to look at the failure later this week.

SquircleSpace commented 6 years ago

Hm. We hit a circular dependency with pgloader. Quicklisp seems to handle it just fine. Odd...

((#<ASDF/LISP-ACTION:LOAD-OP > . #<ASDF/SYSTEM:SYSTEM "simple-date">)
      (#<ASDF/LISP-ACTION:LOAD-OP >
       . #<ASDF/SYSTEM:SYSTEM "simple-date-postgres-glue">)
      (#<ASDF/LISP-ACTION:LOAD-OP >
       . #<ASDF/COMPONENT:MODULE "simple-date-postgres-glue" "simple-date">)
      (#<ASDF/LISP-ACTION:LOAD-OP >
       . #<ASDF/LISP-ACTION:CL-SOURCE-FILE "simple-date-postgres-glue" "simple-date" "cl-postgres-glue">)
      (#<ASDF/LISP-ACTION:PREPARE-OP >
       . #<ASDF/LISP-ACTION:CL-SOURCE-FILE "simple-date-postgres-glue" "simple-date" "cl-postgres-glue">)
      (#<ASDF/LISP-ACTION:PREPARE-OP >
       . #<ASDF/COMPONENT:MODULE "simple-date-postgres-glue" "simple-date">)
      (#<ASDF/LISP-ACTION:PREPARE-OP >
       . #<ASDF/SYSTEM:SYSTEM "simple-date-postgres-glue">))

7c6f434c commented 6 years ago

Not the first time pgloader becomes a problem, though. No idea how exactly it works with Quicklisp.

SquircleSpace commented 6 years ago

Pgloader’s circular dependency is resolved in the latest quicklisp dist. I’m going to just try updating the quicklisp distinfo and rerunning quicklisp-to-nix.

SquircleSpace commented 6 years ago

I’m able to generate package descriptions using the latest quicklisp dist. There’s a cyclic dependency between the cl-postgres nix package and the simple-date nix package, though. Shouldn’t be hard to fix with the right overrides.

SquircleSpace commented 6 years ago

I think we need a completely re-thought quicklisp-to-nix workflow. This approach is too complex. I'm investigating workarounds for cases where hacks aren't working. I've become increasingly convinced that lispPackages is a futile venture. We'll always be dealing with edge cases that don't quite work right.

Imagine a nix function quicklispClosureForSystems. It takes in a list of strings (Common Lisp package names that quicklisp is aware of -- that is, things you can quickload). This function returns a derivation which quickloads those packages and then captures the result. Maybe we capture the whole quicklisp directory, maybe we only capture the actual software releases it fetched (and the generated fasls?). Either way, this derivation produces a directory containing all the systems you requested and all of their dependencies. You could then add that directory to your ASDF search path and you would be able to load all the systems you need.
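
The set of systems such a function would have to capture is a transitive closure over direct dependencies. A rough Python sketch of that computation follows; it is a model of the idea, not the proposed Nix function, and the system names and the `direct_deps` table are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def closure_for_systems(requested: Iterable[str],
                        direct_deps: Dict[str, Iterable[str]]) -> List[str]:
    """Return the requested systems plus all transitive dependencies.

    The result lists dependencies before their dependents, so it can double
    as a load order. A system already seen is skipped, which also keeps the
    walk finite if the input happens to contain a cycle.
    """
    seen: Set[str] = set()
    order: List[str] = []

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for dep in sorted(direct_deps.get(name, ())):
            visit(dep)
        order.append(name)

    for name in requested:
        visit(name)
    return order


# Hypothetical dependency table.
direct_deps = {"app": ["web", "db"], "web": ["util"], "db": ["util"], "util": []}
print(closure_for_systems(["app"], direct_deps))
```

In the proposal above, everything in the returned list would end up inside the single output directory that gets added to the ASDF search path.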

7c6f434c commented 6 years ago

So what you are saying is that we need an attribute set with all the quicklisp sources, an attribute set that gives the transitive dependency source list for each quicklisp quickloadable name, and an attribute set that gives native dependencies for the most popular quickloadable names?

Then it would be feasible to mix with non-Quicklisp things, and even — at my own risk w.r.t. precompiling correct things — to have a more stable part of my profile precompiled while mixing it with the fast-changing one and rebuilding a sane amount.

Maybe ASDF could be convinced to look for FASLs in multiple places, too.

7c6f434c commented 6 years ago

I have just updated lispPackages, and I am actually surprised that it went better than I expected.

7c6f434c commented 6 years ago

Re: circular dependency: on source-tarball-level it is a self-dependency, on ASDF-level it is a normal dependency, but unfortunately it is a circular dependency on file-level. Therefore cl-postgres now comes with simple-date non-removed because to hell with that.

SquircleSpace commented 6 years ago

Alright, I've got the start of a new version of quicklisp-to-nix. It's not working, but it should be enough to understand the direction I want to take this in.

I generated an attr set that contains every software release in the current quicklisp dists. The derivations it contains aren't very useful. They basically just extract the source of each release and hand it off to buildLispPackage. So, if you know the full set of releases you depend on, you can just depend on those directly. I also generated an attr set that maps lisp system names to the release they come from.

Next, I wrote a small program that makes all the quicklisp releases available to ASDF and loads a system of your choosing. It prints out a nix form describing the system it loaded. This program plays a very similar role to quicklisp-to-nix-determine-dependencies, but it doesn't need to drive quicklisp at all -- all the source is already downloaded and ready to be compiled! The derivations that it outputs depend on the appropriate releases. Dependencies on things outside the lisp ecosystem need to be added in as overrides afterwards (like we already do).

The final step (as of yet unwritten) is to iterate across the existing list of systems and generate new-style lisp packages. I haven't taken that final step because depending on the derivations I produce doesn't result in ASDF being able to locate the systems. I'm having a hard time following how buildLispPackage manages to do that for the currently-existing ql-to-nix packages, so I'm not sure what is wrong with my new derivations.

@7c6f434c, would you mind looking at my branch over at https://github.com/bradleyjensen/nixpkgs/tree/quicklisp-closure ?

SquircleSpace commented 6 years ago

Okay, looks like 4db842f465f06f16a64eb2651cd863b5b44b2b74 broke the thing I'm expecting to work. Prior to that commit, I can put a lisp package in a derivation's buildInputs and then my builder can use ASDF to find the lisp systems provided by the package. After that commit, ASDF barfs about not being able to locate the system.
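
For context, one standard way to tell ASDF where systems live is the `CL_SOURCE_REGISTRY` environment variable. The Python sketch below only shows how such a value could be assembled from a list of input directories; whether buildLispPackage uses this mechanism is an assumption, and the store path in the example is made up.

```python
from typing import Iterable, List


def source_registry(dirs: Iterable[str],
                    recursive: bool = True,
                    inherit: bool = True) -> str:
    """Build a CL_SOURCE_REGISTRY string from directories.

    In ASDF's string syntax, an entry ending in "/" names a single directory,
    an entry ending in "//" names a tree to search recursively, and an empty
    entry splices in the inherited configuration.
    """
    entries: List[str] = []
    for d in dirs:
        entries.append(d.rstrip("/") + ("//" if recursive else "/"))
    if inherit:
        entries.append("")  # trailing empty entry: keep the default search path too
    return ":".join(entries)  # ":" is the separator on Unix-like systems


# Hypothetical store path of a lisp package listed in buildInputs.
print(source_registry(["/nix/store/aaa-lisp-foo/lib/common-lisp/foo"]))
```

A builder could export the resulting string before starting the Lisp image, which makes it easy to check from a shell what ASDF is actually being told to search.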

This is a separate issue, but it is preventing me from testing my new version of quicklisp-to-nix.

SquircleSpace commented 6 years ago

See https://github.com/NixOS/nixpkgs/issues/35306

7c6f434c commented 6 years ago

I am sorry I haven't looked at the branch yet (I want to actually understand the small details and not just verify it is safe-looking, and that requires allocating some effort).

SquircleSpace commented 6 years ago

No worries. Take your time. I'm unblocked, now.

An important thing to note is that my branch doesn't (yet?) generate derivations for pre-compiled lisp code. The simple-date dependency cycle has me convinced that our current solution for pre-compilation is prone to breakage. We can't require that every system in a given asd be compiled. The simple-date dependency cycle is a good example of why we can't.

Here are the options I considered.

I'm thinking the second one is much less likely to have unanticipated difficulties.

7c6f434c commented 6 years ago

> We can't require that every system in a given asd be compiled. The simple-date dependency cycle

Maybe it is the other way round — sometimes we want to precompile everything in a tarball, not just a single ASD file.

> We can change how ASDF searches (they have a list of system-locator functions that you can add to), but I'm hesitant to go down that road.

If we go that way, we would probably need to ask the upstream. I do expect upstream to give some advice and not ignore us, but it may be complicated.

> • Use ASDF's ability to compile a system into a single fasl (compile-bundle-op).

I wouldn't expect automagic to work there…

> If you choose not to use sbclWithPackages and instead depend directly on SBCL and the lispPackages, you just won't benefit from pre-compilation.

I wonder if sbclWithPackages could be obtained as a cheap side-effect of some attempts to do a better precompilation.

Also, does ironclad package get shared between two different sbclWithPackages instances?

An approach not among the ones you list: hack output-translation-function: we don't care which copy of the ASDF file gets loaded, as long as the precompiled files are found. Every Lisp package would store its file list in nix-support, these would get loaded at the same time as ASDF search path is initialised; when looking for an output translation we say that for everything inside Nix store the proper answer is also in the Nix store, but maybe in a different store path.
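
The lookup described here can be modelled in a few lines of Python. This is only a sketch of the idea, not ASDF's output-translation API: it assumes each package records its source files relative to its store path, and that the compiled files sit at the same relative location with a `.fasl` extension. Both the layout and the store paths are hypothetical.

```python
import os.path
from typing import Dict, Iterable, Optional, Tuple

STORE_PREFIX = "/nix/store/"


def build_fasl_index(packages: Iterable[Tuple[str, Iterable[str]]]) -> Dict[str, str]:
    """Index precompiled files by the relative path of their source file.

    `packages` yields (store_path, relative_source_files) pairs, standing in
    for the per-package file lists kept in nix-support.
    """
    index: Dict[str, str] = {}
    for store_path, source_files in packages:
        for rel in source_files:
            stem, _ext = os.path.splitext(rel)
            # First package to claim a relative path wins.
            index.setdefault(rel, f"{store_path}/{stem}.fasl")
    return index


def translate_output(source_path: str, index: Dict[str, str]) -> Optional[str]:
    """Map a source file in the store to its precompiled file, if one is known."""
    if not source_path.startswith(STORE_PREFIX):
        return None  # outside the store: leave it to the default translation
    # Drop the "<hash>-<name>" component; which copy was loaded does not matter.
    _store_dir, _sep, rel = source_path[len(STORE_PREFIX):].partition("/")
    return index.get(rel)


index = build_fasl_index([("/nix/store/bbb-foo-fasl", ["lib/foo/a.lisp"])])
print(translate_output("/nix/store/aaa-foo-src/lib/foo/a.lisp", index))
```

Keying on the path relative to the store directory is what makes the answer independent of which copy of the source ASDF happened to load.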

SquircleSpace commented 6 years ago

Gah! The lisp ecosystem is very hostile to nix. We can't even pretend that quicklisp releases are anything like a package from another package manager. Too many packages have side effects (especially filesystem side effects) in their asd files. Just loading the asd (which ASDF needs to do in order to answer any questions at all about the system) can have side effects. Some systems produce source files that other systems in the same asd depend on. In either case, we have a problem because I wanted to treat release source as read-only.

Every time I try to impose nix-like strictness on the lisp ecosystem, I run into problems because packages make use of the flexibility I'm trying to take away from them. The most annoying part is that I don't necessarily think the packages are doing anything wrong (given the ecosystem they live in). Sometimes there just isn't a better option.

I'm going to shelve my branch that tries to revolutionize quicklisp-to-nix. I'm going to pursue a different avenue. I want to stop going against the grain. This means giving up on some of the niceties that nix provides, but I don't feel like battling the lisp ecosystem whenever something changes.

Here's the rough idea. I'll create a nix function that takes a list of lisp system names. It produces a derivation which produces a quicklisp installation where those systems have been quickloaded. This isn't compatible with pure or hydra-like build environments (since quicklisp will be hitting the network), but the results are conceptually pure. The distinfo is fixed, and quicklisp does its own hash checking.
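
The "conceptually pure" argument rests on content hashes: if every downloaded archive is checked against a hash pinned in advance, the network can only fail the build, not change its result. A minimal Python sketch of that check follows. SHA-256 is chosen here purely for illustration; it is not a statement about which hash Quicklisp itself verifies.

```python
import hashlib


def verify_archive(data: bytes, expected_sha256: str) -> bytes:
    """Return `data` unchanged if it matches the pinned hash, else raise."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")
    return data


# The pinned hash would normally come from the fixed distinfo; here we
# compute it from sample bytes so the example is self-contained.
archive = b"pretend this is a release tarball"
pinned = hashlib.sha256(archive).hexdigest()

verify_archive(archive, pinned)            # accepted
try:
    verify_archive(archive + b"!", pinned)  # tampered download
except ValueError as err:
    print("rejected:", err)
```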

To handle pure build environments, we could create our own quicklisp dist backed by the nix store. Looking around the quicklisp source, I think this is doable. No obvious blockers jumped out at me, but it will be slightly hacky. Maybe I'll send a PR to quicklisp to clean up the slightly gross things we would have to do.

7c6f434c commented 6 years ago

> The lisp ecosystem is very hostile to nix.

Well, parts of it are.

> Too many packages have side effects (especially filesystem side effects) in their asd files. Just loading the asd (which ASDF needs to do in order to answer any questions at all about the system) can have side effects.

I think that shouldn't require anything else to be installed, though?

Then it is just a postUnpack action.

> Some systems produce source files that other systems in the same asd depend on. In either case, we have a problem because I wanted to treat release source as read-only.

Hm. I wonder if our two favourite problems ever go together. Namely, do circular release-level dependencies ever combine with non-trivial ASDF system behaviour? Although, maybe such systems can just be blacklisted.

> This isn't compatible with pure or hydra-like build environments (since quicklisp will be hitting the network), but the results are conceptually pure.

Hm. Is there any benefit over just running Quicklisp in nix-shell?

On the other hand, if we have a list of fetch expressions for Quicklisp releases, and a correspondence from systems to the releases they need, we don't even need Quicklisp at build time: just unpack the needed releases.
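
As a sketch of that lookup, assume three precomputed tables: transitive dependencies per system, the release each system comes from, and a fetch description per release. Selecting what to unpack is then a pure table join. All names, URLs and hash placeholders in this Python example are hypothetical.

```python
from typing import Dict, List


def releases_to_unpack(system: str,
                       transitive_deps: Dict[str, List[str]],
                       release_of: Dict[str, str],
                       fetch_info: Dict[str, Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the fetch descriptions of every release needed by `system`.

    Several systems may map to the same release, so releases are
    deduplicated and returned in a stable (sorted by name) order.
    """
    systems = [system, *transitive_deps.get(system, [])]
    needed = {release_of[s] for s in systems}
    return [fetch_info[r] for r in sorted(needed)]


# Hypothetical tables.
transitive_deps = {"app": ["web", "util", "util-extras"]}
release_of = {"app": "app", "web": "web", "util": "util", "util-extras": "util"}
fetch_info = {
    "app": {"url": "https://example.invalid/app.tgz", "sha256": "<pinned>"},
    "web": {"url": "https://example.invalid/web.tgz", "sha256": "<pinned>"},
    "util": {"url": "https://example.invalid/util.tgz", "sha256": "<pinned>"},
}

for item in releases_to_unpack("app", transitive_deps, release_of, fetch_info):
    print(item["url"])
```

No Quicklisp client is involved at this point: the tables could be generated once, outside the build, and the build itself only fetches and unpacks.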

> To handle pure build environments, we could create our own quicklisp dist backed by the nix store.

Well, I think having the current solution at least for nicely behaved systems would be better than not having it; but how would a separate Quicklisp dist help? I mean, the problems with ASDF systems are the problems with upstream sources, if you want to patch or blacklist, we can just do it while doing things based on the Quicklisp package versions?
