Detect native dependencies of Lisp packages

lukego commented 1 year ago

How could we automate discovery of the libraries and programs from nixpkgs that each Lisp system depends on?

I suspect this will become more important over time if we want to support custom Lisp distributions (#25) because then we can't entirely rely on hand-crafted dependency declarations based on the latest Quicklisp e.g. on kons-9 they are forking a bunch of dependencies and requiring many new foreign libraries on the forks.

I don't know what the best approach is but here is a first idea that I'm toying around with. This doesn't involve screen-scraping the failing build logs (yet...)

Algorithm to discover dependencies of a Lisp package:

Builds with no libraries?
  yes ->
    Done: no dependencies.
  no ->
    Builds with "jumbo" set of all potential anticipated dependencies?
      yes ->
        Progress: we are able to build the package!
        Recommended next step:
          Iteratively search for minimal subset of packages that are truly needed.
      no ->
        Broken package: human needs to check for reason e.g.
          - completely broken
          - unsupported on the lisp/arch
          - depends on unanticipated package from nixpkgs (must add to "jumbo" set)
          - depends on something outside nixpkgs (must bundle somehow)

I'd like to test this idea by writing down a "jumbo" set of all the packages that Lisp systems might depend on, based on browsing build logs, and then see what effect this has e.g. does it make a lot more systems work.

If it does have a big effect then we could think about how to "shrink" the dependencies for each package so they don't all depend on everything. (Or if it doesn't improve the package situation much maybe we needn't bother.)

Here's the first tiny snippet of libraries that I've seen referenced in failing build logs:

ncurses, renderdoc, geoip, xorg.libX11, libcerf, libblas, libdrm, libfann, libfarmhash, glpk, gsl, libiio, tokyocabinet, leveldb, lmdb.

If the total ends up being less than ~50 that might be reasonable to maintain as our "jumbo" collection.

Uthar commented 1 year ago

🤔

Uthar commented 1 year ago

I think this will increase the closure size too much to include additional native libraries, so I'm not gonna try this myself - I thought of parsing the last log line for "libfoo" and trying to add pkgs.foo to nativeLibs.

Uthar commented 1 year ago

Ah, it won't increase because we'll filter it out. My brain is fried today

lukego commented 1 year ago

I'll let you know how far I get with this idea.

Maybe the "jumbo" package definition will be the same independent of whether we start with everything and remove, or start with nothing and add incrementally? In both cases there is a universe of possible sets of packages and we are making some kind of heuristic navigation of it e.g. taking hints from log files.

Tricky to find the right balance between automation verses manual maintenance. I am reminded of hassles I had installing MGL-MAT (https://github.com/melisgl/mgl/issues/8) because there are two BLAS implementations that both work for compilation but one of them is unstable and causes segfaults at runtime.

Aside: I also wonder if A Plan for Spam would work for heuristic search e.g. for a dumb analyzer to discover that tokens like libcrypto.so mean that openssl is missing. But I think there are many simpler ideas to eliminate first :grin:

lukego commented 1 year ago

Just based on one pass manually through the build logs I came up with this set of ~50 candidate foreign library dependencies:

alsa-lib
fcgi
flac
freeimage
geoip
geos
glpk
graphviz
gsl
leveldb
libGL
libblas
libcerf
libdrm
libev
libevent
libfann
libfarmhash
libgcrypt
libiio
libinput
liblinear
libnet
libpcap
libsvm
libtcod
libxkbcommon
libzstd
lmdb
lzlib
mesa
mpg123
ncurses
nlopt
nvidia-x11 # or opengl-drivers
openal
openslp
openssl
pixman
plplot
portaudio
renderdoc
rocm-opencl-runtime
rrdtool
sane-backends
secp256k1
snappy
termbox
tesseract3
tokyocabinet
unixODBC
xorg.libX11
zyre

Hopefully there will be fewer external programs needed...

lukego commented 1 year ago

Here is another passing idea:

To discover the native dependencies of a Lisp package we could build & run it with the full "jumbo" dependency set and somehow trace which libraries and programs it actually accesses from the nix store. Then we could use those as its dependencies and drop the rest.

How could we do the tracing? I think there are multiple options, like strace or bpftrace or a logging FUSE filesystem, and the question is which one is simplest and works most reliably in a nix sandbox.

There will likely be some packages that have dependencies that don't show up so easily e.g. because they are only accessed at runtime during real usage. Maybe for those we would want to write a small fragment of test code to exercise the relevant code paths to trigger loading the dependencies.

Uthar commented 1 year ago

How could we do the tracing? I think there are multiple options, like strace or bpftrace or a logging FUSE filesystem, and the question is which one is simplest and works most reliably in a nix sandbox.

Idea: Hook open() syscalls using LD_PRELOAD - so we don't have to string parse the output of strace.

lukego commented 1 year ago

I'm thinking that "Step 0" is to simply compare how many packages build successfully with basic dependencies verses with "jumbo" dependencies. Then we can see how much impact the dependencies have in the first place.

I made a quick stab at this but I suspect that I goofed something because only a very small number of packages were fixed by adding jumbo dependencies, see summary.png of this build (note: restricted to sbcl on x86_64-linux at the moment): http://hydra.nuddy.co/build/385198

I'll need to dig through the logs (also linked) to see what is going on. But let me know if you see anything obviously wrong with my Nix hack for injecting new dependencies: https://github.com/lukego/nix-cl-report/blob/b5768a2ddb155889c3feb58db2aa9ef5579e9ec1/flake.nix#L55-L57.

Hydra can also serve up strace logging for each individual build on that evaluation btw if we go up to the overview at http://hydra.nuddy.co/eval/1493.

Uthar commented 1 year ago

https://github.com/lukego/nix-cl-report/blob/b5768a2ddb155889c3feb58db2aa9ef5579e9ec1/flake.nix#L55-L57

Try doing overrideLispAttrs instead of overrideAttrs

lukego commented 1 year ago

@Uthar Thanks for the tip!

Jumbo dependencies still don't seem to be making very many new packages work. I suspect it is confounded by other problems affecting the same derivations.

I browsed a few logs and I see a lot of permission denied errors on writes into the nix store e.g.

Maybe a lot more packages need the build-with-compile-into-pwd trick? Is there an easy way that I could experiment with making that the default behavior to see if it helps get builds working?

Uthar commented 1 year ago

I browsed a few logs and I see a lot of permission denied errors on writes into the nix store e.g.

Either their builds try to write files, like cl-unicode which creates .txt files, or their dependencies are not compiled to fasls. For example:

;;  building /nix/store/bqm4v5ngpccdqjq54nhpsd3fshq9gwm0-cl-aubio-20200427-git
Error opening #P"/nix/store/94z17van3vk3wsiq4kc8mq2pmqnjmq1i-cffi-0.24.1/src/c2ffi/package-tmpGHU3ALSV.fasl":

means we should add cffi to lispLibs of cl-aubio - also make sure that cffi.systems includes c2ffi

lukego commented 1 year ago

means we should add cffi to lispLibs of cl-aubio - also make sure that cffi.systems includes c2ffi

Looks like cl-aubio.lispLibs does include cffi but cffi.systems does not include c2ffi. So I suppose it's being compiled lazily after being migrated into read-only storage.

Have to think about how to prioritize these failing cases. Generally I'd like to start with "bang for buck" problems that affect a lot of packages and can be solved in a general way. So I don't want to manually add c2ffi as a system of (this particula version of) cffi but it could make sense to look into why this wasn't detected automatically.

Maybe it's time to run the full build and survey the logs again. If the "jumbo" code is working I'd expect a lot of "could not find library" errors to be replaced with something else.

Uthar commented 1 year ago

it could make sense to look into why this wasn't detected automatically.

The reason is that Quicklisp data doesn't include subsystem dependencies (slashy systems), we can't do anything about it. Maybe writing an importer backend straight from repositories, instead of parsing QL systems.txt, is a better long term idea. Using Quicklisp data was just a quick way to get a bunch of packages into Nix.

lukego commented 1 year ago

Maybe writing an importer backend straight from repositories, instead of parsing QL systems.txt, is a better long term idea

So in this scenario would the packaging process look something like this?

Somehow import sources for all relevant packages into the Nix store.
Somehow discover ASDF systems/dependencies by inspecting sources.
Somehow discover native dependencies for each package.
Somehow generate an imported.nix file with build-asdf-system expressions listing the minimal and correct dependencies for each package.

This does sound more complex than the current approach. I suppose it only makes sense if it would fix a bunch of packaging problems. I wonder how we could best estimate that impact based on the logs that Hydra is producing.

Aside: Are there other lisp-cl packaging attempts that work more like this and we could borrow ideas from? I know that importing packages into ql2nix took much longer so I assume it worked differently somehow (but never really understood the machinery.)

Uthar commented 1 year ago

There is a bunch of projects: https://github.com/Uthar/nix-cl#other-nixcl-projects

lukego commented 1 year ago

What's your feeling on this in general @uthar? Do you think it's a logical direction to take the project or is it a detour from other goals e.g. focus on hand-crafting quality packaging of the latest Quicklisp release? Maybe I am getting carried away...

Uthar commented 1 year ago

I think for the short term it's practical to import the Quicklisp data and manually fix up anything that's broken. But the long term goal is to be independent from any such central repository, instead fetching straight from the source repos of packages. In this case Quicklisp can be used to "bootstrap" such a thing.

I'm working on Nix bindings to Common Lisp using Clasp (see example), which I'm planning to use together with ASDF to do the importing and auto-discovery of needed native libraries. It should be good because there will be first-class access to the Nix language interpreter, any package in nixpkgs and build results. Then it would be easier to do any algorithms like you described because one could use Common Lisp to do that.

lukego commented 1 year ago

@Uthar Fancy!

I was hoping the recursive-nix idea would allow us to write all that code in Nix but it seems like I was optimistic. So writing it in Lisp with some sane binding towards Nix does seem to make sense.

I will pause and consider for a bit.

Uthar / nix-cl

Detect native dependencies of Lisp packages #27