NixOS / nixpkgs

Nix Packages collection & NixOS

language package management tooling missing hashes #65275

Open ghost opened 5 years ago

ghost commented 5 years ago

I'm sorry for the very general name of this issue. If anyone can come up with a better title for this problem, please make a suggestion :-)

I am looking for the best solution to the following problem, which I run into quite a lot: the application I am trying to package has dependencies from its language's packaging system that are not tracked in nixpkgs (npm, rubygems, maven, rust crates, ...). There is tooling to adapt the dependency definitions from the language's package management system to nix (yarn2nix, bundix, ...), but since the original dependency definitions don't contain usable hashes for all dependencies, or are missing the hashes for git dependencies, these dependency definitions need to be combined with information from the internet. There are multiple solutions that I see being used:

  1. generating a .nix file describing the dependencies including all needed hashes and including it in the packaging (gemset.nix, yarn.nix)
  2. downloading the dependencies during build time with a fixed-output derivation (jd-gui package, rust crates)

Which one is actually more favorable? The first one results in difficult-to-maintain packages and spams nixpkgs with large files. The second one is not strictly pure, and it works around nix by using a fixed-output derivation. I think I read edolstra discouraging it.

There was some discussion on IRC, but there was no conclusion, so I raise the issue here, because I think there should be a consensus on how to handle this problem in nixpkgs. Maybe there is an even better third approach that I don't know yet.

ghost commented 5 years ago

Approach 2 (ab)uses fixed-output derivations. Why this might be a bad idea is described here: https://github.com/NixOS/nix/issues/2270

zimbatm commented 5 years ago

Glossary: LPM stands for language package manager.

Here is a brain dump of what I have learned so far:

0. Fixed output derivations

The fixed-output derivation is the level zero support. It doesn't take much effort to create and maintain. The language package manager (LPM) commands can be used directly like in the developer documentation.

The biggest downside is that the hash is not automatically invalidated when one of the input files changes. This can create surprising situations when the lockfile is updated but the old program is still running (because it's still reading from the old hash). The hash has to be invalidated manually by changing it to something else, running nix-build and waiting for nix to tell you the right hash. Then re-run the build from scratch.

Another downside is that not all the tools have a stable on-disk output. Two developers not sharing a binary cache might get different output hashes. I've seen that happen with the cargo tools for a while for example.
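
To make the level-zero approach concrete, here is a minimal sketch of a fixed-output derivation that lets npm download the dependencies itself. The derivation name, the choice of npm, and the layout of `src` are assumptions for illustration and are not taken from any package in nixpkgs. `lib.fakeSha256` is a placeholder that has to be replaced with the hash nix reports after the first failed build.

```nix
# Sketch only: the LPM (npm here, as an assumed example) fetches everything
# at build time, and the result is pinned by a single output hash.
{ lib, stdenv, nodejs, cacert }:

stdenv.mkDerivation {
  name = "example-node-modules"; # hypothetical name
  src = ./.; # assumed to contain package.json and package-lock.json

  # cacert provides CA certificates for the TLS downloads.
  nativeBuildInputs = [ nodejs cacert ];

  buildPhase = ''
    export HOME=$TMPDIR
    npm ci --ignore-scripts
  '';

  installPhase = ''
    cp -r node_modules $out
  '';

  # Skip fixup so that no store paths get patched into the output.
  dontFixup = true;

  # Declaring the output hash is what grants network access.
  # The store path depends only on the name and this hash, so editing
  # package-lock.json alone does not trigger a rebuild.
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
  outputHash = lib.fakeSha256; # placeholder, replace after the first build
}
```

The last attribute is where the stale-hash problem described above comes from: as long as `outputHash` is unchanged, nix keeps using whatever is already in the store under that hash.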

1. Fake registry

A lockfile is generated that downloads all the dependencies using nix fetchers. Then the aggregate is used to start a fake registry process that the tools can talk to.

This solves the outdated hash problem, and since the APIs are usually publicly documented they are also pretty stable. The only implementation of that idea that I know of is https://github.com/nmattia/napalm

The biggest downside is that we need to build the API for all the languages.

2. Pre-download the dependencies

This is similar to (1) but instead of providing an API, the files are placed on disk where the LPM expects to find them. There is often an offline mode that we can re-use.

In my experience the on-disk locations are not necessarily documented or stable between releases. To implement it properly, the nix developer is often forced to look at the LPM source code to find the location and duplicate the logic in nix.
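
As a rough sketch of the pre-download idea, assuming Yarn v1 and its offline mirror feature: each locked tarball is fetched with a nix fetcher, the results are collected in one directory, and yarn is pointed at that directory in offline mode. The package name, the single `left-pad` entry, and the file name inside the mirror are illustrative assumptions. The exact file names yarn expects are precisely the kind of detail that has to be looked up in the LPM itself. `lib.fakeHash` is a placeholder, not a real hash.

```nix
# Sketch only: pre-fetch tarballs with nix and hand them to yarn's offline mode.
{ lib, stdenv, fetchurl, linkFarm, nodejs, yarn }:

let
  # One entry per locked dependency; a generator would normally emit this list.
  offlineMirror = linkFarm "yarn-offline-mirror" [
    {
      name = "left-pad-1.3.0.tgz"; # assumed file name inside the mirror
      path = fetchurl {
        url = "https://registry.yarnpkg.com/left-pad/-/left-pad-1.3.0.tgz";
        hash = lib.fakeHash; # placeholder, replace with the real hash
      };
    }
  ];
in
stdenv.mkDerivation {
  name = "example-app"; # hypothetical name
  src = ./.; # assumed to contain package.json and yarn.lock

  nativeBuildInputs = [ nodejs yarn ];

  buildPhase = ''
    export HOME=$TMPDIR
    # No network is needed: everything comes from the pre-fetched mirror.
    yarn config --offline set yarn-offline-mirror ${offlineMirror}
    yarn install --offline --frozen-lockfile --ignore-scripts
  '';

  installPhase = ''
    mkdir -p $out
    cp -r . $out/
  '';
}
```

Unlike the fixed-output variant, every tarball has its own hash here, so changing the lockfile changes the generated list and invalidates the build automatically.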

3. Build each dependency in its own derivation

This is going even further than (2) in the LPM integration. The LPM usually controls traversing the dependency tree and running each individual build, which we hijack and replace with individual derivations that are being built. So here each dependency is built in isolation in its own little sandbox, and then strung together and presented to the application at the end. Even more of the LPM's heuristics are encoded into Nix. Examples of that can be found for ruby and python.

The main advantage of this approach is that it minimizes the rebuilds between two releases. The rebuilds are more incremental. And in theory the dependencies can also be shared between two or more programs.

The main downside is that it's a lot of nix code that is running to build a single package. The Hydra evaluator is now running on a 64GB node because evaluating nixpkgs takes a lot of memory. And while sharing is nice, in practice ruby/node/rust projects very rarely share exactly the same dependency set.

This is a great approach for a company monorepo, or when maintaining a package set snapshot like Stackage or the python modules.

For single packages in nixpkgs, I now believe that this is going one step too far.
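
For reference, the ruby flavour of this approach looks roughly like the sketch below. It assumes a project directory that holds a Gemfile, a Gemfile.lock and a gemset.nix generated by bundix. The environment name is hypothetical.

```nix
# Sketch only: each gem listed in gemset.nix becomes its own derivation,
# and bundlerEnv composes them into one environment for the application.
{ bundlerEnv, ruby }:

bundlerEnv {
  name = "example-app-gems"; # hypothetical name
  inherit ruby;
  # Directory expected to contain Gemfile, Gemfile.lock and gemset.nix.
  gemdir = ./.;
}
```

The expression itself is short, but gemset.nix carries one entry with a hash per gem, which is where the large generated files and the evaluation cost come from.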

adisbladis commented 5 years ago

> And while sharing is nice, in practice ruby/node/rust projects very rarely share exactly the same dependency set.

Derivations don't need the exact same dependency set for sharing to be useful though, as long as there are some common dependencies. IME sharing is very common even across projects, especially in nodejs where some dependencies are in almost every larger project.

I quickly checked duplication across gemset.nix in nixpkgs and this was the result: https://gist.github.com/adisbladis/730b982cc6b1a8013581529639c40ce0 and did a similar check for buildGoModule in https://github.com/NixOS/nix/issues/2270#issuecomment-508768325.

zimbatm commented 5 years ago

:+1: on adding data to what is essentially a belief right now.

I think you have to show that the derivation outputs are the same. For example addressable-2.6.0 is used in 16 projects but has a dependency on public_suffix >= 2.0.2, < 4.0. There are 5 different versions of public_suffix in the gist, so potentially 5 different derivation outputs for the same addressable-2.6.0 gem. Basically, leaves are shared, but with each additional level of dependencies below a package, its output is exponentially less likely to be shared.
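
The point about derivation outputs can be checked with a small stand-in experiment. In this sketch `fakeGem` is a hypothetical helper that builds empty outputs, and the two public_suffix versions are arbitrary picks within the allowed range, not taken from the gist.

```nix
# Sketch only: the same gem name and version built against two different
# dependency versions yields two different store paths.
let
  pkgs = import <nixpkgs> { };

  # Hypothetical stand-in for a gem build: an empty output that depends on `deps`.
  fakeGem = name: deps: pkgs.runCommand name { inherit deps; } "mkdir $out";

  suffixA = fakeGem "public_suffix-3.0.3" [ ];
  suffixB = fakeGem "public_suffix-3.1.1" [ ];

  addressableA = fakeGem "addressable-2.6.0" [ suffixA ];
  addressableB = fakeGem "addressable-2.6.0" [ suffixB ];
in
{
  sameName = addressableA.name == addressableB.name; # true
  sameOutput = addressableA.outPath == addressableB.outPath; # false
}
```

Evaluating this with `nix-instantiate --eval --strict` needs `<nixpkgs>` on the search path and does not build anything. Counting distinct output paths rather than gem names would be the way to measure real sharing.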

That being said, the sharing also happens between multiple versions of nixpkgs. Having more granular derivations also makes it possible to minimize rebuilds on package updates, and to minimize downloads for the user.

To really know we would need a big differential equation that balances build times, evaluation times and download times.

Actually I was missing the last step:

4. Use Nix as a project build system

In this scenario, there is no LPM. Nix has entirely replaced the LPM tooling. Nix is building each and every object of a project in its own derivation and composing them all together. This is the ultimate incremental rebuild, and the ultimate memory and Nix evaluation hog.

An example of such an implementation: https://github.com/nmattia/snack/

At that point you very much wish that Nix had an intensional store to minimize rebuilds.

offlinehacker commented 5 years ago

The issue then becomes that you have to put generated nix files in git, and these can be very large. Do you have any ideas how this could be solved? I had some ideas about using git lfs or some content addressable storage, like ipfs.

adisbladis commented 5 years ago

@offlinehacker This could potentially be addressed by nix flakes (& splitting up nixpkgs into subsystems).

offlinehacker commented 5 years ago

Also, there's one other option that's a variation of the fake registry option described above:

Recording/Caching http proxy

You redirect all requests of the LPM through a local http proxy. This proxy records all requests and transforms the responses so that they can later be replayed during the installation process. The problem is that the package manager not only loads tar archives and git repos, but also makes api requests to something like npm. You need to make response transformations that are specific to each package manager, but the whole service could be generalized with plugins.

During the installation process you start the proxy again, with the configuration generated in the first step as its input.

The benefit is that you no longer require a fake registry for every package manager; you have a more generalized solution instead.

offlinehacker commented 5 years ago

@adisbladis in any case, even if you split the repo, you still pollute other repos with files that are basically large text blobs, but yeah, I agree that this would still help. The problem is we can't package some things because the generated files are too large, for example take a look here: https://github.com/NixOS/nixpkgs/pull/49082

zimbatm commented 5 years ago

> The issue then becomes that you have to put generated nix files in git, and these can be very large. Do you have any ideas how this could be solved? I had some ideas about using git lfs or some content addressable storage, like ipfs.

The best solution that I know of is to extend the nix capabilities to allow recursive nix calls. Recursive Nix is when nix is being called from inside a derivation. It would look a little bit like this:

stdenv.mkDerivation {
  pname = "xxx";
  version = "1.2.3";
  src = fetchFromGitHub { ... };
  buildPhase = ''
    nix-build -I nixpkgs=${pkgs.path} ./default.nix
     # or ${./inner.nix} if upstream doesn't have a default.nix or we don't want to use it
  '';
  installPhase = ''
    ln -s $(readlink ./result) $out
  '';
}

(obviously we would extract this pattern in a new pkgs.mkRecursive function)

The nice thing here is that import-from-derivation can be allowed in the inner build. It's not going to affect nix-env -qaP. And the lockfile is sourced directly from upstream instead of having to duplicate it in nixpkgs. And if upstream has already packaged the project with nix we can also defer that to them (except the meta and passthru attributes).

So overall it would make hydra builds a bit slower because nixpkgs has to be evaluated again on each build. For the users, the nixpkgs evaluation becomes faster because the complicated IFD happens only at build time. If we start using nix files from upstream it might make refactoring of nixpkgs a bit harder. Package dependencies are also harder to follow since they are not passed to the outer default.nix.

Mic92 commented 5 years ago

Recursive nix was also discussed in this RFC: https://github.com/NixOS/rfcs/pull/40

It goes even further because it also pre-generates derivations.

ghost commented 5 years ago

When there's something I can try and maybe an example of how to use it, I would love to start experimenting with it to get rid of all the yarn.nix files, where possible.

However, I do not see how this could solve the issue for example with ruby tooling, where the hashes are not included in the lockfile.

ghost commented 4 years ago

The topic of how to do language package managers came up in #78810 again.

@Mic92 described another issue that was not considered here, which is time and memory required for evaluating nixpkgs (like when doing nix-review).

I am wondering: Would this issue be solved by recursive nix?

ghost commented 4 years ago

By the way: This pattern could avoid the expression size explosion: https://github.com/NixOS/nixpkgs/pull/87258/files#diff-97ddd5942a260ac035c022c0c57de234R20

Originally posted by @Mic92 in https://github.com/NixOS/nixpkgs/pull/78810#issuecomment-625808078

I think this discussion should not be held in the Mastodon PR, because it is not specific to mastodon or even yarn2nix. The bundler tooling and some Go tooling work the same way and have the same issues.

I think this is just moving the problem from expression size / evaluation speed to hidden impurities (see https://github.com/NixOS/nix/issues/2270).

This is a general issue and I think it might even be good to create some kind of working group of people who are interested in finding a community consensus and solving this problem for all language package managers in the long term. I would certainly be interested in it.

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

nixos-discourse commented 3 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/future-of-npm-packages-in-nixpkgs/14285/3
