NixOS / nixpkgs

Nix Packages collection & NixOS

cudaPackages: clean "cudatoolkit" up #173462

Open SomeoneSerge opened 2 years ago

SomeoneSerge commented 2 years ago

Suggested change

Context

Nixpkgs has recently switched to the redist cuda packages (henceforth cudaPackages, as opposed to cudaPackages.cudatoolkit, or cudatoolkit for short), where every individual piece of cuda comes in a separate derivation. The old runfile-based cudatoolkit expression, which packs development libraries and GUI apps into the same output, is being phased out. However, the transition of downstream packages is slow and we still rely heavily on that old expression. This has far-reaching ramifications:

The cudatoolkit expression was at some point split into cudatoolkit.out and cudatoolkit.lib for exactly these reasons. However, the existing .lib output only includes cudart, which is insufficient to build nearly anything (packages usually also ask for at least cublas, cufft, and curand). Downstream nixpkgs expressions consistently consume cudatoolkit.out instead.
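
For comparison, here is a rough sketch of how a downstream expression might request only the pieces it needs from the redist set instead of pulling in all of cudatoolkit.out. The package itself is hypothetical, and the attribute names (cuda_nvcc, cuda_cudart, libcublas, libcufft, libcurand) are assumed to follow the redist manifest naming exposed under cudaPackages; check them against the nixpkgs revision in use.

```nix
# Sketch only: a hypothetical downstream package, not an expression from nixpkgs.
{ stdenv, cudaPackages }:

stdenv.mkDerivation {
  pname = "downstream-sketch"; # hypothetical name
  version = "0";
  src = ./.; # placeholder source

  # Assumed attribute names, following the redist manifest naming.
  # The compiler is only needed at build time.
  nativeBuildInputs = [ cudaPackages.cuda_nvcc ];

  # Only the libraries this package links against end up in its closure.
  buildInputs = with cudaPackages; [
    cuda_cudart
    libcublas
    libcufft
    libcurand
  ];
}
```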

Besides, the current expression is already buggy:

In this issue I suggest we focus on the very first item, i.e. the docker images and runtime closures (as opposed to the frequent rebuilds). Simply splitting nsys and the debuggers into their own outputs should address the issue.
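
To make the idea concrete, here is a minimal toy sketch of the output-splitting mechanism, assuming the standard multiple-outputs machinery in stdenv. It is not the actual cudatoolkit expression: the output name tools and the stand-in files are hypothetical, and the real expression would move the actual nsys and debugger directories instead.

```nix
# Toy sketch of splitting a monolithic install into several outputs.
# Not the real cudatoolkit expression; file names are stand-ins.
{ pkgs ? import <nixpkgs> { } }:

pkgs.stdenv.mkDerivation {
  pname = "toolkit-split-sketch"; # hypothetical name
  version = "0";
  dontUnpack = true;

  # The stand-in files below are not real ELF objects.
  dontStrip = true;
  dontPatchELF = true;

  # "tools" is a hypothetical output for profilers and debuggers.
  outputs = [ "out" "lib" "tools" ];

  installPhase = ''
    mkdir -p $out/include $out/lib $out/bin
    echo "stand-in header" > $out/include/cuda.h
    echo "stand-in for libcublas.so" > $out/lib/libcublas.so
    echo "stand-in for a GUI profiler" > $out/bin/nsys

    # moveToOutput is the stdenv helper that relocates a path
    # (relative to each output) into the given destination output.
    moveToOutput lib "$lib"
    moveToOutput bin "$tools"
  '';
}
```

With a split like this, a package that references only the lib output would not carry the profiler in its runtime closure, which is the property that matters for docker image size.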

I suggest the change be backported to the previous release.

Unaddressed issues, alternatives

With these changes applied, the cudatoolkit.out.outPath computed from the inputs would still change with every fontconfig update, even though the contents shouldn't: we would still rebuild. Perhaps content-addressed mode could remedy this?
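
As a rough illustration of what content-addressed mode means here, the sketch below marks a toy derivation as content-addressed. It assumes a Nix installation with the experimental ca-derivations feature enabled; the derivation name and payload are hypothetical, and whether this would work for cudatoolkit itself is not established in this issue.

```nix
# Sketch only: requires `experimental-features = ca-derivations` in nix.conf.
{ pkgs ? import <nixpkgs> { } }:

pkgs.runCommand "ca-sketch" # hypothetical name
  {
    # The store path of the output is derived from its contents
    # rather than from the hash of the build inputs.
    __contentAddressed = true;
    outputHashMode = "recursive";
    outputHashAlgo = "sha256";
  }
  ''
    mkdir -p $out
    echo "same bytes regardless of unrelated input updates" > $out/payload
  ''
```

The build itself would still be triggered when an input changes, but if the resulting output is byte-identical, dependents could in principle be spared a rebuild.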

As an alternative, I've tried mass-switching packages to cudatoolkit.lib, but that wasn't meaningful without first copying more things into $lib.

CC @NixOS/cuda-maintainers @FRidh

FRidh commented 2 years ago

What is the reason not more packages can be converted to the redist packages? Are parts from cudatoolkit missing? Splitting/removing static outputs is trivial.

SomeoneSerge commented 2 years ago

What is the reason not more packages can be converted to the redist packages? Are parts from cudatoolkit missing?

We'll keep transitioning packages to the redist cuda. This is a slower process, but we can approach the desired goal (for the scope of this issue: "slim docker images") more gradually, by first pruning the old expression a little. That should be pretty easy to merge. It is also something we can backport, probably unlike the redist packages.

Splitting static is trivial, but establishing a common interface for runfile and redist is not. That is still desirable so that users may switch to older (<11.4) cuda when needed, until these cuda versions have been fully deprecated upstream.

Obviously, there's a hierarchy of priorities, and just splitting the outputs can and should come before we've agreed on a common interface.

samuela commented 2 years ago

I don't use docker personally, but I think the proposed changes make sense. We should always be aiming to keep things as slim as possible IMHO.