Open SomeoneSerge opened 2 years ago
What is the reason not more packages can be converted to the redist packages? Are parts from cudatoolkit missing? Splitting/removing static outputs is trivial.
> What is the reason not more packages can be converted to the redist packages? Are parts from cudatoolkit missing?
We'll keep transitioning packages to the redist cuda. This is a slower process, but we can approach the desired goal (for the scope of this issue: "slim docker images") more gradually, by first pruning the old expression a little. That should be pretty easy to merge, and we can also backport it, probably unlike the redist packages.
Splitting static is trivial, but establishing a common interface for runfile and redist is not. That is still desirable so that users may switch to older (<11.4) cuda when needed, until these cuda versions have been fully deprecated by upstream.
Obviously, there's a hierarchy of priorities, and just splitting the outputs can and should come before we've agreed on a common interface.
I don't use docker personally, but I think the proposed changes make sense. We should always be aiming to keep things as slim as possible IMHO.
Suggested change
- [ ] […] `pytorchWithCuda`) do not include gtk&c; ensure reasonably slim docker images
- [ ] Use `autoPatchelf` instead of the hard-coded `$rpath`. The `autoPatchelf` hook validates rpaths against `DT_NEEDED`, ensuring a higher level of correctness than we offer now
- [ ] Move `nsys-ui`, `cuda-gdb`, or more (until no curses/gtk needed) out of `cudatoolkit.out` into separate outputs. The default output must not include the GUI or TUI apps
- [ ] Backport to `nixos-21.11` s.t. last two releases can be used to build sensible docker images
- [ ] Step 3: Split dynamic and static outputs for redist `cudaPackages`. Rationale: if we were to transition e.g. `pytorchWithCuda` to redist packages now without fixing this first, its runtime closure would include static `.a` archives for cublas, et cetera. This is bogus behaviour.
- [ ] […] `cudaPackages`. Add a way to populate `cudaPackages` with `cudatoolkit` contents, to be consumed following the same interface as redist packages. Add a utility function to merge selected outputs (cf. `cudatoolkit_joined` pattern in pytorch/tf expressions). Ensure that the utility function can be used to merge either the runfile-based `cudatoolkit`'s outputs or the redist `cudaPackages`, using an identical interface. This way expressions that have transitioned to redist packages can be rewired by users to build with `cuda<11.4`.
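The output split and the "merge selected outputs" utility from the list above might be sketched roughly as follows. This is only an illustration under stated assumptions: `toolkit-sketch`, its output names (`static`, `tools`), and the dummy files are hypothetical and do not reflect the actual `cudatoolkit` expression. `moveToOutput` (from the stdenv multiple-outputs hook) and `symlinkJoin` are existing nixpkgs helpers.

```nix
# Sketch only: package name, output names, and file layout are hypothetical.
{ pkgs ? import <nixpkgs> { } }:

let
  # Stand-in for a toolkit-like package that installs everything into $out
  # and then splits interactive tools and static archives into own outputs.
  toolkit = pkgs.runCommand "toolkit-sketch" {
    outputs = [ "out" "lib" "static" "tools" ];
  } ''
    mkdir -p $out/bin $out/lib
    touch $out/bin/example-compiler $out/bin/example-ui
    touch $out/lib/libexample.so $out/lib/libexample_static.a

    # Move by specific pattern: shared libraries first, then static archives,
    # so that the archives never end up back in the "lib" output.
    moveToOutput "lib/*.so*" "$lib"
    moveToOutput "lib/*.a" "$static"

    # GUI/TUI programs leave the default output.
    moveToOutput "bin/example-ui" "$tools"
  '';

  # Merge only the outputs a consumer needs, in the spirit of the
  # `cudatoolkit_joined` pattern mentioned above.
  joined = pkgs.symlinkJoin {
    name = "toolkit-sketch-joined";
    paths = [ toolkit.out toolkit.lib ];
  };
in
{ inherit toolkit joined; }
```

Building the `joined` attribute (e.g. `nix-build sketch.nix -A joined`) should yield a tree with `bin/example-compiler` and `lib/libexample.so`, but neither the UI program nor the static archive, so a package depending on `joined` would not pull them into its runtime closure.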
Context
Nixpkgs has recently switched to the redist cuda packages (henceforth `cudaPackages`, as opposed to `cudaPackages.cudatoolkit`, or `cudatoolkit` for short), where every individual piece of cuda comes in a separate derivation. The old run-file based `cudatoolkit` expression, which packs development libraries and GUI apps into the same output, is being phased out. However, transition of downstream packages is slow and we still heavily rely on that old expression. This has far-reaching ramifications:

- […] `pytorchWithCuda` needlessly include ALSA, Xorg, fontconfig, &c libraries. Importantly, `dockerTools.buildLayeredImage` packing `pytorchWithCuda` necessarily includes these libraries too: […] `pytorchWithCuda`
- `pytorchWithCuda` and `tensorflowWithCuda` needlessly rebuild whenever something like `fontconfig` updates

The `cudatoolkit` expression at some point was split into `cudatoolkit.out` and `cudatoolkit.lib` exactly for these reasons. However, the existing `.lib` output only includes `cudart`, which is insufficient to build nearly anything (usually packages additionally ask at least for cublas, cufft, curand). Downstream nixpkgs expressions consume `cudatoolkit.out` instead, consistently.

Besides, the current expression is already buggy:

- […] `/nix/store/s1dr6vd3f2wvvxf2zynszga69znpqnyh-cudatoolkit-11.6.1/bin/.nvvp-wrapped: line 3: /nix/store/s1dr6vd3f2wvvxf2zynszga69znpqnyh-cudatoolkit-11.6.1/bin/../libnvvp/nvvp: No such file or directory`)

In this issue I suggest we focus on the very first item, i.e. the docker images and runtime closures (as opposed to frequent rebuilds). Simply splitting `nsys` and debuggers into their own outputs should remedy the issue.

I suggest the change be backported into the previous release:

- `nsys` and `cuda-gdb` are leaf programs used interactively, i.e. their few users can easily adjust;

Unaddressed issues, alternatives

With these changes applied, the `cudatoolkit.out.outPath` computed from the inputs would still change with every `fontconfig` update, although the contents shouldn't: we're still going to rebuild. Content-addressable mode could amend this?

As an alternative I've tried mass-switching packages to `cudatoolkit.lib`, but that wasn't meaningful without copying more things into `$lib` first.

CC @NixOS/cuda-maintainers @FRidh
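For the content-addressable question raised under "Unaddressed issues, alternatives", a minimal sketch of what opting a single derivation into that mode could look like is given below. It assumes a Nix installation with the experimental `ca-derivations` feature enabled; the derivation `ca-sketch` and its contents are hypothetical, and `fontconfig` only stands in for a frequently updated input. Whether this would actually help with `cudatoolkit` is untested here.

```nix
# Sketch only: needs the experimental `ca-derivations` feature of Nix.
{ pkgs ? import <nixpkgs> { } }:

pkgs.runCommand "ca-sketch" {
  # Address the output by its contents rather than by its inputs.
  __contentAddressed = true;
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";

  # Hypothetical build input that changes often but does not
  # influence the files written below.
  nativeBuildInputs = [ pkgs.fontconfig ];
} ''
  mkdir -p $out
  echo "contents independent of fontconfig" > $out/data
''
```

The idea is that the derivation would still be rebuilt when the input changes, but if the resulting output is bit-for-bit identical, its store path stays the same and dependents may not need rebuilding.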