How to best migrate existing packages onto the new format? Perhaps we could have a cudatoolkit_11_6 package that just combines a bunch of small ones to emulate the old behavior?
We can try to generate a combined package but I think we can just keep the classic behemoth for the packages not yet migrated.
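For reference, a minimal sketch of what such a compatibility package could look like, assuming the split redist packages already exist in a cudaPackages set (the attribute names cuda_nvcc, cuda_cupti, etc. below are placeholders, not settled names):

# Hypothetical cudatoolkit_11_6 compatibility package
{ symlinkJoin, cudaPackages }:

symlinkJoin {
  name = "cudatoolkit-11.6-compat";
  # Merge the split redist outputs back into a single prefix so that
  # consumers expecting the classic monolithic layout keep working.
  paths = with cudaPackages; [
    cuda_nvcc
    cuda_cupti
    cuda_cudart
    libcublas
    # ...and whatever else the old cudatoolkit shipped
  ];
}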
How to package cuDNN? It doesn't seem to be included in /redist/ anywhere.
What is the issue with the current cudnn packages we fetch? Are those too big as well?
We can try to generate a combined package but I think we can just keep the classic behemoth for the packages not yet migrated.
Yeah, that's fine as well. Just so long as we have a way to migrate people over.
What is the issue with the current cudnn packages we fetch? Are those too big as well?
It is rather large and unwieldy. I just realized though that it's already packaged in a somewhat sensible way (expanding and patching a tgz file). See e.g. https://cs.github.com/NixOS/nixpkgs/blob/a8f938c15c84df4bef8e920fac71cd876188fa9e/pkgs/development/libraries/science/math/cudnn/generic.nix.
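For readers following along, the essence of that recipe is roughly this (a simplified sketch; the real generic.nix linked above computes the URL from a version table and does extra fixups, and the URL and hash here are placeholders):

{ lib, stdenv, fetchurl, autoPatchelfHook, cudatoolkit }:

stdenv.mkDerivation rec {
  pname = "cudnn";
  version = "8.3.2";

  # Placeholder URL/hash; the real expression derives these per release.
  src = fetchurl {
    url = "https://developer.download.nvidia.com/compute/redist/cudnn/v${version}/cudnn-${version}-linux-x64.tgz";
    sha256 = lib.fakeSha256;
  };

  nativeBuildInputs = [ autoPatchelfHook ];
  buildInputs = [ cudatoolkit ];

  # The tarball already ships include/ and lib64/; install it as-is.
  installPhase = ''
    mkdir -p $out
    cp -a include lib64 $out/
  '';
}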
It would be really nice if we could make cudnn, cutensor, etc. be sub-packages of cudatoolkitPackages. That way you'd always get a consistent package combo without having to go through the fuss of using things like "cudnn_8_1_cudatoolkit_10_2".
By the way, I feel like I've been linking to this comment too often as a reference for the cudaPackages approach to ensuring consistency of CUDA and cuDNN versions between packages. Maybe it deserves a separate proposal issue.
Mmm yeah that's not a bad idea... Do you have a workaround in mind that could resolve cudnn/cudatoolkit mismatches?
What @FRidh describes in the second part of https://github.com/NixOS/nixpkgs/pull/166784#issuecomment-1086667289, I understand this way:
# all-packages.nix
# remove the old cudaPackages (old different semantics: the result contains different versions of cuda)
# remove cutensorPackages
# remove cudnnPackages
# new semantics: the result contains a single version of cuda, a single version of cudnn, a single version of cutensor, all mutually compatible
cudaPackages = callPackage .../cuda-packages.nix { cudaVersion = "11.4"; cudnnVersion = "8.3"; cutensorVersion = "1.3.1.3"; };
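A minimal sketch of what .../cuda-packages.nix itself could look like (the file layout and attribute names here are assumptions, just to illustrate the single-version scope idea):

# .../cuda-packages.nix
{ lib, newScope, cudaVersion, cudnnVersion, cutensorVersion }:

lib.makeScope newScope (self: {
  inherit cudaVersion;

  # Everything below is called with self.callPackage, so every member of
  # the scope is built against the same cudatoolkit/cudnn/cutensor trio.
  cudatoolkit = self.callPackage ./cudatoolkit.nix { version = cudaVersion; };
  cudnn       = self.callPackage ./cudnn.nix       { version = cudnnVersion; };
  cutensor    = self.callPackage ./cutensor.nix    { version = cutensorVersion; };
})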
And either
# .../pytorch/default.nix
{ ...
, cudaPackages
}: buildPythonPackage {
# ...
nativeBuildInputs = [
# ...
cudaPackages.nvcc
];
buildInputs = [
# ...
cudaPackages.cudatoolkit
cudaPackages.cudnn
];
}
# .../overlay.nix
final: prev: {
# change default cudaPackages and thus rebuild pkgs.pytorch
cudaPackages = prev.cudaPackages.override { ... };
# or just make a custom build of pytorch
myPytorch = prev.pytorch.override { cudaPackages = ...; };
}
Or .../pytorch/default.nix stays unchanged and
# python-packages.nix
pytorch = callPackageWith (pkgs // pkgs.cudaPackages) .../pytorch/default.nix { };
...in the former case the pytorch derivation gets too many inputs (complexity); in the latter the overlay user suffers a bit.
Now that I posted this, I see it addresses the problem only partially, and doesn't really eliminate the need for assertions
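To make the assertion point concrete, this is the kind of eval-time check I have in mind (a sketch only; the compatibility list and most names here are made up for illustration):

{ lib, stdenv, fetchurl, cudatoolkit }:

let
  version = "8.3.2";
  # Illustrative compatibility list, not an authoritative one.
  supportedCudaVersions = [ "11.3" "11.4" "11.5" ];
in
# Fail evaluation early when the cudatoolkit passed in is unsupported.
assert lib.assertMsg
  (lib.elem (lib.versions.majorMinor cudatoolkit.version) supportedCudaVersions)
  "cudnn ${version} is not compatible with cudatoolkit ${cudatoolkit.version}";

stdenv.mkDerivation {
  pname = "cudnn";
  inherit version;
  # src, installPhase, etc. as in the existing expression
}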
Just to keep track of this. This:
The corresponding NAR file is >2 GB, which relates to https://github.com/NixOS/nixos-org-configurations/issues/207
...is orthogonal to the current PR and will need to be addressed later.
Current status (./. refers to the checked-out PR):
❯ nix path-info --impure --expr '(import <nixpkgs-unstable> { config.allowUnfree = true; }).cudatoolkit_11_5' -hs
querying info about missing paths
/nix/store/qcf89ad9lgaipyy97mn9fdcimx40zn5g-cudatoolkit-11.5.0    4.0G
nixpkgs on cudatoolkit-redist [$] via ❄️ impure (nix-shell)
❯ nix path-info --impure --expr '(import ./. { config.allowUnfree = true; }).cudatoolkit' -hs
querying info about missing paths
/nix/store/4q5swpzp1qxbid4p02ksxlhi903ng0hv-cudatoolkit-11.5.0    4.0G
The corresponding NAR file is >2 GB, which relates to https://github.com/NixOS/nixos-org-configurations/issues/207
Once we get everyone off of cudaPackages.cudatoolkit and switched over to using the redist packages, that issue ought to be resolved, IIUC.
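Concretely, the end state would be downstream packages depending only on the components they need, something like this (a sketch; the cuda_nvcc/cuda_cudart/libcublas attribute names are assumptions about what the split packages may end up being called):

{ stdenv, cmake, cudaPackages }:

stdenv.mkDerivation {
  pname = "some-cuda-consumer";
  version = "0.1";
  src = ./.; # placeholder

  nativeBuildInputs = [ cmake cudaPackages.cuda_nvcc ];
  buildInputs = with cudaPackages; [
    cuda_cudart # just the runtime, instead of the multi-GB cudatoolkit
    libcublas
  ];
}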
Issue description
Right now cudatoolkit is a true behemoth of a package with just about every possible CUDA tool under the sun. It is packaged by downloading the .run-file based installer, running it, and then copying out the results. This presents a few challenges.
Proposal
Create separate packages for each of the tools, e.g. cudatoolkit_11_6_cupti, cudatoolkit_11_6_nvcc, etc. @kmittman from Nvidia kindly pointed me to https://developer.download.nvidia.com/compute/cuda/redist/ which would make packaging these pieces individually much easier.
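As a sketch of how a single redist component could be packaged from that tree (the exact URL layout, version, and hash below are illustrative, not verified; the redist directory also publishes a JSON manifest with per-file hashes that a generator could consume):

{ lib, stdenv, fetchurl, autoPatchelfHook }:

stdenv.mkDerivation rec {
  pname = "cuda_cupti";
  version = "11.6.55"; # illustrative component version

  src = fetchurl {
    # Illustrative URL shape; real paths/hashes come from the JSON manifest.
    url = "https://developer.download.nvidia.com/compute/cuda/redist/cuda_cupti/linux-x86_64/cuda_cupti-linux-x86_64-${version}-archive.tar.xz";
    sha256 = lib.fakeSha256;
  };

  nativeBuildInputs = [ autoPatchelfHook ];

  # Each archive unpacks to include/, lib/, etc.; install the tree as-is.
  installPhase = ''
    mkdir -p $out
    cp -a * $out/
  '';
}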
TODO
This issue is intended to be a proposal for discussion/brainstorming how best to proceed. Some open questions in my mind:
How to best migrate existing packages onto the new format? Perhaps we could have a cudatoolkit_11_6 package that just combines a bunch of small ones to emulate the old behavior?
How to package cuDNN? It doesn't seem to be included in /redist/ anywhere.
cc @NixOS/cuda-maintainers @knedlsepp