cachix / devenv

Fast, Declarative, Reproducible, and Composable Developer Environments
https://devenv.sh
Apache License 2.0

cuda: add module #422

Open bobvanderlinden opened 1 year ago

bobvanderlinden commented 1 year ago

As discussed on Discord, this configuration is needed to run pytorch in devenv on Linux. It was confirmed to work.

I don't have much knowledge of CUDA itself, so I'm unsure what exactly other libraries need. I did find that `CUDA_HOME` and `CUDA_PATH` are used by TensorFlow.
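For context, the configuration in this PR boils down to something like the following sketch (the exact option layout is an assumption, not the final module API):

```nix
{ pkgs, ... }:
{
  packages = [ pkgs.cudaPackages.cudatoolkit ];

  # TensorFlow (and some other frameworks) locate the toolkit via these:
  env.CUDA_HOME = "${pkgs.cudaPackages.cudatoolkit}";
  env.CUDA_PATH = "${pkgs.cudaPackages.cudatoolkit}";
}
```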

Confirmations that this works for real projects are welcome!

domenkozar commented 1 year ago

We'll need to add the toolkit folder to the `top-level.nix` imports.

Is this ever supposed to work on macOS?

bobvanderlinden commented 1 year ago

> We'll need to add the toolkit folder to the `top-level.nix` imports.

👍

> Is this ever supposed to work on macOS?

I don't think so, but I'm not sure 😅 https://developer.nvidia.com/nvidia-cuda-toolkit-11_6_0-developer-tools-mac-hosts Apparently it can be used remotely, so yes, the toolkit itself does support macOS. I kind of doubt that works for pytorch, though, as it needs `libcuda.so`, which is in the x11 package.

domenkozar commented 1 year ago

Let's add an assertion then if `!pkgs.stdenv.isLinux`.
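A sketch of what that assertion could look like, assuming devenv exposes an `assertions` option in the style of NixOS modules:

```nix
{ pkgs, ... }:
{
  # fail evaluation early on non-Linux systems, since CUDA in nixpkgs
  # is effectively Linux-only (sketch; exact option name is an assumption)
  assertions = [
    {
      assertion = pkgs.stdenv.isLinux;
      message = "cuda: CUDA support is only available on Linux.";
    }
  ];
}
```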

bobvanderlinden commented 1 year ago

Requires https://github.com/cachix/devenv/pull/383, as CUDA is unfree.
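For reference, a sketch of what enabling that would look like once #383 lands, assuming it adds an `allowUnfree` switch to `devenv.yaml`:

```yaml
# devenv.yaml — sketch, assuming the option introduced by cachix/devenv#383
allowUnfree: true
```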

domenkozar commented 1 year ago

Would we want to incorporate any feedback from https://github.com/NixOS/nixpkgs/issues/217780#issuecomment-1442477719? Leaving it for the future is also fine :)

SomeoneSerge commented 1 year ago

> Is this ever supposed to work on macOS?

Is CUDA available on macOS? Darwin hasn't been included in `meta.platforms` for most of `cudaPackages`, but that's mostly because people focus on Linux.

bobvanderlinden commented 1 year ago

Hmm, this PR is not really ready. It probably shouldn't support macOS if CUDA in nixpkgs doesn't support it. On Discord it was mentioned that this method did work for pytorch, but the `LD_LIBRARY_PATH` change that adds `pkgs.gcc-unwrapped` breaks the Rust compiler.

I think the `LD_LIBRARY_PATH` entry is mostly a workaround for a pytorch problem, not so much a CUDA problem.

Using the pytorch package from nixpkgs (and thus relying on the nixpkgs CUDA maintainers) doesn't play nicely with poetry (`pyproject.toml`), so there is no perfect solution yet.

I am interested in looking into this further to get a good setup for CUDA + pytorch + Rust, but it's not high on my todo list at the moment.

I can leave this PR in draft to keep the discussion for devenv centralized, but I can also open a new issue if that's more appropriate.

tfmoraes commented 1 year ago

Adding `/run/opengl-driver/lib` to `$LD_LIBRARY_PATH` makes CUDA work for me:

```
$ python -c "import torch; print(torch.cuda.is_available())"
True
```

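In devenv.nix terms, that workaround could look like the following (NixOS-specific path; sketch only, not part of this PR):

```nix
{
  # /run/opengl-driver/lib is where NixOS exposes the userspace GPU driver
  # libraries, including libcuda.so (sketch; NixOS-only path)
  env.LD_LIBRARY_PATH = "/run/opengl-driver/lib";
}
```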
bobvanderlinden commented 1 year ago

> Adding `/run/opengl-driver/lib` to `$LD_LIBRARY_PATH` makes CUDA work for me:
>
> ```
> $ python -c "import torch; print(torch.cuda.is_available())"
> True
> ```

I think that is a fix that belongs in NixOS. It makes no sense on other distros or on macOS. Because of that, I'm not sure it should be in devenv.

SomeoneSerge commented 1 year ago

> I think that is a fix that should be in NixOS. (@bobvanderlinden)

There's no need for that on NixOS: as long as you use a Nix-built pytorch, `/run/opengl-driver/lib` will already be in the binaries' RUNPATHs.

bobvanderlinden commented 1 year ago

Indeed. However, most people want to use pytorch from poetry (having it be part of `pyproject.toml`). When doing so, you'll run into the `LD_LIBRARY_PATH` problem, but only on NixOS. Other systems have the OpenGL driver libraries (like CUDA) globally available.

The best of both worlds might be to use poetry2nix instead of poetry itself, making all poetry-defined packages available as Nix packages. That way the torch package can be overridden to link explicitly against a different CUDA library.

That avoids the need for both `LD_LIBRARY_PATH` and `pkgs.gcc-unwrapped.lib`.

It does have its own downside in that it probably won't work very nicely with poetry CLI commands like `poetry add`. I haven't given this a try yet, though; it might not be so bad when using direnv properly.
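A rough sketch of the poetry2nix route described above; `mkPoetryEnv` and `overrides.withDefaults` follow poetry2nix's documented API, but the torch override itself is illustrative and untested:

```nix
{ pkgs, ... }:
let
  pythonEnv = pkgs.poetry2nix.mkPoetryEnv {
    projectDir = ./.;
    overrides = pkgs.poetry2nix.overrides.withDefaults (self: super: {
      # point torch at nixpkgs' CUDA libraries at build time instead of
      # relying on LD_LIBRARY_PATH at runtime (illustrative sketch)
      torch = super.torch.overridePythonAttrs (old: {
        buildInputs = (old.buildInputs or [ ]) ++ [ pkgs.cudaPackages.cudatoolkit ];
      });
    });
  };
in
{
  packages = [ pythonEnv ];
}
```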

SomeoneSerge commented 1 year ago

> Indeed. However, most people want to use pytorch from poetry (having it be part of `pyproject.toml`)

Dunno, I haven't seen these people 😆

> Other systems have the OpenGL driver libraries (like CUDA) globally available

This is not exactly correct. Most other systems do indeed merge all libraries into one location. But the reason their pytorch manages to discover e.g. `libcudart.so`, and through it `libcuda.so`, is that their python binary has its `.interp` header set to a system-specific path like /lib64/ld-linux-x86-64.so (or something), and that linker is configured to look at /etc/ld.so.conf (or something), which is also a system-specific path. And that ld.so.conf specifically enumerates system-specific paths like /lib, /usr/lib, and /opt/some-nonsense/cuda/lib. In other words, their `libcuda.so` is as "globally available" as ours. Having said that, maybe we could make integration easier, at the risk of occasionally facing library version mismatches.
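The loader-resolution chain described above can be probed from Python: `ctypes.util.find_library` consults the same system loader configuration (e.g. the ldconfig cache on glibc), so it reports whether a library such as `libcuda.so` is visible to dynamically linked binaries at all. A small sketch:

```python
from ctypes import util

# find_library asks the system's dynamic-loader configuration where a
# library would be resolved from; it returns a soname string or None.
libc = util.find_library("c")     # usually found on conventional glibc distros
cuda = util.find_library("cuda")  # None unless libcuda.so is on the loader path
print("libc:", libc)
print("libcuda:", cuda)
```

On NixOS, `find_library("cuda")` typically returns `None`, which is exactly the symptom the `/run/opengl-driver/lib` workaround addresses.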

lizelive commented 1 year ago

> Dunno, I haven't seen these people

I use torch from poetry most of the time, because the CUDA libs are on PyPI now and the fewer package managers, the better.

lizelive commented 1 year ago

Also, because ML packages update so fast, it's not viable to track them with Nix system packages.

domenkozar commented 10 months ago

Could we somehow detect whether the OpenGL driver path is wired up, and error out with a helpful message explaining what to do?
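One possible shape for such a check, assuming it runs from `enterShell` (a sketch; the message wording and option placement are assumptions, not an implemented feature):

```nix
{
  enterShell = ''
    # NixOS exposes the userspace GPU drivers (including libcuda.so) here;
    # if the path is missing, CUDA will fail at runtime even though the
    # toolkit itself is installed (sketch only)
    if [ ! -d /run/opengl-driver/lib ]; then
      echo "devenv cuda: /run/opengl-driver/lib not found." >&2
      echo "On NixOS, enable the OpenGL/NVIDIA driver options in your system configuration." >&2
    fi
  '';
}
```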