conda-forge / conda-forge.github.io

The conda-forge website.
https://conda-forge.org

How to specify CUDA version in a conda package? #687

Open mrocklin opened 5 years ago

mrocklin commented 5 years ago

How should a package maintainer specify a dependency on a specific CUDA version like 9.2 or 10.0?

As an example, here is how PyTorch does things today:

I believe that NVIDIA and Anaconda handle things differently. I have zero thoughts on which way is correct, but I thought it would be useful to start such a conversation around this. My hope is that we can come to some consensus on packaging conventions that can help users avoid broken environments more easily and provide a good pattern for future package maintainers to follow.

cc @jjhelmus @msarahan @nehaljwani @stuartarchibald @seibert @sklam @soumith @kkraus14 @mike-wendt @datametrician

mrocklin commented 5 years ago

Also for history referencing https://github.com/conda-forge/conda-forge.github.io/issues/63

Also my apologies for misusing the conda-forge issue tracker for this. It's clearly not explicitly a conda-forge problem, but this seems to be a good place to have a community discussion. (also, it's what @msarahan suggested ;))

msarahan commented 5 years ago

What you're seeing with cuda100 and such is that people are creating packages as stand-ins for constraints. That's fine as a short-hand. It is important to understand why versions must be specified. Ultimately, this is all about compatibility ranges with CUDA. If something is built against CUDA 9, will it work with CUDA 10 runtimes? I don't know. Conda-build has a way to help manage that (https://conda.io/docs/user-guide/tasks/build-packages/variants.html#customizing-compatibility)
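
As a rough sketch of that mechanism, a conda_build_config.yaml variant file can render one package per CUDA version; the variable name cuda_version here is illustrative, not an established convention:

    # conda_build_config.yaml: build one variant of the recipe per version
    cuda_version:
      - "9.2"
      - "10.0"

The recipe's meta.yaml can then reference {{ cuda_version }} (for example in a build string or a pinned dependency) so each variant records which CUDA it was built against.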

One other hard aspect of this CUDA stuff is that we can ship CUDA runtimes, but we can't alter the graphics driver that users have. This seriously hampers any flexibility we have in distributing newer runtimes, and it requires that the user understand what their system is currently compatible with, in a way that is not generally a problem with other software. Adding hardware/driver version detection to conda itself, and having conda choose appropriate CUDA versions, would be helpful so that users don't need to figure these things out.

CJ-Wright commented 5 years ago

Would it be possible in the short term to print a warning about driver compatibility when linking CUDA packages? Bonus points if we can do some light inspection of the system state and give a more detailed message?

seibert commented 5 years ago

CUDA drivers (the part that conda cannot install) are backward compatible with applications compiled with older versions of CUDA. So, for example, the CUDA 9.2 build of PyTorch would only require that CUDA >= 9.2 is present on the system. This backward compatibility also extends to the cudatoolkit (the userspace libraries supplied by NVIDIA which Anaconda already packages), where a conda environment with cudatoolkit 8.0 would work just fine with a system that has the CUDA 9.2 drivers.

So, on one hand, there is motivation (much as with glibc) to pick an arbitrarily old CUDA version, build everything with that, and rely on driver backward compatibility. On the other hand, aside from new CUDA language features (which a project may choose to ignore for compatibility reasons), building with newer CUDA versions can improve performance and add native support for newer hardware. A package compiled for CUDA 8 will not run on Volta GPUs without a lengthy JIT recompilation of all the CUDA functions in the project; this happens automatically, but can still be a bad user experience. As an example, TensorFlow compiled with CUDA 8 can take 10+ minutes to start up on a Volta GPU.

These two conflicting desires, compatibility and performance, explain why it makes sense to compile packages against a range of CUDA versions (right now, I'd say 8.0 to 10.0 or 9.0 to 10.0 would be the best choice), but that still leaves the burden on the user to know which CUDA version they need.

Because nearly all CUDA projects require the CUDA toolkit libraries, and Anaconda packages them, we use the cudatoolkit package as our CUDA version marker. So for packages in Anaconda that require CUDA, we make them depend on a specific cudatoolkit version. This allows you to force a specific CUDA version this way:

conda install pytorch cudatoolkit=8.0

And that will get you a PyTorch compiled with CUDA 8, rather than something else.
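
In recipe terms (a minimal sketch, assuming a package built against CUDA 8.0), this convention is just a pinned run requirement in meta.yaml:

    # meta.yaml: pin the CUDA version marker at run time
    requirements:
      run:
        - cudatoolkit 8.0.*

The solver then refuses to co-install anything that pins a different cudatoolkit version, which is exactly the conflict we want surfaced.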

The CUDA driver provides a C API to query what maximum version of CUDA is supported by the driver, so a few months ago I wrote a self-contained Python function for detecting what version of CUDA (if any) is present on the system:

https://gist.github.com/seibert/52a204395cdc84eeeaf0ce05464a636b

This was for the conda team to potentially incorporate into conda as a "marker" (I think that is the right term), so that conda could include a cuda package with a version given by this function in the dependency solver. That would then give everyone a standard way to refer to the system CUDA dependency.

I don't know where this work is on the roadmap for conda (@msarahan?), but if there is additional work needed on the conda side to get this to the finish line, I'm happy to help. It would go a long way toward unifying the various approaches as well as improving the user experience.

mrocklin commented 5 years ago

@soumith can you (or someone else who works on PyTorch who might be more engaged on this topic) comment on whether depending on cudatoolkit versions would work for you rather than signaling versions with empty cudaXX packages?

@mike-wendt and @kkraus14 does the approach above work for you for RAPIDS?

soumith commented 5 years ago

Depending on cudatoolkit=X would work for us. At least on the surface, it doesn't look like there would be any blockers.

Incidentally, when we started the feature-tracking hack, i.e. conda install pytorch cuda80, cudatoolkit wasn't around.

I'll look into moving our packages to this format/convention.

Thanks for starting this conversation @mrocklin

datametrician commented 5 years ago

@soumith Thanks for jumping in.

We've been using labels in the RAPIDS project and it's been helpful so far. Should we consider that as well? I'm not sure conda should be the only way to install the CTK even for conda users. With labels people can pull the package they need for the right CUDA version, and if they also want to install CTK from conda they have the option.

seibert commented 5 years ago

It's important to note that labels (I assume you mean things like this) are properties of a conda package in a particular channel, and not intrinsic metadata for the package itself.

Labels are a good way to separate packages for different purposes (for example, dev, qa, release), but the labels have no impact on the conda dependency solver. This means it would be possible for a user to mix the CUDA 9.2 version of cudf (using the cuda9.2 label on the rapidsai channel) with the CUDA 10 version of PyTorch (presumably downloaded from some other channel). Ultimately we want the conda solver to be aware of what CUDA version is required to prevent this situation, whether that be through cudatoolkit package versions, empty packages, or a cuda marker computed by conda itself.
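
For illustration, installing from a label (using the cuda9.2 label on the rapidsai channel mentioned above) only changes where the package is looked up:

    conda install cudf -c rapidsai/label/cuda9.2

Nothing in that command constrains the solver, so a CUDA 10 build of another package from a different channel could still land in the same environment.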

mrocklin commented 5 years ago

Do RAPIDS libraries depend on the cudatoolkit conda package in the defaults channel currently?
Is this something that they would be willing to do, or is there some other proposed convention?

scopatz commented 5 years ago

One thing we really need is for conda-forge/NumFOCUS to be able to redistribute cudatoolkit at all. @mrocklin @datametrician, any updates on this front?

seibert commented 5 years ago

My understanding of the CUDA toolkit EULA is that the libraries (which are what Anaconda's cudatoolkit conda package ships) are redistributable. (The EULA in fact enumerates exactly which files can be redistributed.) The compiler is not redistributable, but end users don't need it.

However, cuDNN (used by all the deep learning frameworks) is still shipped separately from the CUDA toolkit and technically requires an NVIDIA developer registration. Anaconda obtained a special license from NVIDIA to redistribute it in the Anaconda Distribution, but (IMHO) the registered developer requirement on cuDNN should be lifted so it can be redistributed on the same terms as the rest of the CUDA toolkit libraries.

datametrician commented 5 years ago

@scopatz I believe Pramod Ramarao already responded to you in an email on 11/15 essentially saying what @seibert said.

Stan, I agree with you on cuDNN, and will point Pramod and the cuDNN PMs to this thread. Let's see if that will change anything.

@mrocklin I believe this is doable from our end, especially since everyone is essentially doing it their own way and this would allow some consolidation. I'll let @mike-wendt and @kkraus14 chime in though.

seibert commented 5 years ago

Also, I'm not sure what the status is of NCCL2. Is it also limited to registered developers?

datametrician commented 5 years ago

NCCL2 is open source and free to redistribute. The source code is also on GitHub now.

mike-wendt commented 5 years ago

Do RAPIDS libraries depend on the cudatoolkit conda package in the defaults channel currently?

Right now we do not rely on cudatoolkit, as that creates a dependency on how CUDA is installed. Many of our devs are using their own versions of CUDA that they have installed to /usr/local/cuda. This also includes internal folks who are using nightly CUDA builds for testing and bleeding-edge development.

Is this something that they would be willing to do, or is there some other proposed convention?

I'm not against standardizing around cudatoolkit; however, my concern is that this complicates things for the users above, as it overrides their system install with the conda version. In addition, as we have run into with RAPIDS, we need not just the toolkit but also the system-level NVIDIA drivers. This takes us out of user space, where conda shines at management, and into system-level kernel modules.

We depend on certain libraries that are distributed with the GPU driver, and on systems that do not have the driver installed, the RAPIDS libraries fail to build. So we really have both a CUDA and an NVIDIA driver dependency. While it sounds like we're introducing more complexity, we handle this by restricting the CUDA versions we support in RAPIDS. We rely on this approach because each version of CUDA has a minimum NVIDIA driver version that it needs to operate. If that driver version is not satisfied, the installation fails, and the user usually upgrades their driver to a compatible version.

I want to be clear, though, that we should not make this a true dependency; that is, we should not require a driver version matching the cudatoolkit to be installed. The perfect use case is Travis CI and other CPU-only build environments, which need the toolkit and libraries for compiling and linking but do not need the driver. Establishing a true dependency would require installing the NVIDIA driver on systems where it is not needed.

Thoughts

Questions

mrocklin commented 5 years ago

Allow users to leverage their own CUDA installations - my guess is this package would sym-link to /usr/local/cuda and set the appropriate paths, LD_LIBRARY_PATH, etc.

This is an interesting question. Presumably this happens in other disciplines today? If I'm working on a new BLAS implementation is there some way for me to link a conda-installed numpy to my version of BLAS (or really any version other than OpenBLAS and MKL) or am I outside of conda-supported workloads and I should be handling things on my own at this point?

I'm curious about your questions regarding mirroring @mike-wendt . What packages would you mirror and how would this solve the problem of working with bleeding edge cudatoolkit installs? Would your mirrored package just be an empty cudatoolkit that allows the locally installed version to come through?

mike-wendt commented 5 years ago

This is an interesting question. Presumably this happens in other disciplines today? If I'm working on a new BLAS implementation is there some way for me to link a conda-installed numpy to my version of BLAS (or really any version other than OpenBLAS and MKL) or am I outside of conda-supported workloads and I should be handling things on my own at this point?

I think from our in-person discussion this is "outside of conda-supported workloads." The target users for RAPIDS, PyTorch, and other projects using CUDA are just that: users. They primarily want a way to get up and running quickly instead of trying to figure out dependencies. Standardizing around cudatoolkit across all projects would help this effort.

The rest are developers, and that comes with some work. They need to be aware of the tools and how to use them, so they can verify and test approaches for users who use conda, without being totally dependent on conda for development. Your BLAS example is a good one, as is RAPIDS, which needs the full CUDA development install for compilers and includes. Not to mention our unique case, where we are testing nightly builds and need a process outside of conda for that. Could we move that to conda and publish nightly packages privately? Sure, but I don't believe it would bring the value I previously thought it would.

I'm in favor of using cudatoolkit as a standard to help enable the dependency resolution and help users stay in "user space" without having to deal with system stuff besides an NVIDIA driver.

  1. Use cuda## in the build string of packages to clearly identify the dependency of the package like py##
    • To do this we need to set a standard as I have seen three conventions used in the wild to mark versions:
      • cuda92 and cuda100 - omitting the decimal
      • cuda9.2.148 and cuda10.0.130 - full version number
      • 9.2 & 10.0 or cuda9.2 & cuda10.0 - major/minor version number
    • cudatoolkit specifies versions as major.minor (M.m), so we may want to match that but use the cuda prefix to identify it separately from other conventions like py36 (a recipe sketch of this convention follows the list below)
  2. Label packages with the supported versions of CUDA using main; for all others, use dev
    • This is to try to address projects that may support all CUDA versions, and those that only support version X and up, or versions X through Y
    • For example, some libraries support all versions of CUDA, but RAPIDS only supports CUDA 9.2. A dependency on RAPIDS should have the solver pull at least CUDA 9.2 for the other projects, unless they all support a higher version, in which case it will grab that one
    • The dev label allows for community use and testing of newer versions of CUDA, but to also act as a means of ensuring stability for users
  3. Work with Anaconda to get CUDA 10.0 out ASAP
    • The anaconda channel is still on 9.2 (lower on some)
    • This is the top search result for cudatoolkit so we need to get these packages updated to allow the community to move forward
  4. Communicate upcoming CUDA releases to Anaconda faster so we can have a new cudatoolkit on the first day there is a CUDA release
  5. Investigate the need for additional cudatoolkit packages
    • Examining the DockerHub for CUDA we see that there are three flavors of CUDA environments: base, devel, runtime
    • It looks like we have runtime through cudatoolkit
    • base I'm not certain we would need
    • devel comes from conda-forge/cudatoolkit-dev
      • Not sure if cudatoolkit-dev belongs also in Anaconda or should stay in Conda-Forge
      • If we move it to Anaconda we may be able to leverage the agreement with NVIDIA to distribute CUDA and create a proper package similar to cudatoolkit
      • Right now cudatoolkit-dev downloads the CUDA installer and runs it in the conda environment, it works but it could be better
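
As a sketch of convention 1 above, assuming a cuda_version variant variable is defined in conda_build_config.yaml (the variable name is illustrative):

    # meta.yaml: embed the CUDA major.minor version in the build string
    build:
      number: 0
      string: cuda{{ cuda_version }}_{{ PKG_BUILDNUM }}
    requirements:
      run:
        - cudatoolkit {{ cuda_version }}.*

This would yield build strings like cuda9.2_0, making the CUDA flavor of a package visible at a glance in conda list output.
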
mrocklin commented 5 years ago

OK, it seems like everyone is on board with specifying CUDA version numbers by expressing dependencies on cudatoolkit.

It also sounds like @mike-wendt is proposing a convention around including the cuda version number in build strings. Is there any objection to this?

@mike-wendt would the next release of RAPIDS follow this convention, or is that too early for you?

@soumith , what does this process look like on the PyTorch side? Is it easy for you all to change around your builds and your installation instructions? I can imagine that you would want to have some sort of smooth transition.

mike-wendt commented 5 years ago

would the next release of RAPIDS follow this convention, or is that too early for you?

@mrocklin The blocker for us is the lack of a cudatoolkit for CUDA 10.0. If we can get it by next week we may be able to include this in v0.5.

The major concern I have this week and next is the Conda-Forge gcc7 switchover that occurs on 1/15. So I think it is safe to say a lot of us will be busy that week, primarily dealing with the conversions and any necessary updates related to that.

Right now we are scheduled to freeze for v0.5 on 1/16 so I think it will be hard to guarantee that it makes it this release, but we might be able to do a hotfix release the week after.

mrocklin commented 5 years ago

There is no immediate stress on this. Happy to play a long game. So the first time that RAPIDS would use this convention would be sometime in March?

soumith commented 5 years ago

@mrocklin the process on the PyTorch side is easy; we just have to change our build scripts. I'm inclined to make the change when a CUDA 10 cudatoolkit is available as well, because otherwise half of our install commands would go via feature packages like cuda100, and the other half via cudatoolkit=9.0, etc.

stuartarchibald commented 5 years ago

CUDA 10.0 cudatoolkit recipe is live https://github.com/numba/conda-recipe-cudatoolkit.

msarahan commented 5 years ago

I had a talk with @seibert yesterday about what he thinks he needs from conda to support this. I think we agreed that conda needs "virtual packages" which @kalefranz has been lumping in with "markers" but which I think are actually separate.

A virtual package is something that represents some aspect of the system. Its version and build string can be dynamically determined by having conda run some code for that particular virtual package. It would then be considered in the solver as a package with a strict pinning.

For cuda, it means that we need to decide what this package name should be. Then all packages would express their CUDA compatibility as normal dependencies on that package.

A user's system may present something like a dependency of:

cuda=10.0=sub-build-id

while packages such as pytorch should express normal version dependencies like:

cuda >=10,<11.0a0

(adjusted as appropriate for the actual compatibility expectations of cuda)
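
Put together in a recipe, the two constraints might look like this (a sketch; the cuda virtual package is the proposal above, not something conda ships today):

    # meta.yaml run requirements under the proposed virtual-package scheme
    requirements:
      run:
        - cudatoolkit 10.0.*    # userspace libraries, installable by conda
        - cuda >=10,<11.0a0     # driver capability, detected by conda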

Conda could obviously never update cuda itself, but it would be nice to have it recognize ways outside of its control to update (i.e. tell the user that they can update their driver or upgrade their hardware). Depending on how long it takes to compute this cuda virtual package's version, it may be something that we cache on disk and refresh with a dedicated command.

@seibert volunteered some time towards getting this implemented in conda. We'll hope to have something ready soon - likely with the next minor release of conda, 4.7.0.

mrocklin commented 5 years ago

@msarahan to be clear it sounds like you're proposing this as an alternative to using cudatoolkit to represent CUDA version dependency, correct?

msarahan commented 5 years ago

Could be? I'm ambivalent on that. If you can ship runtimes that work with a variety of drivers, maybe they can be independent.

msarahan commented 5 years ago

Or should I say: maybe cudatoolkit stays in usage the same as it is now, but cudatoolkit itself grows a dependency to this new virtual package to establish driver requirements.

mrocklin commented 5 years ago

OK, so you think that it's still the right approach for downstream packages to depend on cudatoolkit today, and that in the future conda might do some work to auto-detect the cuda version on the system so that users don't have to specify it themselves.

msarahan commented 5 years ago

Yep, definitely important to bridge the gap with cudatoolkit, since new conda versions may take a while to be available. Perhaps cudatoolkit can be dropped in the more distant future when this new approach is proven and commonly available. Thankfully, I expect the CUDA-using community will be quick adopters of new conda versions, rather than laggards holding onto old versions.

mrocklin commented 5 years ago

@msarahan thanks!

CUDA 10.0 cudatoolkit recipe is live https://github.com/numba/conda-recipe-cudatoolkit.

Is there something we need to do to get this into defaults, or is it in the pipeline already?

jjhelmus commented 5 years ago

Is there something we need to do to get this into defaults, or is it in the pipeline already?

I will work on getting cudatoolkit 10.0 into defaults next week.

datametrician commented 5 years ago

@jjhelmus any update on cudatoolkit 10?

jjhelmus commented 5 years ago

A cudatoolkit 10.0.130 package is available in defaults for linux-64. I've been running into some issues with the Windows package but expect to have it available soon.

jjhelmus commented 5 years ago

A cudatoolkit 10.0.130 package is available for win-64 now.

soumith commented 5 years ago

@jjhelmus the cudatoolkit packages have inconsistent versioning. We have cudatoolkit=9.0 and cudatoolkit=9.2, but cudatoolkit=10.0 doesn't exist; instead there is the full version string cudatoolkit=10.0.130. Could you help fix that?

jjhelmus commented 5 years ago

The addition of the micro version was intentional. NVIDIA labels CUDA releases with a micro version and, I think, has in the past released multiple micro versions for a given major.minor version. With the previous cudatoolkit packages there was no method to differentiate these changes. The addition of the micro version to cudatoolkit 10.0.130 is more specific and allows for updates if a new micro version is released. Package builders and users should still specify the version by major.minor, e.g. conda install cudatoolkit=10.0; conda will automatically provide the micro version.

soumith commented 5 years ago

Okay, I've worked around this. In my refactored recipe I was requiring cudatoolkit==10.0; that ended up depending on cudatoolkit==10.0.130 and refused to install if I specified conda install pytorch cudatoolkit=10.0. I've worked around it by specifying cudatoolkit >=10.0,<10.1 in the recipe's "runtime" dependencies instead.
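
For reference, the bounded-range spelling of that run requirement in a recipe looks like this (a sketch of the relevant lines only):

    requirements:
      run:
        # matches any 10.0.x micro release, e.g. cudatoolkit 10.0.130
        - cudatoolkit >=10.0,<10.1

An equivalent conda spec is cudatoolkit 10.0.*, which likewise matches micro releases within 10.0.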

soumith commented 5 years ago

On my side, the conda install pytorch cuda100 -c pytorch business should go away with the release of pytorch v1.0.1. We are moving towards: conda install pytorch cudatoolkit=10.0 -c pytorch.

Thanks all for the thread.

jjhelmus commented 5 years ago

Alternatively, you can use {{ pin_compatible('cudatoolkit', max_pin='x.x') }} in the requirements/run section of the recipe to have conda-build generate the run requirement from the version of cudatoolkit specified in the requirements/host section. This can be helpful if the same recipe is used to build packages for multiple cudatoolkit versions.
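
A minimal sketch of that pattern:

    # meta.yaml: derive the run pin from the cudatoolkit version in host
    requirements:
      host:
        - cudatoolkit 10.0.*
      run:
        - {{ pin_compatible('cudatoolkit', max_pin='x.x') }}

With cudatoolkit 10.0.130 in host, pin_compatible renders the run requirement as roughly cudatoolkit >=10.0.130,<10.1, so rebuilding against a different cudatoolkit updates the pin automatically.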

mrocklin commented 5 years ago

Does Anaconda also handle the builds for cupy in the defaults channel? If so, could the convention laid out here be used for those packages as well?

seibert commented 5 years ago

conda/conda#8267 will add support for a cuda (or maybe @cuda) virtual package that autodetects the version of CUDA supported by the graphics driver.

seibert commented 5 years ago

@jjhelmus builds the cupy packages, I think. They should already depend on the cudatoolkit package, AFAIK.

jjhelmus commented 5 years ago

The cupy packages on defaults depend on the cudatoolkit package. Their build strings do not include the CUDA version, but I will add that for the next release.

jakirkham commented 5 years ago

To help codify this a bit more, I've put up PR ( https://github.com/conda-forge/docker-images/pull/93 ) and PR ( https://github.com/conda-forge/staged-recipes/pull/8229 ). These provide a Docker image (based off conda-forge's current Docker image) for compiling packages and a shim package to get NVCC and conda-build to talk to each other. Please share your thoughts on these.

jakirkham commented 5 years ago

Something else worth mentioning here: I've noticed that CMake, when using the CUDA language feature, often likes to statically link the CUDA runtime library. There used to be a way to disable this (e.g. CUDA_USE_STATIC_CUDA_RUNTIME), but it is part of the deprecated FindCUDA module and doesn't work with the newer, preferred CUDA language feature. This will result in some package bloat if all CMake-based CUDA packages do this static linking while we also ship cudatoolkit alongside. There is an open issue in CMake to fix this. I'm not sure how painful this is for people yet, but I wanted to raise awareness in case package size becomes an issue.

ref: https://gitlab.kitware.com/cmake/cmake/issues/17559