mrocklin opened this issue 5 years ago
Also for history referencing https://github.com/conda-forge/conda-forge.github.io/issues/63
Also my apologies for misusing the conda-forge issue tracker for this. It's clearly not explicitly a conda-forge problem, but this seems to be a good place to have a community discussion. (also, it's what @msarahan suggested ;))
What you're seeing with cuda100 and such is that people are creating packages as stand-ins for constraints. That's fine as a short-hand. It is important to understand why versions must be specified. Ultimately, this is all about compatibility ranges with CUDA. If something is built against CUDA 9, will it work with CUDA 10 runtimes? I don't know. Conda-build has a way to help manage that (https://conda.io/docs/user-guide/tasks/build-packages/variants.html#customizing-compatibility)
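To illustrate the variant mechanism the link above describes (names and versions here are illustrative, not a prescription): a conda_build_config.yaml can list the CUDA versions to build against, and conda-build then renders the recipe once per entry.

```yaml
# conda_build_config.yaml - each entry yields a separate build of the recipe
cudatoolkit:
  - 9.2
  - 10.0
```

With `cudatoolkit {{ cudatoolkit }}*` in the recipe's requirements, conda-build produces one package per listed version.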
One other hard aspect to this CUDA stuff is that we can ship CUDA runtimes, but we can't alter the graphics driver that users have. This seriously hampers any flexibility that we have in distributing newer runtimes, and requires that the user understand what their system is currently compatible with in a way that is not generally a problem with other software. Some addition of hardware/driver version detection to conda itself, and having conda choose appropriate CUDA versions would be helpful so that users don't need to figure these things out.
Would it be possible in the short term to print some warning about driver compatibility when linking cuda things? Bonus points if we can do some light inspection of the system state and give a more detailed message?
CUDA drivers (the part that conda cannot install) are backward compatible with applications compiled with older versions of CUDA. So, for example, the CUDA 9.2 build of PyTorch would only require that CUDA >= 9.2 is present on the system. This backward compatibility also extends to the cudatoolkit (the userspace libraries supplied by NVIDIA which Anaconda already packages), where a conda environment with cudatoolkit 8.0 would work just fine with a system that has the CUDA 9.2 drivers.
So, on one hand, there is motivation (much like with glibc) to pick an arbitrary old CUDA, build everything with that, and rely on driver backward compatibility. On the other hand, aside from new CUDA language features (which a project may choose to ignore for compatibility reasons), building with newer CUDA versions can also improve performance as well as add native support for newer hardware. A package compiled for CUDA 8 will not run on Volta GPUs without a lengthy JIT recompilation of all the CUDA functions in the project, which happens automatically, but can still be a bad user experience. As an example, TensorFlow compiled with CUDA 8 can take 10+ minutes to start up on a Volta GPU.
These two conflicting desires for compatibility and performance explain why it makes sense to compile packages with a range of CUDA versions (right now, I'd say 8.0 to 10.0, or 9.0 to 10.0, would be the best choice), but this still leaves the burden on the user to know which CUDA version they need.
Because nearly all CUDA projects require the CUDA toolkit libraries, and Anaconda packages them, we use the cudatoolkit package as our CUDA version marker. So for packages in Anaconda that require CUDA, we make them depend on a specific cudatoolkit version. This allows you to force a specific CUDA version this way:

conda install pytorch cudatoolkit=8.0

And that will get you a PyTorch compiled with CUDA 8, rather than something else.
The CUDA driver provides a C API to query what maximum version of CUDA is supported by the driver, so a few months ago I wrote a self-contained Python function for detecting what version of CUDA (if any) is present on the system:
https://gist.github.com/seibert/52a204395cdc84eeeaf0ce05464a636b
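A minimal sketch of that kind of detection (this is not the gist itself, just an illustration of the same idea): call `cuDriverGetVersion` from the driver library via ctypes, and return None when no usable driver is present.

```python
# Hedged sketch: ask the NVIDIA driver API for the maximum CUDA version it
# supports. Returns a string like "10.0", or None if no driver is usable.
import ctypes

def detect_cuda_version():
    """Return the driver-supported CUDA version, e.g. '10.0', else None."""
    lib = None
    for name in ("libcuda.so", "libcuda.so.1", "nvcuda.dll"):
        try:
            lib = ctypes.CDLL(name)
            break
        except OSError:
            continue
    if lib is None:
        return None  # no driver library on this system
    # cuInit must succeed before other driver API calls are meaningful
    if lib.cuInit(0) != 0:
        return None
    version = ctypes.c_int(0)
    # cuDriverGetVersion reports 1000*major + 10*minor, e.g. 10010 -> 10.1
    if lib.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    return "%d.%d" % (version.value // 1000, (version.value % 1000) // 10)

print(detect_cuda_version())
```

On a machine with no NVIDIA driver this simply prints None, which is exactly the signal a solver-side marker would need.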
This was for the conda team to potentially incorporate into conda as a "marker" (I think that is the right term), so that conda could include a cuda package, with a version given by this function, in the dependency solver. That would then give everyone a standard way to refer to the system CUDA dependency.
I don't know where this work is on the roadmap for conda (@msarahan?), but if there is additional work needed on the conda side to get this to the finish line, I'm happy to help. It would go a long way toward unifying the various approaches as well as improving the user experience.
@soumith can you (or someone else who works on PyTorch who might be more engaged on this topic) comment on whether depending on cudatoolkit versions would work for you, rather than signaling versions with empty cudaXX packages?
@mike-wendt and @kkraus14 does the approach above work for you for RAPIDS?
Depending on cudatoolkit=X would work for us. At least on the surface, it doesn't look like there would be any blockers.
Incidentally, when we started the feature-tracking hack, i.e. conda install pytorch cuda80, cudatoolkit wasn't around.
I'll look to moving our packages to using this format / convention.
Thanks for starting this conversation @mrocklin
@soumith Thanks for jumping in.
We've been using labels in the RAPIDS project and it's been helpful so far. Should we consider that as well? I'm not sure conda should be the only way to install the CTK even for conda users. With labels people can pull the package they need for the right CUDA version, and if they also want to install CTK from conda they have the option.
It's important to note that labels (I assume you mean things like this) are properties of a conda package in a particular channel, and not intrinsic metadata for the package itself.

Labels are a good way to separate packages for different purposes (for example: dev, qa, release), but labels have no impact on the conda dependency solver. This means it would be possible for a user to mix the CUDA 9.2 version of cudf (using the cuda9.2 label on the rapidsai channel) with the CUDA 10 version of PyTorch (presumably downloaded from some other channel). Ultimately we want the conda solver to be aware of what CUDA version is required to prevent this situation, whether that be through cudatoolkit package versions, empty packages, or a cuda marker computed by conda itself.
Do RAPIDS libraries depend on the cudatoolkit conda package in the defaults channel currently? Is this something that they would be willing to do, or is there some other proposed convention?
One thing we really need is for conda-forge/NumFOCUS to be able to redistribute cudatoolkit at all. @mrocklin @datametrician, any updates on this front?
My understanding of the CUDA toolkit EULA is that the libraries (which are what Anaconda's cudatoolkit conda package ships) are redistributable. (The EULA in fact enumerates exactly which files can be redistributed.) The compiler is not redistributable, but it isn't needed by end users.
However, cuDNN (used by all the deep learning frameworks) is still shipped separately from the CUDA toolkit and technically requires an NVIDIA developer registration. Anaconda obtained a special license from NVIDIA to redistribute it in the Anaconda Distribution, but (IMHO) the registered developer requirement on cuDNN should be lifted so it can be redistributed on the same terms as the rest of the CUDA toolkit libraries.
@scopatz I believe Pramod Ramarao already responded to you in an email on 11/15 essentially saying what @seibert said.
Stan, I agree with you on cuDNN, and will point Pramod and the cuDNN PMs to this thread. Let's see if that will change anything.
@mrocklin I believe this is doable from our end, especially since everyone is essentially doing it their own way and this would allow some consolidation. I'll let @mike-wendt and @kkraus14 chime in though.
Also, I'm not sure what the status is of NCCL2. Is it also limited to registered developers?
NCCL2 is open source and free to redistribute. The source code is also on github now.
Do RAPIDS libraries depend on the cudatoolkit conda package in the defaults channel currently?
Right now we do not rely on cudatoolkit, as that creates a dependency on how CUDA is installed. Many of our devs are using their own versions of CUDA that they have installed to /usr/local/cuda. This also includes internal folks who are using nightly CUDA builds for testing and bleeding-edge development.
Is this something that they would be willing to do, or is there some other proposed convention?
I'm not against standardizing around cudatoolkit; however, my concern is that this complicates things for the users above, as it overwrites their system install with the conda version. In addition, as we have run into issues with RAPIDS, we need not just the toolkit but also the system-level NVIDIA drivers. This takes us out of user space, where conda shines at management, and into system-level kernel modules.

Certain libraries that we depend on are distributed with the GPU driver, and for systems that do not have it installed, the RAPIDS libraries fail to build. So we have more of a CUDA-plus-NVIDIA-driver dependency. While it sounds like we're introducing more complexity, we handle this by restricting the level of CUDA we support in RAPIDS. The reason we rely on this approach is that each version of CUDA has a minimum NVIDIA driver version that it needs to operate. If that driver version is not satisfied, then the installation fails and the user usually upgrades their driver to a compatible version.
I want to be clear, though, that we should not make this a true dependency; that is, require a driver version that matches the cudatoolkit to be installed. A perfect use case is Travis CI and other CPU-only build environments that need the toolkit and libraries for compiling and linking, but do not need the driver. Establishing a true dependency would require the installation of the NVIDIA driver on those systems when it may not be needed.
cudatoolkit as a dependency can be done, but we should try to address the following if possible:

- Allow users to leverage their own CUDA installations - my guess is this package would sym-link to /usr/local/cuda and set the appropriate paths, LD_LIBRARY_PATH, etc.
- Mirroring of the cudatoolkit package - could meta.yml have support for defining channels in addition to package names?

Allow users to leverage their own CUDA installations - my guess is this package would sym-link to /usr/local/cuda and set the appropriate paths, LD_LIBRARY_PATH, etc.
This is an interesting question. Presumably this happens in other disciplines today? If I'm working on a new BLAS implementation is there some way for me to link a conda-installed numpy to my version of BLAS (or really any version other than OpenBLAS and MKL) or am I outside of conda-supported workloads and I should be handling things on my own at this point?
I'm curious about your questions regarding mirroring, @mike-wendt. What packages would you mirror, and how would this solve the problem of working with bleeding-edge cudatoolkit installs? Would your mirrored package just be an empty cudatoolkit that allows the locally installed version to come through?
This is an interesting question. Presumably this happens in other disciplines today? If I'm working on a new BLAS implementation is there some way for me to link a conda-installed numpy to my version of BLAS (or really any version other than OpenBLAS and MKL) or am I outside of conda-supported workloads and I should be handling things on my own at this point?
I think from our in-person discussion this is "outside of conda-supported workloads." The target users for RAPIDS, PyTorch, and other projects using CUDA are just that: users. They primarily want a way to get up and running quickly instead of trying to figure out dependencies. Standardizing around cudatoolkit across all projects would help this effort.
For the rest, they are developers, which comes with some work. They need to be aware of the tools and how to use them, so they can verify and test approaches for users on conda without being totally dependent on conda for development. Your BLAS example is a good one, as is RAPIDS, which needs the full CUDA development install for compilers and includes. Not to mention our unique case, where we are testing nightly builds and need a process outside of conda for that. Could we move that to conda and publish nightly packages privately? Sure, but I don't believe it would bring the value I previously thought it would.
I'm in favor of using cudatoolkit as a standard to help enable dependency resolution and help users stay in "user space" without having to deal with system stuff besides an NVIDIA driver.
Proposed conventions:

- Use cuda## in the build string of packages to clearly identify the CUDA dependency of the package, like py##. The open question is the format:
  - cuda92 and cuda100 - omitting the decimal
  - cuda9.2.148 and cuda10.0.130 - full version number
  - 9.2 & 10.0, or cuda9.2 & cuda10.0 - major/minor version number
  - cudatoolkit specifies versions as major/minor M.N, so we may want to match that, but use a cuda prefix to identify it separately from other conventions like py36
- Labels: use main for the currently supported CUDA versions; for all others use dev
  - The dev label allows for community use and testing of newer versions of CUDA, while also acting as a means of ensuring stability for users
- Faster releases of cudatoolkit: we need to get these packages updated to allow the community to move forward, ideally with a new cudatoolkit on the first day there is a CUDA release
- Splitting the cudatoolkit packages into base, devel, runtime
  - runtime would be served through cudatoolkit
  - base I'm not certain we would need, unless devel comes from conda-forge/cudatoolkit-dev
  - Open question: does cudatoolkit-dev belong also in Anaconda, or should it stay in Conda-Forge? Currently cudatoolkit-dev downloads the CUDA installer and runs it in the conda environment; it works, but it could be better

OK, it seems like everyone is on board with specifying CUDA version numbers by expressing dependencies on cudatoolkit.
It also sounds like @mike-wendt is proposing a convention around including the cuda version number in build strings. Is there any objection to this?
@mike-wendt would the next release of RAPIDS follow this convention, or is that too early for you?
@soumith , what does this process look like on the PyTorch side? Is it easy for you all to change around your builds and your installation instructions? I can imagine that you would want to have some sort of smooth transition.
would the next release of RAPIDS follow this convention, or is that too early for you?
@mrocklin The blocker for us is the lack of a cudatoolkit for CUDA 10.0. If we can get it by next week, we may be able to include this in v0.5.
The major concern I have this week and next is the Conda-Forge plan for gcc7 switchover that occurs on 1/15. So I think it is safe to say a lot of us will be busy that week dealing with the conversions and any necessary updates related to that primarily.
Right now we are scheduled to freeze for v0.5 on 1/16 so I think it will be hard to guarantee that it makes it this release, but we might be able to do a hotfix release the week after.
There is no immediate stress on this. Happy to play a long game. So the first time that RAPIDS would use this convention would be sometime in March?
@mrocklin the process on the PyTorch side is easy; we just have to change our build scripts. I'm inclined to change when a CUDA 10 cudatoolkit is available as well, because otherwise half of our install commands would be via feature packages like cuda100, and the other half would be around cudatoolkit=9.0, etc.
CUDA 10.0 cudatoolkit recipe is live https://github.com/numba/conda-recipe-cudatoolkit.
I had a talk with @seibert yesterday about what he thinks he needs from conda to support this. I think we agreed that conda needs "virtual packages" which @kalefranz has been lumping in with "markers" but which I think are actually separate.
A virtual package is something that represents some aspect of the system. Its version and build string can be dynamically determined by having conda run some code for that particular virtual package. It would then be considered in the solver as a package with a strict pinning.
For cuda, it means that we need to decide what this package name should be. Then all packages would express their CUDA compatibility as normal dependencies on that package.
A user's system may present something like a dependency of:
cuda=10.0=sub-build-id
while packages such as pytorch should express normal version dependencies like:
cuda >=10,<11.0a0
(adjusted as appropriate for the actual compatibility expectations of cuda)
Conda could obviously never update cuda, but it would be nice to have it recognize ways outside of its control to update (i.e. tell the user that they can update their driver or upgrade their hardware). Depending on the time it takes for this cuda virtual package to represent itself, it may be something that we cache on disk and have a "refresh"-type command.
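As a sketch of how a recipe might consume such a virtual package (assuming it ends up named cuda, as in the example above; this is a proposal, not a settled spec):

```yaml
# hypothetical meta.yaml fragment
requirements:
  run:
    - cudatoolkit >=10.0,<10.1   # userspace libraries conda installs
    - cuda >=10,<11.0a0          # virtual package derived from the driver
```

The solver would then refuse to install the package on a system whose driver reports an incompatible CUDA version.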
@seibert volunteered some time towards getting this implemented in conda. We'll hope to have something ready soon - likely with the next minor release of conda, 4.7.0.
@msarahan to be clear, it sounds like you're proposing this as an alternative to using cudatoolkit to represent the CUDA version dependency, correct?
could be? I'm ambivalent on that. If you can ship runtimes that work with a variety of drivers, maybe they can be independent.
Or should I say: maybe cudatoolkit stays in usage the same as it is now, but cudatoolkit itself grows a dependency to this new virtual package to establish driver requirements.
OK, so you think that it's still the right approach for downstream packages to depend on cudatoolkit today, and that in the future conda might do some work to auto-detect the CUDA version on the system so that users don't have to specify it themselves.
Yep, definitely important to bridge the gap with cudatoolkit, since new conda versions may take a while to be available. Perhaps cudatoolkit can be dropped in the more distant future when this new approach is proven and commonly available. Thankfully, I expect the CUDA-using community will be quick adopters of new conda versions, rather than laggards holding onto old versions.
@msarahan thanks!
CUDA 10.0 cudatoolkit recipe is live https://github.com/numba/conda-recipe-cudatoolkit.
Is there something we need to do to get this into defaults, or is it in the pipeline already?
Is there something we need to do to get this into defaults, or is it in the pipeline already?
I will work on getting cudatoolkit 10.0 into defaults next week.
@jjhelmus any update on cudatoolkit 10?
A cudatoolkit 10.0.130 package is available in defaults for linux-64. I've been running into some issues with the Windows package but expect to have it available soon.
A cudatoolkit 10.0.130 package is available for win-64 now.
@jjhelmus the cudatoolkit packages have inconsistent versioning. We have cudatoolkit=9.0 and cudatoolkit=9.2, but cudatoolkit=10.0 doesn't exist; it's instead the full version string cudatoolkit=10.0.130. Could you help fix that?
The addition of the micro version was intentional. NVIDIA labels CUDA releases with a micro version and, I think, has in the past released multiple micro versions for a given major.minor version. With the previous cudatoolkit packages there was no method to differentiate these changes. The addition of the micro version to cudatoolkit 10.0.130 is more specific and allows for updates if a new micro version is released. Package builders and users should still specify the version by the major.minor version, e.g. conda install cudatoolkit=10.0; conda will automatically provide the micro version.
Okay, I've worked around this. In my refactored recipe, I was providing cudatoolkit==10.0, which ended up depending on cudatoolkit==10.0.130 and refused to install if I specified conda install pytorch cudatoolkit=10.0. I've worked around it by specifying cudatoolkit >=10.0,<10.1 in the recipe "runtime" dependencies instead.
On my side, the conda install pytorch cuda100 -c pytorch business should go away with the release of PyTorch v1.0.1. We are moving towards: conda install pytorch cudatoolkit=10.0 -c pytorch.
Thanks all for the thread.
Alternatively, you can use {{ pin_compatible('cudatoolkit', max_pin='x.x') }} in the requirements/run section of the recipe to have conda-build generate the run requirement from the version of cudatoolkit specified in the requirements/host section. This can be helpful if the same recipe is used to build packages for multiple cudatoolkit versions.
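A sketch of that pattern (versions illustrative):

```yaml
requirements:
  host:
    - cudatoolkit 10.0.*
  run:
    - {{ pin_compatible('cudatoolkit', max_pin='x.x') }}   # e.g. >=10.0.130,<10.1
```

Because the run pin is derived from the host package actually used at build time, rebuilding the same recipe against cudatoolkit 9.2 would emit a matching >=9.2,<9.3-style constraint without editing the recipe.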
Does Anaconda also handle the builds for cupy in the defaults channel? If so, could the convention laid out here be used for those packages as well?
conda/conda#8267 will add support for a cuda (or maybe @cuda) virtual package that autodetects the version of CUDA supported by the graphics driver.
@jjhelmus builds the cupy packages, I think. They should already depend on the cudatoolkit package, AFAIK.
The cupy packages on defaults depend on the cudatoolkit package. Their build strings do not include the CUDA version, but I will add that for the next release.
To help codify this a bit more, I've put up PR ( https://github.com/conda-forge/docker-images/pull/93 ) and PR ( https://github.com/conda-forge/staged-recipes/pull/8229 ). These provide a Docker image (based off conda-forge's current Docker image) for compiling packages and a shim package to get NVCC and conda-build to talk to each other. Please share your thoughts on these.
Something else worth mentioning here: I've noticed that CMake, when using the CUDA language feature, often likes to statically link the CUDA runtime library. There used to be a way to disable this (e.g. CUDA_USE_STATIC_CUDA_RUNTIME), but it is part of the deprecated FindCUDA module and doesn't work with the newer, preferred CUDA language feature. This will result in some package bloat if all CMake-based CUDA packages are doing this static linking while we also ship cudatoolkit. There appears to be an open issue in CMake to fix this. I'm not sure how painful this is for people yet, but I wanted to raise awareness in case package size is an issue.
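For anyone hitting this, a sketch of the two knobs (my_cuda_lib is a placeholder target; the second option is newer than this thread, so treat it as an assumption about where CMake landed):

```cmake
# FindCUDA (deprecated): request the shared CUDA runtime
set(CUDA_USE_STATIC_CUDA_RUNTIME OFF CACHE BOOL "link cudart shared" FORCE)

# CUDA language feature: per-target control, added in CMake 3.17
set_target_properties(my_cuda_lib PROPERTIES CUDA_RUNTIME_LIBRARY Shared)
```

With the shared runtime, the package links against the libcudart shipped in cudatoolkit instead of embedding its own copy.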
How should a package maintainer specify a dependency on a specific CUDA version like 9.2 or 10.0?
As an example, here is how PyTorch does things today:

conda install pytorch torchvision cuda80 -c pytorch     # CUDA 8.0
conda install pytorch torchvision -c pytorch            # default CUDA version
conda install pytorch torchvision cuda100 -c pytorch    # CUDA 10.0
conda install pytorch-cpu torchvision-cpu -c pytorch    # CPU only
I believe that NVIDIA and Anaconda handle things differently. I have zero thoughts on which way is correct, but I thought it would be useful to start such a conversation around this. My hope is that we can come to some consensus on packaging conventions that can help users avoid broken environments more easily and provide a good pattern for future package maintainers to follow.
cc @jjhelmus @msarahan @nehaljwani @stuartarchibald @seibert @sklam @soumith @kkraus14 @mike-wendt @datametrician