Closed: minrk closed this issue 1 year ago
I don't really use cuda much, so hopefully @mariusvniekerk can chime in. It seems reasonable to me, although a fairly significant breaking change.
In any case, it looks like we're using a CUDA 11.4 virtual package in our fake repodata, even though the latest is 12.1. Perhaps there are multiple todos here on the virtual package front?
I think the biggest counterargument is that most cuda things explicitly depend on cuda, so some of those environments would no longer be solvable without specifying a virtual package spec, while it's uncommon for things that don't depend on cuda to have a cuda variant. I get the impression that this is becoming less true, though; I just happen to be using one of those packages (torch-cpu -> mkl -> tbb -> hwloc -> cudatoolkit).
Would it make sense to introduce `--with-cuda` and `--without-cuda` flags? And then if something CUDA-related is installed and neither flag was specified, we emit a warning asking the user to be explicit?
That would mean keeping cuda in the default virtual packages (otherwise the solve would fail and we'd never get to a warning), then checking for e.g. `cudatoolkit` in the result and warning if cuda was left unspecified (no virtual packages file, no `--with[out]-cuda`)? That seems like reasonable behavior. A bit more complex to implement, but not too bad.
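The check described above could be sketched roughly like this. This is a hypothetical illustration, not the actual conda-lock implementation: the function name, the set of indicator package names, and the parameter shapes are all assumptions made for the example.

```python
import warnings

# Package names whose presence suggests a CUDA variant was solved for.
# These names are assumptions for the sketch, not an exhaustive list.
CUDA_INDICATORS = {"cudatoolkit", "cuda-toolkit", "cuda-version"}

def check_cuda_opt_in(solved_packages, with_cuda=None, virtual_package_spec=None):
    """Warn when CUDA packages appear in a solve without an explicit opt-in.

    solved_packages: iterable of package names from the solver result.
    with_cuda: True/False if a --with-cuda/--without-cuda flag was passed,
        else None (user said nothing).
    virtual_package_spec: path to a user-provided virtual package spec, or None.
    """
    if with_cuda is not None or virtual_package_spec is not None:
        return  # the user was explicit; nothing to warn about
    pulled = CUDA_INDICATORS.intersection(solved_packages)
    if pulled:
        warnings.warn(
            f"Solve pulled in CUDA packages {sorted(pulled)} because the "
            "default virtual packages assume CUDA is present. Pass "
            "--with-cuda/--without-cuda or a virtual package spec to be explicit."
        )
```

The key design point is that cuda stays in the default virtual packages so the solve still succeeds; the warning fires only afterward, when CUDA packages show up in the result without any explicit user choice.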
This seems to be quite doable to me, and it sounds like this would have prevented the need for you to debug the image size.
I'd be happy to accept a PR. I'm a bit time-constrained, but I might be able to get to this several months from now.
I'll have a look if I get a chance.
What happened?
I noticed while installing pytorch-cpu that it pulled in the cuda variant of libhwloc and thereby cudatoolkit, doubling the size of my image. I tracked it down to the default virtual package spec assuming all machines are likely to have cuda by default.
I was able to solve it with a custom virtual packages spec (capturing the virtual packages from `conda info` in the base image), but it seems to make more sense to me for cuda to be opt-in instead of opt-out, since it's unavailable more often than not, and the cost of incorrectly assuming its presence is high (massive size increase, non-working packages), while the cost of missing it is low (reduced performance, or an informative, immediate error if cuda is actually required). Or is there a consideration I'm missing?
Additional Context
A simple env that produces the issue is:
which pulls in the cuda variant of libhwloc and cudatoolkit by default on Linux and Windows.
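The environment file itself isn't shown above; a minimal sketch, assuming the conda-forge channel and the pytorch-cpu package mentioned in the report, would be:

```yaml
# environment.yml -- minimal reproduction sketch (channel and package name
# assumed from the report): solving this with the default virtual packages
# selects the cuda variant of libhwloc and thereby cudatoolkit.
channels:
  - conda-forge
dependencies:
  - pytorch-cpu
```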