Closed jakirkham closed 1 year ago
cc @adibbley @bdice @robertmaynard (since we were discussing this last week)
cc @kkraus14 (in case you have thoughts here)
Also please feel free to include others
cc @raydouglass @ajschmidt8
A related question is whether we want an option to pull in dependencies that align with a particular CUDA Toolkit release. IOW `cuda-version=12.0` would ensure any CUDA libraries installed would be aligned with CUDA Toolkit 12.0.
- What CUDA compiler version a package was built with?
I hope this is not of concern to `cuda-version`. Doesn't conda-build have a way to keep track of the used version of `{{ compiler("XXX") }}` (regardless of whether `XXX` is `c`, `cxx`, `cuda`, or @bdice's `cuda11`/`cuda12`)? I hope to keep the status quo. We need to use this information to compute the package hash.
- Which CUDA versions a package would support (what CUDA driver version requirements might exist)?
+1 on this:

- `__cuda`
- `cuda-version`
- How a particular CUDA library is tied to a specific CUDA version?
I would think this is equivalent to the above question.
Additionally there is a question about how CUDA version tracking support ties into CUDA compatibility
ditto, I would think this is equivalent to the above question.
A related question is whether we want an option to pull in dependencies that align with a particular CUDA Toolkit release. IOW `cuda-version=12.0` would ensure any CUDA libraries installed would be aligned with CUDA Toolkit 12.0
I would hope this continues to be the job of a meta package (`cudatoolkit` or `cuda-toolkit`), as it is today.
Yup, I think I align with @leofang on pretty much everything above.
At the high level, I was expecting that every CUDA package would depend on an exact pinning like `cuda-version==12.0`.
IOW `cuda-version=12.0` would ensure any CUDA libraries installed would be aligned with CUDA Toolkit 12.0
Yes, users can constrain their installed CUDA packages (e.g. math libraries) with `conda install [...CUDA packages...] cuda-version==12.0`, or downstream packages can require `cuda-version>=12.1` for CUDA runtime requirements. As @leofang said above, the driver side of things would be handled by `__cuda` metapackage requirements.
`cuda-version` brings some of the same benefits as the versioning of the metapackage like `cuda-toolkit=12.0`, but without requiring all the toolkit packages to be installed. It enforces consistency -- the CUDA Toolkit is disaggregated into independent packages now, but I am under the impression that the packages should come from the same CUDA Toolkit release and not be mixed-and-matched. Is that accurate? I assume you shouldn't be able to install 12.0 compilers, 12.1 math libraries, and 12.2 nvJitLink -- and that there would be no real use case for such a situation. Making all the CUDA packages depend on a particular `cuda-version` constrains all CUDA packages in a given conda environment to be released as part of the same CUDA Toolkit version, which sounds like a good thing.
`cuda-version` brings some of the same benefits as the versioning of the metapackage like `cuda-toolkit=12.0`, but without requiring all the toolkit packages to be installed.
Actually, I think this can be nicely addressed:

- Add `cuda-version ==X.Y` to `cuda-toolkit`'s `run` requirement
- Add all runtime libraries to `cuda-toolkit`'s `run_constrained` requirement

The (anticipated) effect is:

- By just installing `cuda-toolkit`, which is empty, there's no disk/network usage
- The compatibility between `cuda-version` and `cuda-toolkit` is enforced by the conda solver

The only missing piece I haven't figured out is a command or something for users to express "I want to actually install everything that `cuda-toolkit` would offer in the old days," without listing them explicitly like `conda install A B C D E ... Z` 😅 But perhaps it's not that important?
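As a sketch, the two additions above might look like this in a hypothetical `cuda-toolkit` recipe (the package names and version pins here are purely illustrative, not the actual recipe):

```yaml
# Hypothetical meta.yaml fragment for the cuda-toolkit metapackage (sketch only)
package:
  name: cuda-toolkit
  version: 12.0.0

requirements:
  run:
    # ties the (empty) metapackage to one CTK release
    - cuda-version ==12.0
  run_constrained:
    # nothing here is installed by default, but anything installed
    # alongside must come from the same CTK release
    - cuda-cudart 12.0.*
    - cuda-nvcc 12.0.*
    - libcublas 12.0.*
```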
but I am under the impression that the packages should come from the same CUDA Toolkit release and not be mixed-and-matched. Is that accurate?
I would agree. I've asked a math lib team before, and they don't do any mix-and-match tests so far, so there's always a risk that this doesn't work out of the box. So,
Making all the CUDA packages depend on a particular `cuda-version` constrains all CUDA packages in a given conda environment to be released as part of the same CUDA Toolkit version, which sounds like a good thing.
Yes, I'd argue we should start from a tighter constraint first, make sure it works for the whole community, and once the GPU CI is up, conduct studies and decide if we can relax some of the above assumptions. Going from tight to loose is always safer than the opposite direction 🙂
I should add that I made an implicit assumption above that any CUDA component released at CTK X.Y will have

```yaml
run:
  - cuda-version >=X.Y,<X+1
```

i.e. I set its lower bound to the CTK version that it comes with (which is a tighter requirement than what minor version compatibility would guarantee).
Additionally there is a question about how CUDA version tracking support ties into CUDA compatibility
ditto, I would think this is equivalent to the above question.
Sorry, I spoke too fast; we need `cuda-version` of version X.Y to depend on `__cuda >=X.0`. That a CTK requires a minimal driver version has always been the case. What's different here is that we relax to `X.0` to declare that CTK ver `X` has minor version compatibility (=new CTK, old driver, both within the same major release), as opposed to setting `__cuda >=X.Y`.
So would we want to change this?
My 2c, ultimately when it comes to CUDA packages users will want a few different behaviors:

- The ability to get a specific shipped full toolkit at a given version, i.e. `12.0` or `12.1`.
  - This would presumably be handled by either the `cuda-toolkit` or `cuda` package?
- The ability to get specific packages at a given toolkit version, i.e. `12.0` or `12.1`.
  - This would presumably be handled by a combination of this `cuda-version` package along with some type of `run_constrained`?
- The ability to get specific packages not tied to a given toolkit version, i.e. I may want to get `Thrust` `2.0.1`.
- Not to allow solving to an incompatible environment, i.e. `libcusolver` depends on `libcublas` where if they both require the same minor / patch version, there should be appropriate pinning to enforce such.

I'm okay with the proposal of making all of the packages tied to a given toolkit version as long as `Thrust`, `CUB`, and `libcu++` are excluded. It would be nice if we still gave a path for users to request "Thrust version shipped with CUDA Toolkit 12.0" though.
So would we want to change this?
I guess not? This could allow CPU users to install it. (So, I wasn't being accurate enough to just say "depend on"; I should have said "setting `run_constrained` to be", which is what we do now. 🙂)
Ah was more meaning whether this should become (though maybe this doesn't matter much)
```diff
 requirements:
   run_constrained:
-    - __cuda >={{ major_version }}
+    - __cuda >={{ major_version }}.0
```
Though yeah, CPU users (also us, for building) and older Condas are considerations for `run_constrained`.
@kkraus14
- The ability to get a specific shipped full toolkit at a given version, i.e. `12.0` or `12.1`.
Yes, as mentioned above I haven't figured this out.
- The ability to get specific packages at a given toolkit version, i.e. `12.0` or `12.1`.
I hope this is addressed by my proposal above.
- The ability to get specific packages not tied to a given toolkit version, i.e. I may want to get `Thrust` `2.0.1`

I am not certain about this. Perhaps, if we don't list `cuda-version` in any of its requirements, this would be possible? Or if we adopt a relaxed scheme for it (see below)?
I'm okay with the proposal of making all of the packages tied to a given toolkit version as long as `Thrust`, `CUB`, and `libcu++` are excluded.
Yeah it sounds like you want them to not have `cuda-version` listed in `run`/`run_constrained`? How about a more relaxed scheme for these CCCL components, such as

```yaml
run_constrained:
  - cuda-version >=X.0
```

for the CCCL components released in any CTK ver `X` or `X+1`? IIUC this aligns with CCCL's (presumed) goal.
Ah was more meaning whether this should become (though maybe this doesn't matter much)
@jakirkham I didn't know the patch had any semantic difference; if it does, then this is the right patch, I'd think.
Yeah it sounds like you want them to not have `cuda-version` listed in `run`/`run_constrained`? How about a more relaxed scheme for these CCCL components, such as `run_constrained: - cuda-version >=X.0` for the CCCL components released in any CTK ver `X` or `X+1`? IIUC this aligns with CCCL's (presumed) goal.
The only challenge for this is how do I say "I want to install Thrust that shipped with CUDA Toolkit `12.0`." The `cuda-version` constraint would still be satisfied by newer versions.
`cuda-version` brings some of the same benefits as the versioning of the metapackage like `cuda-toolkit=12.0`, but without requiring all the toolkit packages to be installed.

Actually, I think this can be nicely addressed:

- Add `cuda-version ==X.Y` to `cuda-toolkit`'s `run` requirement
- Add all runtime libraries to `cuda-toolkit`'s `run_constrained` requirement

The (anticipated) effect is:

- By just installing `cuda-toolkit`, which is empty, there's no disk/network usage

[...] The only missing piece I haven't figured out is a command or something for users to express "I want to actually install everything that `cuda-toolkit` would offer in the old days," without listing them explicitly like `conda install A B C D E ... Z` 😅 But perhaps it's not that important?
We should keep `cuda-toolkit` as a "full" metapackage, not a "network install on-demand." Users should be able to install `cuda-toolkit` and build software from source that leverages the full CUDA toolkit. If we make `cuda-toolkit` only a constraint package, then users only get the real CUDA compilers/libraries/etc. by installing them manually or installing other packages that depend on them. That's out of alignment with what it means to "install the CUDA toolkit" with any other package manager or from the CUDA Downloads page.
I should add that I made an implicit assumption above that any CUDA component released at CTK X.Y will have `run: - cuda-version >=X.Y,<X+1`, i.e. I set its lower bound to the CTK version that it comes with (which is a tighter requirement than what minor version compatibility would guarantee).
I think we want the CUDA component packages released with CTK X.Y to pin to `cuda-version==X.Y` exactly (not a range). Otherwise the `cuda-version` package isn't able to constrain correctly. If a user pins to `cuda-version==12.1`, packages for `cuda-cudart` from 12.0 should not satisfy that constraint.

Users downstream would be able to specify a dependency on `cuda-version>=X.Y` if they require that runtime version.
Sorry, I spoke too fast, we need `cuda-version` of version X.Y to depend on `__cuda >=X.0`. [...] That a CTK requires a minimal driver version has always been the case; what's different here is that we relax to `X.0` to declare CTK ver `X` has minor version compatibility (=new CTK, old driver, both within the same major release), as opposed to setting `__cuda >=X.Y`.
We actually need no dependency on `__cuda` in the CUDA toolkit packages, from my understanding. The CUDA toolkit should be installable without having a CUDA driver, right? In other words, we should be able to install `cuda-nvcc` and related packages without a GPU present.

edit: More specifically, I saw @jakirkham point to `run_constrained: __cuda>={{ major_version }}` in `cuda-version`. That is problematic if you have a system with an older GPU (doesn't support CUDA 12) but want to build CUDA 12 packages. CPU-only machines would satisfy this because they have no `__cuda` and thus no constraint -- but ideally CPU-only and old-GPU machines would both be usable for building (not running) with CUDA 12.
`__cuda` definition: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html

In the past, I've recommended using `__cuda` to help constrain whether CPU/GPU packages are fetched: https://github.com/openmm/openmm/issues/3059#issuecomment-797023166

`__cuda` also has important implications for CI systems building without GPUs: https://github.com/conda-forge/conda-forge-ci-setup-feedstock/pull/144/
We should keep `cuda-toolkit` as a "full" metapackage, not a "network install on-demand."
Sure, this would save me from having to answer this question:
The only missing piece I haven't figured out is a command or something for users to express "I want to actually install everything that `cuda-toolkit` would offer in the old days," without listing them explicitly like `conda install A B C D E ... Z` 😅 But perhaps it's not that important?

with the trade-off that all components are installed. As long as we agree this is the right design and document it, I have no problem with it.
I think we want the CUDA component packages released with CTK X.Y to pin to `cuda-version==X.Y` exactly (not a range). Otherwise the `cuda-version` package isn't able to constrain correctly. If a user pins to `cuda-version==12.1`, packages for `cuda-cudart` from 12.0 should not satisfy that constraint. Users downstream would be able to specify a dependency on `cuda-version>=X.Y` if they require that runtime version.
@bdice, I don't understand now 😅 Say cuBLAS from CTK 12.1 tightly pins `cuda-version==12.1`. How am I going to allow downstream users or package maintainers to take advantage of minor version compatibility and declare any cuBLAS from CTK 12 is fine? I thought you agreed that

- if a package needs a minimal (or a range of) runtime version, pin at `cuda-version`
How am I going to allow downstream users or package maintainers to take advantage of minor version compatibility and declare any cuBLAS from CTK 12 is fine?

I think you could pin freely on cuBLAS and let `cuda-version` handle the constraints?

```yaml
run:
  - cuda-cublas
  - cuda-version>=12.0,<13
```
I thought you agreed that

- if a package needs a minimal (or a range of) runtime version, pin at `cuda-version`

I was thinking you meant this for downstream packages -- CUDA Toolkit packages should be exactly pinned to `cuda-version==X.Y`, and downstream consumers of the CUDA Toolkit can pin library names (e.g. `cuda-cublas`) and allowable ranges of `cuda-version` (e.g. `cuda-version>=12.0,<13`).
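Putting the two halves of that scheme side by side, a sketch (these recipe fragments are illustrative, not actual recipes):

```yaml
# Sketch: a CTK component package released with CTK 12.1
requirements:
  run:
    - cuda-version ==12.1        # exact pin to the CTK release it ships with

# Sketch: a downstream package that only needs the CUDA 12 runtime
requirements:
  run:
    - cuda-cublas
    - cuda-version >=12.0,<13    # range, relying on minor version compatibility
```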
Ah OK thanks for clarifying, Bradley. So you're saying if we make a super tight pinning for CUDA components, `cuda-version` becomes the version selector that used to be served by the old `cudatoolkit`? So, instead of

```shell
conda install cupy rmm cudatoolkit=12.3
```

we would do

```shell
conda install cupy rmm cuda-version=12.3
```

If so I am on board. Speaking from the component author viewpoint, this would allow me to express "this version of cuSOLVER needs exactly this version of cuBLAS" easily.
edit: More specifically, I saw @jakirkham point to `run_constrained: __cuda>={{ major_version }}` in `cuda-version`. That is problematic if you have a system with an older GPU (doesn't support CUDA 12) but want to build CUDA 12 packages.
I am not sure this is OK.

- Don't we still need `libcuda` from CTK 12 if driver API is used in a package? If we have an old GPU and we're building a package for new CTK, I'd think we need a compatible driver. Or, do we plan to shift this responsibility to `cuda-compat` alone and not worry about this?
- We really need to ensure when users install GPU packages, they have a compatible driver. By not putting this constraint somewhere, we lose this ability.

I wanted to reiterate the behaviors @kkraus14 raised above, with context from this continued discussion so far. I think these outline a good set of core behaviors and establish expectations that we can (and should) meet. There may be needs above and beyond these, but I think it's unlikely that other needs would conflict with the goals Keith laid out. It sounds like we're starting to narrow in on our expectations, so I hope this can clarify/corroborate previous statements (it shouldn't conflict with the consensus we've built above, I hope).
My 2c, ultimately when it comes to CUDA packages users will want a few different behaviors:
- The ability to get a specific shipped full toolkit at a given version, i.e. `12.0` or `12.1`.
  - This would presumably be handled by either the `cuda-toolkit` or `cuda` package?
The full toolkit should ship as `cuda-toolkit` with `X.Y` versioning (and potentially patches? Let's handle that separately).
- The ability to get specific packages at a given toolkit version, i.e. `12.0` or `12.1`.
  - This would presumably be handled by a combination of this `cuda-version` package along with some type of `run_constrained`?
- The ability to get specific packages not tied to a given toolkit version, i.e. I may want to get `Thrust` `2.0.1`

I'm unclear if we want to allow this for only specific packages that have historically and/or explicitly supported building and running on different CUDA versions like Thrust, CUB, and libcu++ or more generally for all of the cuda packages.
- Not to allow solving to an incompatible environment, i.e. `libcusolver` depends on `libcublas` where if they both require the same minor / patch version, there should be appropriate pinning to enforce such.

Users should be able to install like `cuda-version==12.1 cuda-cudart cuda-cublas` and get 12.1 packages for cudart and cuBLAS. All of the above goals should be met by making the CUDA component packages depend on `cuda-version==X.Y`. That pinning should be applied to:

- `cuda-profiler-api`

Exceptions to the `cuda-version` pinning might include libraries that float more freely or may have broader compatibility like:

- `cuda-cccl`
- `cuda-python`
- Don't we still need `libcuda` from CTK 12 if driver API is used in a package?

I forgot we ship the `libcuda` stub somewhere, so this is moot. Only the 2nd question still persists.
- We really need to ensure when users install GPU packages, they have a compatible driver. By not putting this constraint somewhere, we lose this ability.
In my understanding, the status quo is that downstream packages must declare their dependence on the driver (via the `__cuda` virtual package) if they want a constraint for where the package can be installed/run. Examples:
edit: see below, this is inaccurate
~Driver version compatibility doesn't seem to be handled by `cudatoolkit` in CUDA 11 -- and I think that's the status quo we want in CUDA 12. Downstream packages are responsible for knowing and declaring their (potentially distinct) runtime (`cuda-toolkit` and component packages) and driver (`__cuda` virtual package) requirements.~
This doesn't seem to be handled by `cudatoolkit` in CUDA 11 -- and I think that's the status quo we want in CUDA 12
No, it's not true: https://github.com/conda-forge/cudatoolkit-feedstock/blob/531e4594992258568fe187bc5c4e40d8c9c57b27/recipe/meta.yaml#L576-L582 and it's dangerous to omit. I'd rather set up the constraint correctly for users (following the same spirit as the above discussion), and figure out if there's a way to relax for certain package maintainers (or if it's even needed, given this is the status quo).
Hmm. 🤔 You're right. (Unfortunately that doesn't handle the "old GPU" case -- if you have any GPU installed, its driver must be compatible with the toolkit you install, even if it's only for building and not running...). I can change my position in light of that, and support keeping the `run_constrained` on `__cuda` somewhere (probably `cuda-version`). Downstream packages may still want to specify `__cuda` dependencies in certain cases (CPU/GPU split packages or particular driver requirements for example).
The ability to get specific packages not tied to a given toolkit version, i.e. I may want to get `Thrust` `2.0.1`

I think this is the last question left unresolved by the above discussion before we can write up a summary (yay!). In Bradley's approach (tightly pinning `cuda-version`) this would simply not be possible, and we must adopt a relaxed scheme / exception that both Bradley and I touched on (here and here).
@kkraus14 any comment on this?
- tools like nsight
btw I think Nsight (System/Compute) can be relaxed too. They actually release it outside of CTK, and I've tested the same Nsight version works for both CTK 11/12 just fine. I think the CTK bundle is just for convenience, but we can certainly ask around to confirm.
If someone has an old GPU driver, say their GPU is older and doesn't support newer drivers, I think the default behavior we should give is to constrain them to things that run on their system. If they want to temporarily bypass that, they can do so using the `CONDA_OVERRIDE_CUDA` environment variable as documented here: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html

If this isn't clear enough to this corner case of users, then I think we should treat this as a documentation problem as opposed to a solver constraint problem.
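For reference, the override mentioned here is just an environment variable set at install time, e.g. (the package name below is a placeholder):

```shell
# Tell conda's solver to assume the driver supports CUDA 12.0, e.g. on a
# CPU-only CI machine or one with an older driver (at your own risk)
CONDA_OVERRIDE_CUDA="12.0" conda install -c conda-forge some-gpu-package
```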
UPDATE: The latest summary of this `cuda-version` package can be found here: https://github.com/conda-forge/cuda-version-feedstock/blob/main/recipe/README.md
I took a stab at summarizing the above discussion. Let me know if I miss (or am wrong about) anything, and I'll edit in place. Thanks!
cuda-version

Questions/demands listed below are addressed:

What CUDA driver version requirements might exist?

Driver requirement should be set up using the `__cuda` virtual package. For package maintainers/users who wish to overwrite (at their own risk), use the environment variable `CONDA_OVERRIDE_CUDA`.
How CUDA version tracking support ties into CUDA compatibility
Pin `cuda-version >=X.0,<X+1` to allow minor version compatibility (see below).

The ability to get specific packages not tied to a given toolkit version, i.e. I may want to get Thrust 2.0.1

Unclear if possible under the "tight constraint" scheme that we plan to adopt.

Design (of `cuda-version`):

- Set up the driver requirement using `__cuda` for a given CTK major version X (see https://github.com/conda-forge/cuda-version-feedstock/issues/1#issuecomment-1462752780).
- nvcc compiler: Pin a range of `cuda-version`, with the lower bound being the CTK version X.Y that it comes with.
- CTK components: Tightly pin `cuda-version` corresponding to the CTK version X.Y that they come with. Exceptions should be discussed on a case-by-case basis. This applies to all subpackages (`lib`, `lib-dev` and `lib-static`).
- Downstream packages: Pin a range of `cuda-version` (plus `__cuda`, if applicable).
- Users: Use `cuda-version` as the CTK version selector to set up a consistent environment.

Example: nvcc from the CTK version X.Y
```yaml
run_exports:
  - cuda-version >=X.Y,<X+1
```
Example: cuBLAS from the CTK version X.Y
```yaml
host:
  - cuda-version X.Y
run:
  - {{ pin_compatible("cuda-version", max_pin="x.x") }}
```
Example: cuSOLVER depends on cuBLAS from the same CTK version X.Y
```yaml
host:
  - cuda-version X.Y
  - libcublas
run:
  - {{ pin_compatible("cuda-version", max_pin="x.x") }}
  - {{ pin_compatible("libcublas", max_pin="x.x") }}
```
Example: A package depends on cuBLAS and supports CUDA minor version compatibility
```yaml
run:
  - cuda-version >=X.0,<X+1
  - libcublas
```
Example: Set up an environment compatible with CTK 12.1
```shell
conda install -c conda-forge cupy rmm "cuda-version=12.1"
```
Example: Set up an environment compatible with CTK 12 (of any minor version)
```shell
conda install -c conda-forge cupy rmm "cuda-version=12"
```
Example: Just set up a legit CUDA environment
```shell
conda install -c conda-forge cupy rmm
```
Thanks all for sharing your thoughts here! 🙏
Generally this seems reasonable
Will focus on one point that is unaddressed
What CUDA compiler version a package was built with?

This should be queried from `{{ compiler("cuda") }}` or equivalent.
So at build time of a package, the compiler version would be set. Currently this is done as a global pin (with the option to override for a particular package if needed). Likely we would continue this for CUDA 12.
Once the package was built (using the current model), `cudatoolkit` would be added as a run dependency. However this would need to change in the future as `cudatoolkit` would not be added as a dependency of a package.
This matters as a package may be able to use a new CUDA feature (for example the CUDA stream-ordered allocator added in 11.2) only if it was built with a new enough compiler version. However to get that package built with that compiler version, somehow a user must specify that dependency.
So how do we encode this information for users of packages? Essentially this would be adding some kind of runtime dependency on packages built with `cuda-nvcc`, but what should it be? Here are some options:

- `cuda-version` (same as used by CTK packages)
- `cuda-nvcc-version` or similar
- `cuda-cudart` (extra as it is statically linked by `cuda-nvcc` currently, but maybe that could change)
- `__cuda` (users could affect this with `CONDA_OVERRIDE_CUDA` as noted above)

Thoughts on any of these (or others)?
If someone has an old GPU driver, say their GPU is older and doesn't support newer drivers, I think the default behavior we should give is to constrain them to things that run on their system. If they want to temporarily bypass that, they can do so using the `CONDA_OVERRIDE_CUDA` environment variable as documented here: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html If this isn't clear enough to this corner case of users, then I think we should treat this as a documentation problem as opposed to a solver constraint problem.

This won't break CI machines which have either no GPU driver or an old one (and link to stubs/libcuda.so)?
Thanks, @jakirkham, this is a very good (and subtle) point.
So how do we encode this information for users of packages? Essentially this would be adding some kind of runtime dependency on packages built with `cuda-nvcc`, but what should it be. Here are some options:

- `cuda-version` (same as used by CTK packages)

I would think the new compiler package should continue doing `run_exports` but use this, so for nvcc from CTK ver X.Y we do

```yaml
run_exports:
  - cuda-version >=X.Y,<X+1
```

Seems to me it'd match the current behavior that you linked to.
This won't break CI machines which have either no GPU driver or an old one ( and link to stubs/libcuda.so )?
@robertmaynard as discussed above, we'd only list `__cuda` under `run_constrained`, meaning:

- If `__cuda` isn't set (no GPU driver), there's simply no conflict/problem
- If `__cuda` is set to a lower value (old driver), we need `CONDA_OVERRIDE_CUDA` to overwrite it and let the (new) `libcuda` stub kick in.

Thanks all! 🙏
So it sounds like we have reached consensus. Though please let me know if I've missed something
Next steps would be updating the CUDA Toolkit libraries (notably `cublas`) to follow this approach. Anything else we should do?
What was the conclusion / where did we land for packages like `Thrust`, `cub`, and `libcu++` that have versions not tied to a specific CUDA Toolkit release?
@kkraus14 Maybe you've missed it, I left a question for you:
- The ability to get specific packages not tied to a given toolkit version, i.e. I may want to get `Thrust` `2.0.1`

I think this is the last question left unresolved by the above discussion before we can write up a summary (yay!). In Bradley's approach (tightly pinning `cuda-version`) this would simply not be possible, and we must adopt a relaxed scheme / exception that both Bradley and I touched on (here and here). @kkraus14 any comment on this?

Would it be OK to you that we relax the pinning to `cuda-version` a bit (in any way -- but which way?) for CCCL?
Anything else we should do?
@jakirkham Could you elaborate a bit on how `{{ compiler("cuda") }}` would be set up for CTK 12? I understand there's nvcc-feedstock, and somehow `{{ compiler("cuda") }}` magically maps to the `nvcc` wrapper there, but it's unclear to me how this mapping works and how we'd map it to `cuda-nvcc` (but only for CTK >=12, preserving the status quo for CTK <12).
Would it be OK to you that we relax the pinning to `cuda-version` a bit (in any way -- but which way?) for CCCL?
I think ideally from my perspective we could have a `cuda-cccl` package that is a metapackage around `Thrust`, `cub`, and `libcu++`, and the `cuda-cccl` package uses `cuda-version` similar to other packages like `cublas`. This way if someone wants specifically what's in CUDA 12.0 or 12.1 they can use the `cuda-cccl` package. The tradeoff being they can only get all three packages as opposed to just one of them, but that's a very reasonable tradeoff from my perspective, especially considering they're header only (at least as of now).
Then for the individual packages like `Thrust`, `cub`, and `libcudacxx` they can have a more relaxed pinning that allows them to float forward.
The discussion about whether `cuda-cccl` can be a metapackage around `Thrust`, `cub`, and `libcudacxx` packages is happening here: https://github.com/conda-forge/staged-recipes/pull/21953
I guess you want to keep the freedom of using `thrust`/`cub` in a mix-n-match fashion? If so, any chance `cuda-cccl` could be a real package that depends on `cuda-version` and also has this setup?

```yaml
run_constrained:
  - thrust <0.0a0
  - cub <0.0a0
  - libcudacxx <0.0a0  # assuming this is on conda-forge
```

This would make sure that the new `cuda-cccl` is mutually exclusive with `thrust`/`cub`; that is, they cannot coexist in the same environment.
I guess you want to keep the freedom of using `thrust`/`cub` in a mix-n-match fashion?

Yes. There's no reason that we shouldn't be able to package things like `thrust`, `cub`, and others that have release schedules disjointed from the cuda toolkit.
If so, any chance `cuda-cccl` could be a real package that depends on `cuda-version` and also has this setup? `run_constrained: - thrust <0.0a0 - cub <0.0a0 - libcudacxx <0.0a0 # assuming this is on conda-forge` This would make sure that the new `cuda-cccl` is mutually exclusive with `thrust`/`cub`; that is, they cannot coexist in the same environment.
The problem with this is someone might have an environment with two libraries, say LibraryA and LibraryB, both of which depend on `Thrust` in `run` because they expose `Thrust` in their public interfaces. LibraryA builds with the `Thrust` package, LibraryB builds with the `cuda-cccl` package. We end up in an unsolvable situation.
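To make the conflict concrete, a sketch of the two hypothetical downstream recipes (LibraryA/LibraryB and all pins are made up for illustration):

```yaml
# LibraryA (sketch): built against the standalone thrust package
requirements:
  run:
    - thrust >=2.0        # exposes Thrust in its public interface

# LibraryB (sketch): built against cuda-cccl, which under the proposed
# setup carries `run_constrained: thrust <0.0a0`
requirements:
  run:
    - cuda-cccl

# Installing LibraryA and LibraryB together is then unsolvable:
# `thrust >=2.0` and `thrust <0.0a0` cannot both hold.
```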
Anything else we should do?
@jakirkham Could you elaborate a bit on how `{{ compiler("cuda") }}` would be set up for CTK 12? I understand there's nvcc-feedstock, and somehow `{{ compiler("cuda") }}` magically maps to the `nvcc` wrapper there, but it's unclear to me how this mapping works and how we'd map it to `cuda-nvcc` (but only for CTK >=12, preserving the status quo for CTK <12).
Sure, currently we specify the `cuda_compiler` in `conda-forge-pinning`. Since we would want different ones for different CUDA versions, we would add it to `zip_keys` alongside `cuda_compiler_version` and expand `cuda_compiler` to multiple values (with the older compiler repeated for old CUDA versions). This shouldn't be too hard and could be handled as part of a CUDA 12 migrator.
If we add these packages also for CUDA 11 versions (this would require some careful work), we could potentially simplify this structure in the future.
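For illustration, the `conda-forge-pinning` change described above might look roughly like this in `conda_build_config.yaml` (the versions and package names here are illustrative guesses, not the actual pinning):

```yaml
# Sketch of conda_build_config.yaml in conda-forge-pinning (values illustrative)
cuda_compiler:
  - nvcc        # CUDA 11.x: the existing nvcc wrapper package
  - cuda-nvcc   # CUDA 12.x: the new compiler package
cuda_compiler_version:
  - 11.8
  - 12.0
zip_keys:
  # zipping keeps each compiler paired with its matching CUDA version
  - - cuda_compiler
    - cuda_compiler_version
```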
Also, I would propose we solve the Thrust/CUB/libcudacxx/CCCL story separately. There are a few things going on there that need to be addressed. I should add: I see this as the next thing to solve after wrapping up this conversation about `cuda-version` (hence the interest in wrapping this one up if we have reached consensus ;)
(hence the interest in wrapping this one up if we have reached consensus ;)
I updated my summary at https://github.com/conda-forge/cuda-version-feedstock/issues/1#issuecomment-1462998398. I agree: if there are no more questions (other than CCCL), we can close this issue and start working on the libraries.
One minor clarification to @leofang's summary:
Set up the driver requirement using `__cuda` for a given CTK version X.Y

The driver requirement should be `__cuda >={{ major_version }}`, like `>=12` (without the `.Y`), to match the cudatoolkit status quo and utilize CUDA major version compatibility.
As for cuda-cccl
/ Thrust / CUB / libcu++ discussions, I agree that we should discuss that in a separate thread from the overall versioning/dependency strategy as an exceptional case. A metapackage may not be the right long term solution because those repos will soon be combined into a monorepo and released on a unified cycle.
The driver requirement should be `__cuda >={{ major_version }}`, like `>=12` (without the `.Y`), to match the cudatoolkit status quo and utilize CUDA major version compatibility.

Yes, this was what's implied, see John K's https://github.com/conda-forge/cuda-version-feedstock/issues/1#issuecomment-1462752780 above (btw I still think `>=X` is just the same as `>=X.0`), but let me add to the summary for clarity.
btw I still think `>=X` is just the same as `>=X.0`
Yeah that sounds right
Clarification: should all parts of a package (e.g. `libcublas`, `libcublas-dev`, `libcublas-static`) depend on `cuda-version`, or just the main package (e.g. `libcublas`)?

If we make all parts depend on `cuda-version` (which is what I expect), then does it make sense to have pinnings like `libcublas-dev` having a `run: libcublas >={{ version }}` dependency, or should that be replaced by a tightly-constrained `pin_subpackage`? Is it even possible to install newer library versions and older dev packages if both depend on a specific pinning of `cuda-version`?

I asked this for cuda-cudart too, which has additional platform-specific packages like `cuda-cudart-dev_{{ target_platform }}` -- do the same rules apply there?
The only concern I could think of is whether it'd break the story that we build with higher CTK version and allow it to run with lower CTK version (both within the same major release), as discussed in https://github.com/conda-forge/cuda-version-feedstock/issues/1#issuecomment-1462833464.
But if the `-dev`/`-static` packages don't export their `cuda-version` constraint, it should be fine? (Even if it's exported, we can always undo it with `ignore_run_exports{_from}`, but it's more work, more error prone, and less intuitive.)
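For what it's worth, that escape hatch would look something like this in a consuming recipe (a sketch; the package names are just examples):

```yaml
# Sketch: a consumer that builds against libcublas-dev but opts out of
# inheriting its run_exports (e.g. a cuda-version constraint)
requirements:
  build:
    - {{ compiler("cuda") }}
  host:
    - libcublas-dev
build:
  ignore_run_exports_from:
    - libcublas-dev
```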
Another thought is I feel the `-dev`/`-static` packages, which are only used at compile time, should follow the compiler constraint that we set, which currently is

nvcc compiler: Pin a range of `cuda-version`, with the lower bound being the CTK version X.Y that it comes with
With CUDA 11 support in conda-forge, the `cudatoolkit` package provided both the CUDA libraries and a way to express what CUDA version a particular package was built with.

As part of adding CUDA 12, packages are restructured to split out each CUDA library. This means each CUDA library will be a separate dependency. However this restructuring results in some information not being captured:
Additionally there is a question about how CUDA version tracking support ties into CUDA compatibility
Raising this issue so we can discuss and decide how we want to leverage this package