Closed jakirkham closed 1 year ago
Even though they named it that way, it really corresponds to hardware architectures. As currently laid out, NCCL supports Kepler and newer architectures. This sounds reasonable to me and we shouldn't change it.
If conda had a virtual package that exposed the GPU architecture, we could use it to select a separate package per architecture. That could be an option, but it would make conda environments less portable (though the same thing happens on the CPU side with AVX / SSE / etc. anyway).
The usage of CUDA9_PTX and CUDA9_GENCODE for CUDA_MAJOR >= 11 is interesting. Might be better to have CUDA9_PTX map to
-gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75
I am doing this (restricting gencodes) for aarch64 to avoid a timeout (#60), and I'd like to revisit this discussion.
Now that we've switched to cross-compiling, the build time is significantly reduced, so let me make a judgment call and close this issue. We can revisit as needed.
AIUI NCCL includes a larger range of gencodes going back to CUDA 8, and even includes the gencodes for older CUDA versions when building for newer CUDA versions. To cut down on binary size and speed up builds, we might consider using a narrower set of gencodes for each CUDA version.
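To make the idea of a narrower set per CUDA version concrete, here is a minimal sketch in Python that assembles the flag string. The `NARROW_ARCHS` table and both function names are hypothetical and purely illustrative; they are not NCCL's actual defaults, and the right architecture list for each CUDA version would need to be decided separately.

```python
# Hypothetical mapping from CUDA major version to the compute capabilities
# we would build for. Illustrative values only, not NCCL's actual defaults.
NARROW_ARCHS = {
    9: ["60", "70"],
    10: ["60", "70", "75"],
    11: ["60", "70", "75", "80"],
}


def gencode_flags(archs, ptx_arch=None):
    """Build nvcc -gencode flags: SASS for each arch, plus optional PTX."""
    flags = [f"-gencode=arch=compute_{a},code=sm_{a}" for a in archs]
    if ptx_arch is not None:
        # code=compute_XX embeds PTX so newer GPUs can JIT-compile it.
        flags.append(
            f"-gencode=arch=compute_{ptx_arch},code=compute_{ptx_arch}"
        )
    return " ".join(flags)


def nvcc_gencode_for(cuda_major):
    """Return the narrowed flag string for one CUDA major version."""
    archs = NARROW_ARCHS[cuda_major]
    # Embed PTX only for the newest arch in the list (an assumption).
    return gencode_flags(archs, ptx_arch=archs[-1])


if __name__ == "__main__":
    print(nvcc_gencode_for(11))
```

The resulting string could then be passed to the NCCL build through its `NVCC_GENCODE` make variable, for example `make src.build NVCC_GENCODE="..."`, assuming the recipe's build script forwards it unchanged.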