Hi! This is the friendly automated conda-forge-linting service.
I just wanted to let you know that I linted all conda-recipes in your PR (`recipe`) and found it was in an excellent condition.
cc @isuruf @kkraus14 (as this is related to our discussion last week)
cc @leofang (in case you have thoughts generally & particularly on NVRTC usage with CuPy)
cc @raydouglass (for awareness)
Also, we will want a repodata patch in addition to this. I think this should help us set a cutoff for how many packages would need that repodata patch and keep that list finite. Though feel free to correct me if there are other things we do or don't need to do before adding the repodata patch.
I think we just need to make sure to loosen the `__cuda` constraint on the `cudatoolkit` package for 11.2+ as well moving forward.
I doubt this is safe. At least for applications depending on PTX JIT (such as NVRTC), the driver version must be compatible with the runtime CTK version. If it's too old, then we'd see driver errors such as `CUDA_ERROR_UNSUPPORTED_PTX_VERSION`. This is best explained here:
> However, the more recent NVRTC library may generate PTX with a version that is not accepted by the CUDA Driver API functions of an older CUDA driver, as explained in the CUDA Compatibility document. Some approaches to resolving this issue: ...
and here:
> Applications that compile device code to PTX will not work on older drivers. If the application requires PTX then admins have to upgrade the installed driver.
In the case of CuPy, we work around this limitation by always generating SASS (CUBIN) instead of PTX for CUDA 11.1+, as instructed in the NVRTC documentation. But I don't know about other CUDA packages on conda-forge.
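To make that concrete, here is a minimal sketch of the SASS-over-PTX approach, assuming NVIDIA's cuda-python NVRTC bindings (the kernel source, target architecture, and omitted error handling are illustrative; this is not CuPy's actual implementation):

```python
# Sketch only: compile to SASS (CUBIN) rather than PTX so no driver-side
# PTX JIT is required. Assumes NVIDIA's cuda-python bindings; the kernel
# source and target architecture are illustrative.
from cuda import nvrtc

src = b'extern "C" __global__ void scale(float a, float *x) { x[0] *= a; }'

err, prog = nvrtc.nvrtcCreateProgram(src, b"scale.cu", 0, [], [])
# A real architecture (sm_80) permits CUBIN output (CUDA 11.1+);
# a virtual one (compute_80) would only allow PTX.
opts = [b"--gpu-architecture=sm_80"]
err, = nvrtc.nvrtcCompileProgram(prog, len(opts), opts)

# Fetch SASS via nvrtcGetCUBIN instead of nvrtcGetPTX, as the NVRTC
# docs suggest for CUDA 11.1+.
err, cubin_size = nvrtc.nvrtcGetCUBINSize(prog)
cubin = b" " * cubin_size
err, = nvrtc.nvrtcGetCUBIN(prog, cubin)
```

The resulting CUBIN can be loaded directly by the driver with no PTX JIT step, which is what sidesteps the driver-version check described above.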
> I doubt this is safe. At least for applications depending on PTX JIT (such as NVRTC), the driver version must be compatible with the runtime CTK version. If it's too old, then we'd see driver errors such as `CUDA_ERROR_UNSUPPORTED_PTX_VERSION`.
Yes, we're aware that there are potential landmines, but there was some discussion and I believe a decision was made that the best bet for now would be to set `__cuda >=11.2,<12` as an export, and if a project is using newer / incompatible features then they can override that export.
Once we move to individual packages, as opposed to one big `cudatoolkit` package, then we could consider revisiting the `__cuda` export constraint on the `libnvrtc` package?
@kkraus14, even though we discussed this issue, the question of PTX never came up in the discussion. In that case, I don't think it is safe to build with 11.4 and run with 11.2.
Fair enough. It's a bit of a shame about the PTX versioning.
Maybe if we can have a virtual package related to GPU architecture in the future it would open some new opportunities here.
I think only Numba uses PTX and it only uses that at runtime. Am not aware of Numba or other packages distributing PTX. So I don't think PTX is an issue for us.
My take is to push this through as a way for libraries to take advantage of CUDA Enhanced Compatibility. I am not saying this as an NVIDIA employee, but rather as a GPU developer (for CuPy). This is the trajectory that we've gone through, and I expect there will be more and more users asking each library to adopt, and eventually it'd no longer be a concern for us.
Agreed. I think we are at a similar point with CUDA support as we were with Windows UCRT several years back. If we go ahead and embrace it warts and all, this will massively simplify our maintenance burden going forward and the rough edges will get smoothed in later iterations.
> I think only Numba uses PTX and it only uses that at runtime. Am not aware of Numba or other packages distributing PTX. So I don't think PTX is an issue for us.
Any library that is compiled with PTX in it can potentially cause issues. E.g., RAPIDS libraries have PTX for the latest supported architecture: https://github.com/rapidsai/rapids-cmake/blob/branch-21.12/rapids-cmake/cuda/set_architectures.cmake#L77-L81
This means that with an older driver, if someone tries to run on an older card that doesn't have the SASS already compiled, it would error when trying to JIT the PTX.
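For illustration, a minimal sketch of how that failure surfaces at module-load time, assuming NVIDIA's cuda-python driver bindings and a hypothetical `module.ptx` emitted by a newer toolkit than the installed driver supports:

```python
# Sketch only: loading PTX makes the driver JIT-compile it, so a driver
# older than the toolkit that generated the PTX fails with
# CUDA_ERROR_UNSUPPORTED_PTX_VERSION. Assumes cuda-python; "module.ptx"
# is a hypothetical file produced by a newer CTK.
from cuda import cuda

err, = cuda.cuInit(0)
err, dev = cuda.cuDeviceGet(0)
err, ctx = cuda.cuCtxCreate(0, dev)

with open("module.ptx", "rb") as f:
    ptx = f.read()

# cuda-python accepts a bytes buffer here (an assumption about the binding).
err, module = cuda.cuModuleLoadData(ptx)
if err == cuda.CUresult.CUDA_ERROR_UNSUPPORTED_PTX_VERSION:
    print("Driver too old to JIT this PTX; SASS for this card would be needed.")
```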
@raydouglass, do you have any thoughts on Keith's observation above?
From the CMake file that Keith quoted, it looks like a SASS is generated for each arch, and in addition a PTX is generated for the latest arch (potentially for early adoption when a future arch is out).
For this particular use case, my understanding (I think this is public info) is that if we have a new arch out (say `sm_90`), the CTK major version needs to be bumped anyway, so we should be safe with regard to Enhanced Compatibility, which is not guaranteed across major versions.
I am interested to learn other PTX use cases in the wild.
Yeah, it's more that if someone tries to run this on, say, the Maxwell architecture, then it would try to JIT the PTX and would error from the PTX version being too new for the driver version in this case.
So it sounds like we are on the same page. Would it make sense to return to the individual PRs at this point to make sure they have the needed changes and then work on getting them in? Or are there other things we still need to discuss here before doing that?
I think the only point of contention left is if we move forward in adding CUDA 11.3 / 11.4 docker images. I think we should move forward with building those images to enable packages that can leverage newer features to opt into using them.
Otherwise I think we have everything else in place.
Yeah, am OK with doing that. Currently the Docker images are waiting for the `cudatoolkit` packages to be produced ( https://github.com/conda-forge/cudatoolkit-feedstock/pull/60 ). So getting that in would help.
I thought we could do the opposite: Get rid of all CUDA images and use vanilla Linux images + CUDA conda packages to build? (As I demo'd in https://github.com/conda-forge/cupy-feedstock/pull/143)
Think you are thinking about something else ( https://github.com/conda-forge/cudatoolkit-feedstock/issues/62 ). At some point yes, but we are not ready to go there yet.
Should add would like to keep the scope tight here to make sure we achieve this step in a reasonable timeline 😉
Ah OK, wasn't aware of the timeline consideration. Sounds good to me.
Have submitted PR ( https://github.com/conda-forge/conda-forge-repodata-patches-feedstock/pull/172 ) to hot-fix the CUDA 11.2 packages to relax their pinnings.
If folks have time to review that PR and this one, that would be appreciated 😄
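For reference, that kind of hot-fix boils down to rewriting dependency pins in the existing repodata. A simplified sketch in the spirit of conda-forge-repodata-patches-feedstock (the real patch lives in the PR above; the function name and pin strings here are illustrative):

```python
# Simplified sketch of a repodata hot-fix: relax the cudatoolkit pin on
# existing CUDA 11.2 builds so they can run against any 11.x toolkit,
# per Enhanced Compatibility. Names and pin strings are illustrative.
def patch_repodata(repodata):
    for subdir_key in ("packages", "packages.conda"):
        for fname, record in repodata.get(subdir_key, {}).items():
            deps = record.get("depends", [])
            for i, dep in enumerate(deps):
                if dep.startswith("cudatoolkit >=11.2,<11.3"):
                    deps[i] = "cudatoolkit >=11.2,<12.0a0"
    return repodata
```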
Any other thoughts on this one? Also, the repodata patch has been merged.
Thanks all! 😄
If anything else comes up, happy to follow up in another issue/PR 🙂
> I think only Numba uses PTX and it only uses that at runtime. Am not aware of Numba or other packages distributing PTX. So I don't think PTX is an issue for us.
Hey all, sorry I'm late to the party. I don't know all the acronyms, but I'm pretty sure that the faiss feedstock uses PTX too. At least, this looks very similar to what Keith was quoting.
Are there any actions to take here for faiss? I just recently got around to finishing https://github.com/conda-forge/faiss-split-feedstock/pull/46, and I saw that the 11.2 builds were built by CTK 11.4 (I would link the lines in the CI run output, but azure's permalinks are useless, at least on restarted jobs). It's perhaps worth noting that CMake misidentifies this as 11.2 on both linux & windows.
> I just recently got around to finishing conda-forge/faiss-split-feedstock#46, and I saw that the 11.2 builds were built by CTK 11.4 (I would link the lines in the CI run output, but azure's permalinks are useless, at least on restarted jobs). It's perhaps worth noting that CMake misidentifies this as 11.2 on both linux & windows.
Yeah, Azure's log links are broken these days; not sure what's going on, but they're no longer useful for searching and linking. Anyway, what you said above is technically incorrect. As you noticed, there's this line:
```
2021-10-13T11:25:08.0185619Z -- Found CUDAToolkit: /usr/local/cuda/include (found version "11.2.152")
```
This is expected: the CUDA nvcc toolchain picked up by CMake is correct, and it's outside of CF; it's from the bundled CUDA 11.2 provided by CF's docker image. So, it means CMake uses CUDA 11.2 to build the project, and thanks to the ABI compatibility, it'd work with CUDA 11.2+.
Regarding PTX, yeah the lines you linked look similar to the use case Keith described.
Thanks for the response @leofang!
> This is expected: the CUDA nvcc toolchain picked up by CMake is correct, and it's outside of CF; it's from the bundled CUDA 11.2 provided by CF's docker image.
Is this intentional?
> So, it means CMake uses CUDA 11.2 to build the project, and thanks to the ABI compatibility, it'd work with CUDA 11.2+.
This situation sounds like a fortunate accident to me. Seeing that faiss is probably affected by the ABI concerns, how would a feedstock opt out of the enhanced compatibility?
Edit: what I mean by "fortunate accident" is that if the build had picked up the conda-forge version (11.4), yet marked it as 11.2+ compatible, things would likely break.
I think you need to override the `run_exports` from `nvcc` so it's stricter. Right now `faiss` will have `cudatoolkit >=11.2,<12.0a`, and you need `cudatoolkit 11.2.*`.
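To see the difference between those two pins, one can poke at conda's version matcher directly. This uses conda's internal `VersionSpec` API, so treat it as illustrative; the import path may differ across conda versions:

```python
# Illustrative check with conda's internal version matcher.
from conda.models.version import VersionSpec

relaxed = VersionSpec(">=11.2,<12.0a")  # what the nvcc run_exports emits
strict = VersionSpec("11.2.*")          # what faiss would pin instead

print(relaxed.match("11.4"))  # True  -> solver may pair the build with CTK 11.4
print(strict.match("11.4"))   # False -> build stays on CTK 11.2.x
```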
My understanding was it's backwards- but not forwards-compatible. I.e. running on 11.4 with a binary built by 11.2 is fine, but running on 11.2 with a binary built by 11.4 is not.
From that POV, the current run_export seems correct for 11.2, but I was more worried that the conda env apparently had 11.4 (which would hence only be usable by 11.4+), yet the build picked up the nvcc from the docker-image.
If that setup is intentional I can see how it works, but it gives me a queasy feeling. If one of many different knobs is changed and suddenly I pick up a newer nvcc without knowing, my artefacts are not usable on 11.2 anymore. Or am I needlessly worrying about that?
> but I was more worried that the conda env apparently had 11.4 (which would hence only be usable by 11.4+), yet the build picked up the nvcc from the docker-image.
This is intentional and works the same way as for other packages. For example, `gcc_linux-64` is pinned to 9, but we have `libgcc-ng=11` in the environment. Does that worry you too?
TBH, I don't yet "get" the GPU compiler stack very well - I was reacting to the PTX-vs-ABI discussion that didn't seem resolved. I'm happy to hear that it's intentional.
> I was reacting to the PTX-vs-ABI discussion that didn't seem resolved.
That was about forward compatibility (compiling with 11.4 and running/linking with 11.2). There are no backward compatibility issues (compiling with 11.2 and running/linking with 11.4).
> There are no backward compatibility issues (compiling with 11.2 and running/linking with 11.4).
That much I got 😅
As long as we're guarding against accidentally picking up the 11.4 ABI (i.e. I was reacting to CTK 11.4 appearing in the build), then it's all peachy.
The CTK 11.4 appearing in the build is only used for linking, and as I said in my previous comment, linking with 11.4 is fine.
Leverages CUDA Enhanced Compatibility for 11.2+. This should simplify updating to new CUDA versions going forward and cut down on the maintenance burden of CUDA versions.
Checklist
- Reset the build number to 0 (if the version changed)
- Re-rendered with the latest conda-smithy (use the phrase `@conda-forge-admin, please rerender` in a comment in this PR for automated rerendering)