conda-forge / nvcc-feedstock

A conda-smithy repository for nvcc.
BSD 3-Clause "New" or "Revised" License

Use CUDA Enhanced Compatibility for 11.2+ #71

Closed · jakirkham closed this 3 years ago

jakirkham commented 3 years ago

Leverages CUDA Enhanced Compatibility for 11.2+. This should simplify updating to new CUDA versions going forward and cut down on the maintenance burden of supporting multiple CUDA versions.


conda-forge-linter commented 3 years ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

jakirkham commented 3 years ago

cc @isuruf @kkraus14 (as this is related to our discussion last week)

cc @leofang (in case you have thoughts generally & particularly on NVRTC usage with CuPy)

cc @raydouglass (for awareness)

jakirkham commented 3 years ago

Also we will want a repodata patch in addition to this. Think this should help us set a cutoff for how many packages would need that repodata patch and keep that list finite. Though feel free to correct me if there are other things we do or don't need to do before adding the repodata patch.

kkraus14 commented 3 years ago

I think we just need to make sure to loosen the __cuda constraint on the cudatoolkit package for 11.2+ as well moving forward.

leofang commented 3 years ago

I think we just need to make sure to loosen the __cuda constraint on the cudatoolkit package for 11.2+ as well moving forward.

I doubt this is safe. At least for applications depending on PTX JIT (such as NVRTC), the driver version must be compatible with the runtime CTK version. If it's too old then we'd see driver errors such as CUDA_ERROR_UNSUPPORTED_PTX_VERSION. This is best explained here:

However, the more recent NVRTC library may generate PTX with a version that is not accepted by the CUDA Driver API functions of an older CUDA driver, as explained in the CUDA Compatibility document. Some approaches to resolving this issue: ...

and here:

Applications that compile device code to PTX will not work on older drivers. If the application requires PTX then admins have to upgrade the installed driver.
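To make the failure mode concrete, here is a minimal sketch (mine, not from the docs; assumes a standard CUDA Toolkit install and a hypothetical file name `check_skew.c`) that surfaces the driver/runtime skew in question:

```c
/* Minimal sketch: report the skew that triggers
 * CUDA_ERROR_UNSUPPORTED_PTX_VERSION when PTX JIT is required.
 * Build with something like: nvcc check_skew.c -lcuda */
#include <stdio.h>
#include <cuda.h>              /* driver API: cuDriverGetVersion */
#include <cuda_runtime_api.h>  /* runtime API: cudaRuntimeGetVersion */

int main(void) {
    int driver = 0, runtime = 0;
    cuDriverGetVersion(&driver);      /* e.g. 11020 for an 11.2 driver   */
    cudaRuntimeGetVersion(&runtime);  /* e.g. 11040 for the 11.4 runtime */
    printf("driver CUDA %d.%d, runtime CUDA %d.%d\n",
           driver / 1000, (driver % 1000) / 10,
           runtime / 1000, (runtime % 1000) / 10);
    if (runtime > driver)
        printf("runtime newer than driver: PTX emitted by this toolkit "
               "may be rejected at JIT time\n");
    return 0;
}
```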

In the case of CuPy, we work around this limitation by always generating SASS (CUBIN) instead of PTX for CUDA 11.1+, as instructed in the NVRTC documentation. But I don't know about other CUDA packages on conda-forge.
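Roughly, that workaround uses NVRTC's CUBIN path (available since CUDA 11.1). A minimal sketch with a hypothetical axpy kernel, all error checking omitted:

```c
/* Sketch of the SASS-instead-of-PTX workaround (CUDA 11.1+),
 * loosely following the NVRTC docs. */
#include <stdlib.h>
#include <nvrtc.h>

static const char *kernel_src =
    "extern \"C\" __global__ void axpy(float a, float *x, float *y) {\n"
    "    y[threadIdx.x] += a * x[threadIdx.x];\n"
    "}\n";

char *compile_to_cubin(size_t *size_out) {
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, kernel_src, "axpy.cu", 0, NULL, NULL);

    /* "sm_80" requests SASS for a real architecture; "compute_80" would
     * request PTX instead, which an older driver may refuse to JIT. */
    const char *opts[] = {"--gpu-architecture=sm_80"};
    nvrtcCompileProgram(prog, 1, opts);

    nvrtcGetCUBINSize(prog, size_out);  /* added in CUDA 11.1 */
    char *cubin = malloc(*size_out);
    nvrtcGetCUBIN(prog, cubin);         /* SASS, not PTX */

    nvrtcDestroyProgram(&prog);
    return cubin;                       /* load via cuModuleLoadData */
}
```

Loading the resulting CUBIN with cuModuleLoadData bypasses the driver's PTX JIT entirely, which is why the driver version stops mattering on this path (within the same CUDA major version).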

kkraus14 commented 3 years ago

I doubt this is safe. At least for applications depending on PTX JIT (such as NVRTC), the driver version must be compatible with the runtime CTK version. If it's too old then we'd see driver errors such as CUDA_ERROR_UNSUPPORTED_PTX_VERSION.

Yes, we're aware that there are potential landmines, but there was some discussion and I believe the decision was that the best bet for now is to set __cuda>=11.2,<12 as an export; if a project uses newer / incompatible features, it can override that export.
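As an illustration, that relaxed constraint could look something like this in the cudatoolkit package metadata (a hypothetical sketch, not the actual recipe):

```yaml
# Hypothetical meta.yaml fragment for cudatoolkit 11.x (x >= 2):
# allow any 11.2+ driver rather than pinning to the exact minor version.
requirements:
  run_constrained:
    - __cuda >=11.2,<12    # __cuda is the driver-provided virtual package
```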

Once we move to individual packages as opposed to one big cudatoolkit package then we could consider revisiting the __cuda export constraint on the libnvrtc package?

isuruf commented 3 years ago

@kkraus14, even though we discussed this issue, the PTX issue never came up in that discussion. Given that, I don't think it is safe to build with 11.4 and run with 11.2.

kkraus14 commented 3 years ago

Fair enough. It's a bit of a shame about the PTX versioning.

Maybe if we can have a virtual package for GPU architecture in the future, it would open some new opportunities here.

jakirkham commented 3 years ago

I think only Numba uses PTX and it only uses that at runtime. Am not aware of Numba or other packages distributing PTX. So I don't think PTX is an issue for us

leofang commented 3 years ago

My take is to push this through as a way for libraries to take advantage of CUDA Enhanced Compatibility. I am not saying this as an NVIDIA employee, but rather as a GPU developer (for CuPy). This is the trajectory that we've gone through, and I expect more and more users will ask each library to adopt it; eventually it'd no longer be a concern for us.

jakirkham commented 3 years ago

Agreed. I think we are at a similar point with CUDA support as we were with Windows UCRT several years back. If we go ahead and embrace it, warts and all, this will massively simplify our maintenance burden going forward, and the rough edges will get smoothed out in later iterations.

kkraus14 commented 3 years ago

I think only Numba uses PTX and it only uses that at runtime. Am not aware of Numba or other packages distributing PTX. So I don't think PTX is an issue for us

Any library that is compiled with PTX in it can potentially cause issues. E.g., RAPIDS libraries include PTX for the latest supported architecture: https://github.com/rapidsai/rapids-cmake/blob/branch-21.12/rapids-cmake/cuda/set_architectures.cmake#L77-L81

This means that with an older driver, if someone tries to run on a card that doesn't have its SASS already compiled in, it would error when trying to JIT the PTX.

jakirkham commented 3 years ago

@raydouglass, do you have any thoughts on Keith's observation above?

leofang commented 3 years ago

From the CMake file that Keith quoted, it looks like SASS is generated for each arch and, in addition, PTX is generated for the latest arch (potentially for early adoption when a future arch comes out).

For this particular use case, my understanding (I think this is public info) is that if a new arch comes out (say sm_90), the CTK major version needs to be bumped anyway, so we should be safe with regard to Enhanced Compatibility, which is not guaranteed across major versions.

I am interested to learn other PTX use cases in the wild.

kkraus14 commented 3 years ago

Yeah, it's more that if someone tries to run this on, say, a Maxwell-architecture card, then it would try to JIT the PTX and error because the PTX version is too new for the driver version in this case.

jakirkham commented 3 years ago

So it sounds like we are on the same page. Would it make sense to return to the individual PRs at this point to make sure they have the needed changes and then work on getting them in? Or are there other things we still need to discuss here before doing that?

kkraus14 commented 3 years ago

I think the only point of contention left is whether we move forward with adding CUDA 11.3 / 11.4 docker images. I think we should build those images to enable packages that can leverage newer features to opt into using them.

Otherwise I think we have everything else in place.

jakirkham commented 3 years ago

Yeah am ok with doing that. Currently the Docker images are waiting for the cudatoolkit packages to be produced ( https://github.com/conda-forge/cudatoolkit-feedstock/pull/60 ). So getting that in would help

leofang commented 3 years ago

I thought we could do the opposite: Get rid of all CUDA images and use vanilla Linux images + CUDA conda packages to build? (As I demo'd in https://github.com/conda-forge/cupy-feedstock/pull/143)

jakirkham commented 3 years ago

Think you are thinking about something else ( https://github.com/conda-forge/cudatoolkit-feedstock/issues/62 ). At some point yes, but we are not ready to go there yet.

Should add that I'd like to keep the scope tight here to make sure we achieve this step in a reasonable timeline 😉

leofang commented 3 years ago

Ah OK, wasn't aware of the timeline consideration. Sounds good to me.

jakirkham commented 3 years ago

Have submitted PR ( https://github.com/conda-forge/conda-forge-repodata-patches-feedstock/pull/172 ) to hot-fix the CUDA 11.2 packages to relax their pinnings.

If folks have time to review that PR and this one, that would be appreciated 😄

jakirkham commented 3 years ago

Any other thoughts on this one? Also the repodata patch has been merged

jakirkham commented 3 years ago

Thanks all! 😄

If anything else comes up, happy to follow up in another issue/PR 🙂

h-vetinari commented 3 years ago

I think only Numba uses PTX and it only uses that at runtime. Am not aware of Numba or other packages distributing PTX. So I don't think PTX is an issue for us

Hey all, sorry I'm late to the party. I don't know all the acronyms, but I'm pretty sure that the faiss feedstock uses PTX too. At least, this looks very similar to what Keith was quoting.

Are there any actions to take here for faiss? I just recently got around to finishing https://github.com/conda-forge/faiss-split-feedstock/pull/46, and I saw that the 11.2 builds were built by CTK 11.4 (I would link the lines in the CI run output, but Azure's permalinks are useless, at least on restarted jobs). It's perhaps worth noting that CMake misidentifies this as 11.2 on both Linux & Windows.

leofang commented 3 years ago

I just recently got around to finishing conda-forge/faiss-split-feedstock#46, and I saw that the 11.2 builds were built by CTK 11.4 (I would link the lines in the CI run output, but Azure's permalinks are useless, at least on restarted jobs). It's perhaps worth noting that CMake misidentifies this as 11.2 on both Linux & Windows.

Yeah, Azure's log links are broken these days; not sure what's going on, but they're no longer useful for searching and linking. Anyway, what you said above is technically incorrect. As you noticed, there's this line:

2021-10-13T11:25:08.0185619Z -- Found CUDAToolkit: /usr/local/cuda/include (found version "11.2.152")

This is expected: the CUDA nvcc toolchain picked up by CMake is correct, and it's outside of CF; it's from the bundled CUDA 11.2 provided by CF's docker image. So CMake uses CUDA 11.2 to build the project, and thanks to the ABI compatibility, it'd work with CUDA 11.2+.

Regarding PTX, yeah the lines you linked look similar to the use case Keith described.

h-vetinari commented 3 years ago

Thanks for the response @leofang!

This is expected: the CUDA nvcc toolchain picked up by CMake is correct, and it's outside of CF; it's from the bundled CUDA 11.2 provided by CF's docker image.

Is this intentional?

So, it means CMake uses CUDA 11.2 to build the project, and thanks to the ABI compatibility, it'd work with CUDA 11.2+.

This situation sounds like a fortunate accident to me. Seeing that faiss is probably affected by the ABI concerns, how would a feedstock opt out of the enhanced compatibility?

Edit: what I mean by "fortunate accident" is that if the build had picked up the conda-forge version (11.4), yet marked it as 11.2+ compatible, things would likely break.

jaimergp commented 3 years ago

I think you need to override the run_exports from nvcc so it's stricter. Right now faiss will have cudatoolkit >=11.2,<12.0a, and you need cudatoolkit 11.2.*.
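For a feedstock that needs the stricter pin, the override could look roughly like this (a hypothetical meta.yaml fragment, not faiss's actual recipe):

```yaml
# Hypothetical fragment for a feedstock opting out of the relaxed pinning:
# drop the run_exports coming from the nvcc toolchain and pin explicitly.
build:
  ignore_run_exports:
    - cudatoolkit

requirements:
  run:
    - cudatoolkit 11.2.*
```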

h-vetinari commented 3 years ago

I think you need to override the run_exports from nvcc so it's stricter. Right now faiss will have cudatoolkit >=11.2,<12.0a, and you need cudatoolkit 11.2.*.

My understanding was it's backwards- but not forwards-compatible. I.e. running 11.4 with a binary built by 11.2 is fine, but running on 11.2 with a binary built by 11.4 is not.

From that POV, the current run_export seems correct for 11.2, but I was more worried that the conda env apparently had 11.4 (which would hence only be usable by 11.4+), yet the build picked up the nvcc from the docker-image.

If that setup is intentional I can see how it works, but it gives me a queasy feeling. If one of many different knobs is changed and suddenly I pick up a newer nvcc without knowing, my artefacts are not usable on 11.2 anymore. Or am I needlessly worrying about that?

isuruf commented 3 years ago

but I was more worried that the conda env apparently had 11.4 (which would hence only be usable by 11.4+), yet the build picked up the nvcc from the docker-image.

This is intentional and works the same way as other packages. For example, gcc_linux-64 is pinned to 9, but we have libgcc-ng=11 in the environment. Does that worry you too?

h-vetinari commented 3 years ago

This is intentional and works the same way as other packages. For example, gcc_linux-64 is pinned to 9, but we have libgcc-ng=11 in the environment. Does that worry you too?

TBH, I don't yet "get" the GPU compiler stack very well - I was reacting to the PTX-vs-ABI discussion that didn't seem resolved. I'm happy to hear that it's intentional.

isuruf commented 3 years ago

I was reacting to the PTX-vs-ABI discussion that didn't seem resolved.

That was about forward compatibility (compiling with 11.4 and running/linking with 11.2). There are no backward compatibility issues (compiling with 11.2 and running/linking with 11.4).

h-vetinari commented 3 years ago

There are no backward compatibility issues (compiling with 11.2 and running/linking with 11.4).

That much I got 😅

As long we're guarding against accidentally picking up 11.4 ABI (i.e. I was reacting to CTK 11.4 appearing in the build), then it's all peachy.

isuruf commented 3 years ago

As long we're guarding against accidentally picking up 11.4 ABI (i.e. I was reacting to CTK 11.4 appearing in the build), then it's all peachy.

CTK 11.4 appearing in the build is only used for linking, and as I said in my previous comment, linking with 11.4 is fine.