conda-forge / cupy-feedstock

A conda-smithy repository for cupy.
BSD 3-Clause "New" or "Revised" License
5 stars 23 forks source link

Make cuDNN/cuTENSOR/NCCL optional #109

Closed pentschev closed 3 years ago

pentschev commented 3 years ago

Currently CuPy lists cuDNN as a requirement, but given cuDNN is large in size it's possibly a good idea to make it optional instead. A concrete use case is RAPIDS, it depends on CuPy but doesn't use cuDNN, which forces the inclusion of that library in official Docker containers, etc.

@leofang mentioned there may be a few things that need to be changed. Could we discuss those here? Do we need those changes in the CuPy repo or do they need to be here in the feedstock?

cc @kkraus14 @kmaehashi @jakirkham

jakirkham commented 3 years ago

Wee need changes in both ( https://github.com/cupy/cupy/issues/4850#issuecomment-794426626 ). The CuPy bits can be discussed in that issue. The Conda bits are easy to handle once the CuPy bit are done (and in a release)

leofang commented 3 years ago

Thanks, @pentschev!

@kkraus14 @jakirkham I have a question related to this need. I'd also like to make cuTENSOR optional --- which currently is not, as I expanded the build matrix so that we build CuPy without cuTENSOR support (for CUDA 9.2 - 11.2) in cos6 images, and we build it with cuTENSOR (for CUDA 10.1+) in cos7 images. This is a bit too much. In order to make it optional (and reduce the build matrix size), I say we kill the duplicate builds in the former (cos6 + CUDA 10.1+). WDYT? Would this have any real impact to your user base?

jakirkham commented 3 years ago

Does CuPy link to cuTENSOR at build time?

leofang commented 3 years ago

Yes, there're two Cython modules for supporting cuTENSOR.

leofang commented 3 years ago

Yes, there're two Cython modules for supporting cuTENSOR.

Likewise for cuDNN and NCCL.

jakirkham commented 3 years ago

What happens if a user tries to load those modules if they were linked to those libraries, but the libraries were not present? A library load error? Or does CuPy have some way to insulate the user from that?

leofang commented 3 years ago

Python raises ImportError if the symbols in a module cannot be fully resolved (undefined references) by the linker. In the case of cuDNN, CuPy has a try-except to catch this: https://github.com/cupy/cupy/blob/v9.0.0b3/cupy/cuda/cudnn.py#L9-L14 I think this detection can be easily done in both cuTENSOR and NCCL.

jakirkham commented 3 years ago

Sounds good. I think if we have that in place for cuTENSOR and NCCL, it will be easy to make those optional here in the same way

jakirkham commented 3 years ago

Also ok with dropping CentOS 6 (assuming I understood your suggestion above correctly)

RAPIDS is CentOS 7+ only

leofang commented 3 years ago

I raised the question on version pinning in the CuPy counterpart issue: https://github.com/cupy/cupy/issues/4850#issuecomment-795822776. CuPy's library metadata pins at different versions from the latest ones we have on CF, so it needs to be addressed too.

jakirkham commented 3 years ago

Once that is done, I think we can add all of these to ignore_run_exports and run_constrained that way we still build against them, but not ship them by default. Users can then optionally install these extra pieces as they need them

leofang commented 3 years ago

In case it wasn't clear, the work plan is as follows:

  1. Get the PR https://github.com/cupy/cupy/pull/4873 in so that we can give instructions to CF users at runtime for installing missing libraries.
  2. Generate a json file in the recipe here (likely when the recipe is updated next time, say for v8.6.0/v9.0.0rc1) to enable the warning mechanism. Also use the tricks John mentioned above to do the actual opt-out job.
leofang commented 3 years ago

(btw @pentschev I changed the issue title to reflect the new plan.)

leofang commented 3 years ago

@pentschev this is done starting v9.0.0rc1; v8.6 still has to carry the deps...

pentschev commented 3 years ago

Thanks so much @leofang , you're awesome! 😄