NVIDIA / cuda-python

CUDA Python Low-level Bindings
https://nvidia.github.io/cuda-python/

Splayed layout support #46

Closed jakirkham closed 1 week ago

jakirkham commented 1 year ago

Currently cuda-python relies on all binaries (like nvcc), all headers, and all libraries to live in a single directory (specified by $CUDA_HOME or similar).

However, there are use cases (like cross-compilation, as with conda-build) where the build tools live in one location (and perform builds on that architecture) whereas the headers and libraries live in a different location (and target a different architecture). In this case, not everything lives in $CUDA_HOME.

It would be helpful to have a way of specifying where these different components come from. Here are some options:

  1. Check $NVCC for the nvcc location
  2. Use $CUDA_BIN (if specified) to get build tool directory
  3. Support a list of directories in $CUDA_HOME
  4. ?

Maybe there are other reasonable options worth considering?
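As a rough illustration, options 1 to 3 above could be combined into a single lookup. This is a minimal sketch, not existing cuda-python behavior: the helper name `find_nvcc` and the precedence order ($NVCC first, then $CUDA_BIN, then each $CUDA_HOME entry) are assumptions made for the example.

```python
import os
import shutil


def find_nvcc(env=None):
    """Return a path to nvcc, or None (hypothetical helper; order is assumed)."""
    env = os.environ if env is None else env

    # Option 1: an explicit compiler path in $NVCC wins outright.
    nvcc = env.get("NVCC")
    if nvcc:
        return nvcc

    # Option 2: a build-tool directory given in $CUDA_BIN.
    search_dirs = []
    if env.get("CUDA_BIN"):
        search_dirs.append(env["CUDA_BIN"])

    # Option 3: treat $CUDA_HOME as a path list and look in each root's bin/.
    for root in env.get("CUDA_HOME", "").split(os.pathsep):
        if root:
            search_dirs.append(os.path.join(root, "bin"))

    if not search_dirs:
        return None
    # shutil.which accepts an explicit search path string.
    return shutil.which("nvcc", path=os.pathsep.join(search_dirs))
```

Passing the environment as a plain mapping keeps the function easy to exercise without touching the real process environment.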

robertmaynard commented 1 year ago

Another option is to support $CUDA_HOME being a list of directories. So for header searching cuda-python would append include to each entry in $CUDA_HOME.
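The header search described here might look like the sketch below. It assumes $CUDA_HOME uses the platform path separator (`os.pathsep`) between entries. The function name `cuda_include_dirs` is hypothetical and is not part of cuda-python.

```python
import os


def cuda_include_dirs(cuda_home):
    """Split a path-list $CUDA_HOME value and return each existing include/ dir."""
    dirs = []
    for root in cuda_home.split(os.pathsep):
        candidate = os.path.join(root, "include")
        # Skip empty entries and roots that carry no headers.
        if root and os.path.isdir(candidate):
            dirs.append(candidate)
    return dirs
```

A build script could then turn the result into compiler flags, for example `["-I" + d for d in cuda_include_dirs(os.environ.get("CUDA_HOME", ""))]`.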

jakirkham commented 1 year ago

Thanks Rob! 🙏

Converted the ideas to an enumerated list and added that idea to the list

jakirkham commented 1 year ago

cc @bdice

vyasr commented 6 months ago

Would it make sense to standardize a tool that uses the path to nvcc to parse information out of nvcc.profile? Something that could be used by anyone, not just cuda-python?
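A very rough sketch of what such a tool's parsing step might look like. It assumes the assignment syntax described in the nvcc documentation (`name = text`, `name ?= text`, `name += text`, `name =+ text`, with `#` comment lines). It does no variable expansion, joins appended values with a space, and checks only earlier assignments for `?=`, all of which are simplifications. The sample profile text in the usage note is made up for illustration.

```python
def parse_profile(text):
    """Parse 'name OP value' lines into a dict; no $(VAR) expansion is done."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        idx = line.find("=")
        if idx <= 0:
            continue  # not an assignment
        if line[idx - 1] in "?+":
            op, name, value = line[idx - 1:idx + 1], line[:idx - 1], line[idx + 1:]
        elif line[idx + 1:idx + 2] == "+":
            op, name, value = "=+", line[:idx], line[idx + 2:]
        else:
            op, name, value = "=", line[:idx], line[idx + 1:]
        name, value = name.strip(), value.strip()
        if op == "=":
            values[name] = value
        elif op == "?=":
            values.setdefault(name, value)  # keep an earlier assignment
        elif op == "+=":
            values[name] = (values.get(name, "") + " " + value).strip()
        else:  # "=+" prepends
            values[name] = (value + " " + values.get(name, "")).strip()
    return values
```

For example, `parse_profile("TOP = /opt/cuda\nINCLUDES += -I/opt/cuda/include")` returns a dict with `TOP` and `INCLUDES` keys. A real tool would also need to expand variable references and locate the profile relative to the nvcc binary.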

leofang commented 3 weeks ago

Sorry for the lack of response. Is this still an issue? It seems we have been able to build cuda-python on conda-forge?

jakirkham commented 1 week ago

Yep it is still an issue

We work around it in conda-forge

Through improvements on the packaging side, we have minimized the workarounds needed (for example: https://github.com/conda-forge/cuda-python-feedstock/pull/75), but we are unable to eliminate them without this fix

leofang commented 1 week ago

@jakirkham I am sorry but I don't get it. In any case we need a way to specify where the CUDA headers are, and setting CUDA_HOME is the right solution that's been working fine. Is there anything specific to the splayed layout that we have not addressed (either in CUDA Python or in conda-forge)?

jakirkham commented 1 week ago

Think an ideal solution (from my perspective) would be moving to a CMake-based build system here, as CMake already understands how to work with splayed layouts correctly. For example, moving to scikit-build-core would very likely solve this issue. This may be desirable anyway to simplify the build process and make it less bespoke

leofang commented 1 week ago

I still don't understand what specific changes are needed to support splayed layout. @jakirkham might this issue have become outdated since the earlier (CTK 12.0) efforts?

Right now, CUDA Python expects all headers to be located in one place, for the purposes of both parsing and compiling. So, it doesn't really matter if headers are in $PREFIX/include or $PREFIX/targets/<platform>/include or /usr/local/cuda/include or any place in the host environment, as long as $CUDA_HOME is set to a single directory path. Can we please identify which part of CUDA Python is supposed to find headers located in two or more places (which we would fail to do)? I've eyeballed the recent cuda-python recipe and I was unable to find such traces anymore; most likely they were removed starting with https://github.com/conda-forge/cuda-python-feedstock/pull/58 and further cleaned up later.

Also, there are no C++ components in this project (at least not yet; there will be once we start working on memory management, but that would not happen before this fall at the earliest) and we need to balance this against the limited engineering resources we currently have. I don't see the immediate benefit of moving away from setuptools to CMake (+ something that understands CMake like scikit-build-core) when things are already working. That said, if I can enlist a RAPIDS build expert to help with the build system migration, I would not object to the change to CMake 😉

jakirkham commented 1 week ago

Nope. This issue was opened because of issues encountered adding CUDA 12 support

Yes CUDA-Python doesn't support splayed layouts. We agree 🙂

The issue is that the CUDA compiler and its tightly coupled headers and libraries live together. In Conda, this means $BUILD_PREFIX (the place where requirements/build are installed). Meanwhile, the headers and libraries the package uses (in other words, requirements/host) are installed to $PREFIX. So $BUILD_PREFIX != $PREFIX, but CUDA Python needs components from both and doesn't know how to handle this
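To make the two-prefix situation concrete, here is a minimal sketch of a lookup that searches several roots instead of a single $CUDA_HOME. The helper names `conda_roots` and `locate` are hypothetical and not part of cuda-python or conda-build. The only assumption about the environment is that $BUILD_PREFIX and $PREFIX are set as described above.

```python
import os


def conda_roots(env):
    """Return the non-empty values of $BUILD_PREFIX and $PREFIX, in that order."""
    return [env[name] for name in ("BUILD_PREFIX", "PREFIX") if env.get(name)]


def locate(relpath, roots):
    """Return the first root/relpath that exists on disk, or None."""
    for root in roots:
        candidate = os.path.join(root, relpath)
        if os.path.exists(candidate):
            return candidate
    return None
```

With a single root, only one of `locate(os.path.join("bin", "nvcc"), ...)` and `locate(os.path.join("include", "cuda.h"), ...)` can succeed in a cross-compiling build. Searching `conda_roots(os.environ)` lets both resolve.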

As the wheel ecosystem continues to build out CUDA library wheels, it will likely wind up in the same situation with the same problems

Hopefully that explanation makes more sense. Please feel free to ask more questions

leofang commented 1 week ago

> but CUDA Python needs components from both and doesn't know how to handle this

Could you tell me where this statement comes from? Sorry, but I just don't think we are getting our points across, and we don't understand each other through this medium. Please feel free to arrange an offline meeting.

robertmaynard commented 1 week ago

> but CUDA Python needs components from both and doesn't know how to handle this
>
> Could you tell me where this statement comes from? Sorry, but I just don't think we are getting our points across, and we don't understand each other through this medium. Please feel free to arrange an offline meeting.

I expect the problem is this: https://github.com/NVIDIA/cuda-python/blob/81e6f866bb0c393e43f422e65d57678d40bcceb4/examples/common/common.py#L22

jakirkham commented 1 week ago

Right, like this line:

https://github.com/NVIDIA/cuda-python/blob/2f9d31cfacc818da0cd4569dd47ea43c5724d2a0/setup.py#L81

Though am now noticing that in CUDA-Python 12.4.0 there may have been related changes for splayed layouts:

https://github.com/NVIDIA/cuda-python/blob/2be0aac1a5cac84fec4137aa9c50525653425352/setup.py#L30

https://github.com/NVIDIA/cuda-python/blob/2be0aac1a5cac84fec4137aa9c50525653425352/setup.py#L77

So maybe we should give this a try to see if it helps

jakirkham commented 1 week ago

Ok made some changes to the conda-forge build to take advantage of the changes in how CUDA_HOME is handled. This is in PR: https://github.com/conda-forge/cuda-python-feedstock/pull/83

If someone could take a look, that would be appreciated 🙂