conda-forge / cuda-nvcc-feedstock

A conda-smithy repository for cuda-nvcc.
BSD 3-Clause "New" or "Revised" License

Make `${PREFIX}/bin/nvcc` a symlink so that CMake can use the compiler #4

Closed robertmaynard closed 1 year ago

robertmaynard commented 1 year ago

CMake scans the verbose output of the compiler and requires a root directory that contains bin, lib, and include.

conda-forge-webservices[bot] commented 1 year ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

gmarkall commented 1 year ago

I note that without this PR, installing cuda-nvcc without c-compiler / cxx-compiler results in environment activation printing a message:

Please add the C++ compiler to the environment

and then killing the shell it was run from, which is not very desirable - so I'm hopeful that this PR will fix that issue (and I'm happy to test at the relevant time).

jakirkham commented 1 year ago

Yeah the message is expected

The shell dying is not. Unclear why that is happening atm

However we are planning to rip out the activation scripts with this PR. So it may be moot

gmarkall commented 1 year ago

I was interested, so I built this package locally - installing the package and activating the environment without c-compiler / cxx-compiler present now does not emit the message, or kill the shell. NB: When I built the package it didn't pass the tests included in the recipe, but I installed the package anyway as I didn't think that would be relevant for testing whether the activate issue is fixed.

As an aside, I think it kills the shell because in this line:

https://github.com/conda-forge/cuda-nvcc-feedstock/blob/9a05379857ea9e4e86c4ee096f26f14608f21765/recipe/activate.sh#L3

the script is effectively being sourced, so it's running in the current shell from which conda activate was called, and therefore the exit 1 exits the shell. Even if this activate script ends up ultimately being removed, I'd guess the issue can be worked around by only printing the error message and skipping the subsequent exports rather than exiting.
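A minimal sketch of that workaround (hypothetical, not the recipe's actual script): since conda sources activation scripts in the caller's shell, warn and skip the exports instead of calling exit.

```shell
# Hypothetical activate.sh fragment: a sourced script must not `exit`,
# or it terminates the interactive shell that ran `conda activate`.
if [ -z "${CXX:-}" ]; then
    # Warn and fall through; do NOT `exit 1` here.
    echo "Please add the C++ compiler to the environment" >&2
else
    # Only perform the nvcc-related exports when a host compiler exists.
    export NVCC_CCBIN="${CXX}"   # illustrative variable name
fi
```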

leofang commented 1 year ago

Ah... we forgot to bump the build number?

leofang commented 1 year ago

Maybe we can bump it in #3, and then mark all build number 0 as broken?

robertmaynard commented 1 year ago

This is completely wrong. The compiler should not be in ${PREFIX}/${targetsDir}/bin/ as it is a cross compiler and is not target dependent.

To handle this properly, I am thinking we need to introduce a ${PREFIX}/cuda directory.

If we move the compiler ( and nvvm ) up a directory to ${PREFIX}/targets/, we break the relative path computation that expects a layout like:

.
├── bin
├── include -> targets/x86_64-linux/include
├── lib64 -> targets/x86_64-linux/lib
├── libnvvp
├── nvml
├── nvvm
├── targets

Since then you can't go up from bin and find the targets directory for the cross-compilation headers/libraries.

Placing the compiler in ${PREFIX}/bin breaks CMake CUDA support since it expects that the relative include and lib/lib64 to have the CUDA bits which they won't.
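To illustrate the assumption being broken, here is a sketch (prefix path hypothetical) of the bin-relative lookup a build system performs:

```shell
# Sketch: tooling derives the toolkit root by going up one level from nvcc.
# If nvcc sits in ${PREFIX}/bin but headers live only under
# ${PREFIX}/targets/<arch>/include, this check fails.
PREFIX="${PREFIX:-/tmp/demo-prefix}"      # hypothetical install prefix
nvcc_dir="${PREFIX}/bin"
cuda_root="$(dirname "${nvcc_dir}")"      # one level up from bin/
if [ ! -d "${cuda_root}/include" ]; then
    echo "no include/ beside bin/; relative CUDA detection would fail" >&2
fi
```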

isuruf commented 1 year ago

Placing the compiler in ${PREFIX}/bin breaks CMake CUDA support since it expects that the relative include and lib/lib64 to have the CUDA bits which they won't.

Why can't we make those symlinks? For eg: cuda-cudart-dev does for lib (and weirdly not for include)

robertmaynard commented 1 year ago

Placing the compiler in ${PREFIX}/bin breaks CMake CUDA support since it expects that the relative include and lib/lib64 to have the CUDA bits which they won't.

Why can't we make those symlinks? For eg: cuda-cudart-dev does for lib (and weirdly not for include)

Are you asking why we can't symlink everything currently in ${PREFIX}/targets/<arch>/include into ${PREFIX}/include? CMake expects a single common include for the entire CUDA Toolkit, and there are two major reasons that I am aware of why doing so is problematic:

isuruf commented 1 year ago

Can we fix CMake? For the cross-compilation case, the layout of ${CUDA_NVCC_PATH}/../include does not make sense.

isuruf commented 1 year ago

Meant CMake, not CUDA. edited the comment above

robertmaynard commented 1 year ago

Can we fix CMake? For cross-compilation case, the layout of ${CUDA_NVCC_PATH}/../include does not make sense.

The above layout ( both the include symlink and targets/<arch>/include ) works with CMake. Very crudely, which one is used depends on whether you are cross compiling.

For CMake to work in different configurations we need a common root for bin, include, nvvm, and targets ( edited ).

isuruf commented 1 year ago

Can we make CMake do,

  • If you are not cross compiling the include symlink will be used and fallback to targets/<arch>/include if not there
  • If you are cross compiling the targets/<arch>/include is used.

The other option is to ask nvcc for these values just like CMake does for gcc, clang, etc.

robertmaynard commented 1 year ago

Can we make CMake do,

  • If you are not cross compiling the include symlink will be used and fallback to targets/<arch>/include if not there
  • If you are cross compiling the targets/<arch>/include is used.

The other option is to ask nvcc for these values just like CMake does for gcc, clang, etc.

In short, I believe changing CMake to support a custom new layout is a non-starter. It would require that anything using CMake + conda work only with this new version ( 3.28? ), and that should be the absolute last choice.

CMake supports CUDA Toolkits both with and without nvcc. When nvcc does exist, we scrape the compiler's verbose output for TOP to find the root of the install and use that. We don't scrape for all the different include directories via #$ INCLUDES=; while that could be done, it would need some significant changes.
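To make the scraping concrete, here is a sketch that parses the '#$'-prefixed lines nvcc emits in verbose mode (the sample output below is illustrative; the real lines come from running `nvcc -v` on a source file):

```shell
# Illustrative: simulate scraping nvcc's verbose output. CMake reads the
# TOP line for the install root rather than parsing each INCLUDES entry.
sample='#$ TOP=/usr/local/cuda/bin/..
#$ INCLUDES="-I/usr/local/cuda/bin/../targets/x86_64-linux/include"'
top=$(printf '%s\n' "$sample" | sed -n 's/^#\$ TOP=//p')
echo "install root: ${top}"
```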

isuruf commented 1 year ago

I agree that requiring a new version of cmake should be the last choice, but what other choice do we have? This PR fixes CMake + native compilation at the expense of cross compilation (with or without cmake).

isuruf commented 1 year ago

I guess it's not particularly this PR that breaks stuff, this PR + #3 .

robertmaynard commented 1 year ago

I am confused about why my proposal to move everything to the ${PREFIX}/cuda/ layout doesn't work.

isuruf commented 1 year ago

That would require all the other packages like cuda-cudart to also adopt that convention.

adibbley commented 1 year ago

Moving everything to $PREFIX/cuda seems to be the easiest solution to our problems here, and aligns better with other CUDA installer layouts.

Started a draft PR in cudart so we can work out the details there https://github.com/conda-forge/cuda-cudart-feedstock/pull/6

isuruf commented 1 year ago

I looked into this a bit more, and I don't think we need to move the layout. We just need to have a cuda-nvcc-tools package, and the fact that bin/nvcc is a symlink to targets/linux-x86_64/bin/nvcc becomes just an implementation detail that only cmake relies on. I was able to get cmake to do a cross-compilation identification of nvcc by hacking on the environment without changing the layout.
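A sketch of the symlink arrangement being described (prefix and arch path hypothetical):

```shell
# Sketch: keep the real nvcc under targets/ and expose it through a
# relative symlink in bin/, so walking up from bin/nvcc finds the
# common root containing bin, include, and targets.
PREFIX="${PREFIX:-/tmp/demo-prefix}"     # hypothetical install prefix
mkdir -p "${PREFIX}/bin" "${PREFIX}/targets/x86_64-linux/bin"
touch "${PREFIX}/targets/x86_64-linux/bin/nvcc"
ln -sf ../targets/x86_64-linux/bin/nvcc "${PREFIX}/bin/nvcc"
ls -l "${PREFIX}/bin/nvcc"
```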

robertmaynard commented 1 year ago

I looked into this a bit more, and I don't think we need to move the layout. We just need to have a cuda-nvcc-tools package, and the fact that bin/nvcc is a symlink to targets/linux-x86_64/bin/nvcc becomes just an implementation detail that only cmake relies on. I was able to get cmake to do a cross-compilation identification of nvcc by hacking on the environment without changing the layout.

Can you share what environment changes you needed?

Edit: info is at https://github.com/conda-forge/cuda-nvcc-feedstock/issues/12