conda-forge / cuda-nvcc-impl-feedstock

A conda-smithy repository for cuda-nvcc-impl.
BSD 3-Clause "New" or "Revised" License

missing cicc? #9

Open hmaarrfk opened 6 months ago

hmaarrfk commented 6 months ago

Solution to issue cannot be found in the documentation.

Issue

cicc seems to be in ${PREFIX}/nvvm/bin instead of ${PREFIX}/bin

So does libdevice.10.bc.

xref: https://github.com/conda-forge/tensorflow-feedstock/issues/296

Installed packages

linux-64/cuda-nvcc-tools-12.0.76-h59595ed_1.conda
linux-64/cuda-nvcc-tools-12.1.105-hd3aeb46_0.conda
linux-64/cuda-nvcc-tools-12.0.76-h59595ed_0.conda

Environment info

.
jakirkham commented 6 months ago

Thanks for raising Mark! 🙏

nvvm is actually the expected location for these files

In CUDA 12, the nvvm contents match the CUDA Toolkit layout. In CUDA 11, the cudatoolkit package does not match this layout ( https://github.com/conda-forge/cudatoolkit-feedstock/issues/96 )

Hence libdevice and other bits wind up in the wrong place in the cudatoolkit package. Think we discussed this before in issue ( https://github.com/conda-forge/tensorflow-feedstock/issues/296 ) where cudatoolkit package layout issues had cropped up

With cicc itself, it is typically used by nvcc (not usually external programs)

Have seen one other case where cicc was not found, but after further investigation it was due to some build configuration issues ( https://github.com/scopetools/cudadecon/pull/29 )

So am wondering if there is a similar issue here. Do you have more context on the issue that came up?

hmaarrfk commented 6 months ago

It came up in the TensorFlow 2.15 CUDA builds.

jakirkham commented 6 months ago

Ok is there a log or something we could look at?

hmaarrfk commented 6 months ago

Not really, since we disable building TF on the CIs. You can see how I modified the build script, though.

But I'll upload something tomorrow.

https://github.com/conda-forge/tensorflow-feedstock/pull/366

edit: this comment in particular shows a small portion of the log https://github.com/conda-forge/tensorflow-feedstock/pull/366#issuecomment-1872621256

jakirkham commented 6 months ago

Completely understandable

An uploaded log would work. Happy to look at snippets too

We might consider setting up TensorFlow on the Quansight CI as well to make that a bit easier to manage

jakirkham commented 5 months ago

Found this (admittedly old) thread, which mentions cicc may need to be in the search path

jakirkham commented 5 months ago

This logic should add NVVM's bin directory to the $PATH

https://github.com/conda-forge/cuda-nvcc-impl-feedstock/blob/45b155617aa7e55f34e2dfd1d22405cf5a4e5139/recipe/nvcc.profile.patch#L12

leofang commented 2 months ago

This logic should add NVVM's bin directory to the $PATH

Is there any action we need in this feedstock?

hmaarrfk commented 2 months ago

I'm not sure. Happy to revisit in the future.

I haven't had time to go through the tensorflow builds in a long time.

LourensVeen commented 2 weeks ago

I'm seeing the same issue as in https://github.com/scopetools/cudadecon/pull/29, in a similar situation with old CUDA code using CMake. And I can reproduce it without calling cicc directly:

conda create -n test
conda activate test
conda install cuda-toolkit

touch source.cu
${CONDA_PREFIX}/bin/nvcc -c source.cu        # works

${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc -c source.cu
<command-line>: fatal error: cuda_runtime.h: No such file or directory

${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc -c -I${CONDA_PREFIX}/targets/x86_64-linux/include source.cu
sh: 1: cicc: not found

For the failing call, strace tells me:

openat(AT_FDCWD, "${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc.profile", O_RDONLY) = -1 ENOENT (No such file or directory)

while for the successful one it says:

openat(AT_FDCWD, "${CONDA_PREFIX}/bin/nvcc.profile", O_RDONLY) = 3

This latter file contains the line

CICC_PATH        = $(TOP)/nvvm/bin

which explains why cicc isn't found, I think.

So if nvcc tries to find its configuration in a location relative to itself, perhaps the symlink for nvcc should be accompanied by one for nvcc.profile?
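A minimal sketch of that check, assuming (as the strace output above suggests) that nvcc looks for nvcc.profile beside the path it was invoked as, without resolving symlinks first. The function name has_profile_beside is hypothetical.

```shell
# Report whether an nvcc.profile sits in the same directory as the given
# nvcc path. The path is deliberately NOT resolved through symlinks, on the
# assumption that nvcc searches beside the path it was invoked as.
has_profile_beside() {
    # $1: path to nvcc exactly as it would be invoked
    if [ -e "$(dirname -- "$1")/nvcc.profile" ]; then
        echo "found"
    else
        echo "missing"
        return 1
    fi
}
```

For example, has_profile_beside "${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc" would be expected to print "missing" in the environment described above.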

LourensVeen commented 2 weeks ago

And a little more digging: CMake runs nvcc -v __cmake_determine_cuda, which prints the configuration as created from nvcc.profile and then errors out. This has the line

#$ TOP=${CONDA_PREFIX}/bin/../targets/x86_64-linux

which CMake then uses to locate nvcc at ${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc, from where it can't find its configuration.
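To see which TOP a given nvcc reports, the verbose output can be filtered. This is a rough sketch: the helper name extract_top is made up for illustration, and it assumes the line has exactly the "#$ TOP=..." shape shown above.

```shell
# Print the value of TOP from nvcc's verbose output, read on stdin.
# Lines of interest are assumed to look like:  #$ TOP=/some/path
# Prints nothing if no such line is present.
extract_top() {
    sed -n 's/^#\$ TOP=//p'
}

# Possible usage (2>&1 so the verbose lines are captured whichever stream
# nvcc writes them to):
#   "${CONDA_PREFIX}/bin/nvcc" -v __cmake_determine_cuda 2>&1 | extract_top
```

An empty result would suggest that the nvcc in question never read an nvcc.profile.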

So it's the nvcc.profile itself that points CMake to a version of nvcc that cannot read nvcc.profile :smile:.

Adding a symlink at ${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc.profile to ${CONDA_PREFIX}/bin/nvcc.profile fixes the problem.

leofang commented 2 weeks ago

@robertmaynard @adibbley do you have any insight into what Lourens brought up above?

LourensVeen commented 2 weeks ago

I've now also added a symlink for the bin/crt directory, to avoid errors linking code that uses the driver API with the stubs. I have more issues still, but the code I'm working on is also messy so they may be unrelated.

robertmaynard commented 1 week ago

What CMake version are you using? This sounds like an older version of CMake that didn't properly handle symlinks inside TOP, which has since been fixed.

LourensVeen commented 1 week ago

This is a new CMake, but with an old configuration that uses the now-obsolete FindCUDA macro.

But my first example reproduces the problem without CMake being involved in any way. Are you saying that users are expected to first resolve the symlink at ${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc, rather than trying to run it directly as if it were the linked-to executable?

robertmaynard commented 1 week ago

But my first example reproduces the problem without CMake being involved in any way. Are you saying that users are expected to first resolve the symlink at ${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc, rather than trying to run it directly as if it were the linked-to executable?

After looking at this more, the issue is entirely due to a bad setup by conda. You are correct that an nvcc.profile needs to be beside the nvcc symlink in ${CONDA_PREFIX}/targets/x86_64-linux/bin/.

In the current form the nvcc at ${CONDA_PREFIX}/targets/x86_64-linux/bin/ is broken and the verbose output from the compiler looks like:

#$ NVCC_PREPEND_FLAGS=" -ccbin=/home/rmaynard/miniconda3/envs/cuda_stub_env/bin/x86_64-conda-linux-gnu-c++"
#$ _NVVM_BRANCH_=nvvm
#$ _SPACE_=
#$ _CUDART_=cudart
#$ _HERE_=/home/rmaynard/miniconda3/envs/cuda_stub_env/targets/x86_64-linux/bin
#$ _THERE_=/home/rmaynard/miniconda3/envs/cuda_stub_env/targets/x86_64-linux/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_SIZE_=64
#$ "/home/rmaynard/miniconda3/envs/cuda_stub_env/bin"/x86_64-conda-linux-gnu-c++ ....

When I symlink the nvcc.profile into targets/x86_64-linux/bin as well, I see proper paths for the crt headers being included, and a simple test case properly finds them.

@leofang @adibbley We need to create an nvcc.profile symlink like we do for targets/x86_64-linux/bin/nvcc

robertmaynard commented 1 week ago

@LourensVeen The only reason that ${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc exists is to support legacy CMake versions where FindCUDA or FindCUDAToolkit would validate the CUDA Toolkit layout by searching for an nvcc executable under bin. Therefore we have that symlink so that targets/x86_64-linux/ matches the checked layout.

But I also believe that if we are going to offer a symlink to the compiler it should work so we don't give footguns to users

Edit: So at some point expect targets/x86_64-linux/bin/nvcc to go away and the only nvcc compiler to be in <prefix>/bin

LourensVeen commented 1 week ago

Okay, that makes sense to me. I'll be updating that CMake config.

You need to symlink bin/crt as well to make nvcc work if you want a temporary solution.
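A sketch of that temporary workaround, not an official fix. It assumes nvcc.profile and crt both live under <prefix>/bin and that the links belong in <prefix>/targets/x86_64-linux/bin, as described in this thread. The function name link_nvcc_support is made up for illustration.

```shell
# Link nvcc.profile and the crt directory from <prefix>/bin into
# <prefix>/targets/x86_64-linux/bin, skipping anything that is missing at
# the source or already present at the destination.
link_nvcc_support() {
    # $1: environment prefix, for example "$CONDA_PREFIX"
    for _name in nvcc.profile crt; do
        _src="$1/bin/$_name"
        _dst="$1/targets/x86_64-linux/bin/$_name"
        if [ -e "$_src" ] && [ ! -e "$_dst" ] && [ ! -L "$_dst" ]; then
            ln -s "$_src" "$_dst"
        fi
    done
}

# Possible usage inside an activated environment:
#   link_nvcc_support "${CONDA_PREFIX}"
```

Because this modifies the environment by hand, the links would need to be recreated if the packages are reinstalled or updated.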