NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

error when make #923

Open TianheLu opened 11 months ago

TianheLu commented 11 months ago

Hello, I tried to build NCCL with make, but I got the following errors:

    transport/p2p.cc: In function ‘ncclResult_t ncclP2pFreeShareableBuffer(ncclIpcDesc*)’:
    transport/p2p.cc:220:5: error: ‘CUmemAllocationHandleType’ was not declared in this scope
      220 |     CUmemAllocationHandleType type = NCCL_P2P_HANDLE_TYPE;
          |     ^~~~~~~~~
    transport/p2p.cc:222:9: error: ‘type’ was not declared in this scope
      222 |     if (type == CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR) {
          |         ^~~~
    make[2]: Entering directory '/root/nccl/nccl/src/collectives/device'
    transport/p2p.cc:222:17: error: ‘CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR’ was not declared in this scope
      222 |     if (type == CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR) {
          |                 ^~~~~~~~~~~~
    make[1]: *** [Makefile:119: /root/nccl/nccl/build/obj/transport/p2p.o] Error 1

I don't know why it happens, and I believe there will be many errors like this. Thanks so much.
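Errors of the form "was not declared in this scope" for CUDA driver types usually mean the compiler is reading a cuda.h that lacks those declarations. One way to see which cuda.h is actually picked up is to run the preprocessor on a one-line file and read the linemarkers it emits. The sketch below only parses that output; the helper name `included_cuda_headers` is hypothetical, and the usage line assumes the build passes `-I$CUDA_HOME/include` to g++, which this thread does not confirm.

```shell
# Read GCC preprocessor output on stdin and print each distinct path of a
# file named cuda.h that appears in a linemarker (# <line> "<file>" <flags>).
included_cuda_headers() {
    sed -n 's|^# [0-9][0-9]* "\(.*/cuda\.h\)".*|\1|p' | sort -u
}

# Usage (commented out; requires g++ and a CUDA toolkit to be installed):
# printf '#include <cuda.h>\n' \
#     | g++ -x c++ -E -I"$CUDA_HOME/include" - \
#     | included_cuda_headers
```

If the printed path is not under the toolkit you expect (for example a distribution copy under /usr/include), the build is probably seeing an older header.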

KaimingOuyang commented 11 months ago

Perhaps you need to upgrade your CUDA toolkit to at least 11.0.
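A quick way to check what a given toolkit's header provides, independent of what nvcc reports, is to inspect cuda.h directly. This is a minimal sketch: the function name `check_cuda_header` is hypothetical and the default root /usr/local/cuda is an assumption. It relies only on the `CUDA_VERSION` macro that cuda.h defines and on the type name from the error message.

```shell
# Print the CUDA_VERSION macro from <root>/include/cuda.h and report whether
# the header declares CUmemAllocationHandleType.
# $1: toolkit root (assumed default: /usr/local/cuda)
# Returns 1 if the header is missing, 2 if the type is not declared.
check_cuda_header() {
    header="${1:-/usr/local/cuda}/include/cuda.h"
    if [ ! -f "$header" ]; then
        echo "missing: $header"
        return 1
    fi
    # cuda.h carries a line of the form: #define CUDA_VERSION 12000
    version=$(sed -n 's/^[[:space:]]*#define[[:space:]][[:space:]]*CUDA_VERSION[[:space:]][[:space:]]*\([0-9][0-9]*\).*/\1/p' "$header" | head -n 1)
    echo "CUDA_VERSION=${version:-unknown}"
    if grep -q 'CUmemAllocationHandleType' "$header"; then
        echo "CUmemAllocationHandleType: declared"
    else
        echo "CUmemAllocationHandleType: not declared"
        return 2
    fi
}

# Usage (commented out so the sketch has no side effects):
# check_cuda_header "$CUDA_HOME"
```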

TianheLu commented 11 months ago

Thanks for helping. I use "nvcc -V" and the result is:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Mon_Oct_24_19:12:58_PDT_2022
    Cuda compilation tools, release 12.0, V12.0.76
    Build cuda_12.0.r12.0/compiler.31968024_0

And I use "nvidia-smi", the result is:

    NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0

It seems that my CUDA version is 12.0. I still don't know why this error happens.

Thank you in advance.

KaimingOuyang commented 11 months ago

What is your compilation command?

Can you show me the output of echo $CUDA_HOME?

TianheLu commented 11 months ago

The result of echo $CUDA_HOME is: :/usr/local/cuda

I use 'make -j src.build' to build.

KaimingOuyang commented 11 months ago

Just want to confirm your CUDA_HOME is /usr/local/cuda instead of :/usr/local/cuda?

And then, can you try make -j src.build CUDA_HOME=$CUDA_HOME?
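A value like :/usr/local/cuda can appear when a profile script appends to a variable that was empty, for instance `export CUDA_HOME=$CUDA_HOME:/usr/local/cuda`; whether that is what happened here is an assumption. The sketch below strips stray colons before passing the value to make. The helper name `strip_colons` is hypothetical.

```shell
# Strip stray leading and trailing ':' characters from a path-like value.
# ":/usr/local/cuda" becomes "/usr/local/cuda"; inner colons are left alone.
strip_colons() {
    printf '%s\n' "$1" | sed -e 's/^:*//' -e 's/:*$//'
}

# Usage (commented out so the sketch has no side effects):
# CUDA_HOME=$(strip_colons "$CUDA_HOME")
# make -j src.build CUDA_HOME="$CUDA_HOME"
```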

TianheLu commented 11 months ago

I tried to change my CUDA_HOME from :/usr/local/cuda to /usr/local/cuda.

And also tried make -j src.build CUDA_HOME=$CUDA_HOME.

But I got the same error.

KaimingOuyang commented 11 months ago

Looks weird. Could you please share your PATH environment variable and the output of $CUDA_HOME/bin/nvcc -V? Also, do you have any environment variable that sets up the include path, such as C_INCLUDE_PATH or CXX_INCLUDE_PATH?
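For reference, the include-path variables that GCC documents are CPATH, C_INCLUDE_PATH, and CPLUS_INCLUDE_PATH (the C++ one is spelled CPLUS, not CXX). A small sketch that reports which of them are set; the function name `show_include_env` is hypothetical.

```shell
# Report the GCC include-path environment variables and their values.
# Any non-empty value adds directories to the header search path and can
# make the compiler find a different cuda.h than the one under CUDA_HOME.
show_include_env() {
    for name in CPATH C_INCLUDE_PATH CPLUS_INCLUDE_PATH; do
        # Indirect lookup of the variable named by $name (POSIX sh has no ${!name}).
        eval "value=\${$name-}"
        if [ -n "$value" ]; then
            echo "$name=$value"
        else
            echo "$name is unset or empty"
        fi
    done
}

# Usage (commented out so the sketch has no side effects):
# show_include_env
```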

TianheLu commented 11 months ago

The PATH is: /opt/anaconda3/bin:/opt/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda/bin

The result of $CUDA_HOME/bin/nvcc -V is:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Mon_Oct_24_19:12:58_PDT_2022
    Cuda compilation tools, release 12.0, V12.0.76
    Build cuda_12.0.r12.0/compiler.31968024_0

And I have no include path such as C_INCLUDE_PATH or CXX_INCLUDE_PATH.

I also think it's weird. Thanks.
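Because /usr/local/cuda/bin is the last entry in that PATH, any earlier directory containing a tool with the same name would win. The sketch below lists every match in search order so duplicates become visible. The function name `find_on_path` is hypothetical, and nothing in this thread shows that a second nvcc actually exists.

```shell
# Print every executable named $1 found in a PATH-style list, in search order.
# $2 defaults to $PATH. The first line printed is the one the shell would run.
# The body runs in a subshell so the IFS and globbing changes stay local.
find_on_path() (
    name=$1
    IFS=:
    set -f    # disable globbing while the list is split on ':'
    for dir in ${2-$PATH}; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)

# Usage (commented out so the sketch has no side effects):
# find_on_path nvcc
# find_on_path g++
```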

KaimingOuyang commented 11 months ago

If so, could you please upgrade your CUDA to 12.1? I am afraid your previous CUDA installation has a problem.

TianheLu commented 11 months ago

Yeah, I will have a try. Thank you very much.