ADLab-AutoDrive / BEVHeight

An official code release of our CVPR'23 paper, BEVHeight
MIT License

ncclUnhandledCudaError: Call to CUDA function failed. #13

Closed MingfuYAN closed 1 year ago

MingfuYAN commented 1 year ago

First of all, thank you very much for your excellent work. When I run the code using the Docker image you provided, the following error occurs:

RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:38, unhandled cuda error, NCCL version 2.7.8
ncclUnhandledCudaError: Call to CUDA function failed.

Here is some of my CUDA version information:

{
   "cuda" : {
      "name" : "CUDA SDK",
      "version" : "11.6.20220318"
   },
   "cuda_nvcc" : {
      "name" : "CUDA NVCC",
      "version" : "11.6.124"
   },
   "nvidia_driver" : {
      "name" : "NVIDIA Linux Driver",
      "version" : "510.47.03"
   }
}

yanglei18 commented 1 year ago

@mingfuyan We have only tested this Docker image on 2080Ti and V100 GPUs, and the NVIDIA Linux driver should be 470.182.03. The following command can be used to create a Docker container:

docker run -it --gpus all --shm-size=32g -v /home/user:/root --name cuda11.1 f90d66fc0efb bash
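
Before starting the container, it may help to compare the host driver with the version mentioned above. The following is a minimal Python sketch, not part of the BEVHeight repository. The helper names (`parse_version`, `same_driver_branch`, `query_host_drivers`, `report`) are hypothetical, and comparing only the major driver branch (470 vs. 510) is an assumption made for illustration. The thread does not confirm that a driver mismatch is the cause of the NCCL error.

```python
import shutil
import subprocess

# Driver version the maintainer reports the Docker image was tested with.
TESTED_DRIVER = "470.182.03"


def parse_version(text):
    """Turn a dotted version string such as '510.47.03' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def same_driver_branch(host_driver, tested_driver=TESTED_DRIVER):
    """Return True when both versions share the same major branch (e.g. 470).

    Assumption: the major number is a useful first-pass comparison.
    """
    return parse_version(host_driver)[0] == parse_version(tested_driver)[0]


def query_host_drivers():
    """Ask nvidia-smi for (driver_version, gpu_name) pairs.

    Returns an empty list if nvidia-smi is missing or fails.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version,name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    rows = []
    for line in result.stdout.splitlines():
        if line.strip():
            # Each CSV row looks like "<driver_version>, <gpu_name>".
            version, _, name = line.partition(",")
            rows.append((version.strip(), name.strip()))
    return rows


def report():
    """Print each GPU and whether its driver is on the tested branch."""
    for version, name in query_host_drivers():
        status = "matches" if same_driver_branch(version) else "differs from"
        print(f"{name}: driver {version} {status} tested branch {TESTED_DRIVER}")
```

Calling `report()` on the host prints one line per GPU. With the driver reported in this issue, `same_driver_branch("510.47.03")` returns `False`, since 510 and 470 are different branches.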