OpenDriveLab / ViDAR

[CVPR 2024 Highlight] Visual Point Cloud Forecasting
https://arxiv.org/abs/2312.17655
Apache License 2.0

MMCV, RuntimeError: modulated_deformable_im2col_impl: implementation for device cuda:0 not found. #22

Closed: HaoranZhuExplorer closed this issue 3 months ago

HaoranZhuExplorer commented 3 months ago

Dear authors,

Thank you for your contribution!

I set up the environment following your README and the requirements.txt you provided in previous issues. However, when I run the training script ./tools/dist_train.sh ${CONFIG} ${GPU_NUM}, the MMCV package raises the following error, even though I'm using a CUDA environment:

  File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py", line 297, in forward
    out = _inner_forward(x)
  File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py", line 274, in _inner_forward
    out = self.conv2(out)
  File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmcv/ops/modulated_deform_conv.py", line 251, in forward
    return modulated_deform_conv2d(x, offset, mask, self.weight, self.bias,
  File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmcv/ops/modulated_deform_conv.py", line 73, in forward
    ext_module.modulated_deform_conv_forward(
RuntimeError: modulated_deformable_im2col_impl: implementation for device cuda:0 not found.

The full error log is available here: error_log.txt. Line 55 of the error log shows "MMCV CUDA Compiler: not available", which may be the cause. Please note that I'm running the codebase on a Slurm GPU HPC cluster, so the login node has no GPU by default and I have to request GPU resources from the scheduler. I ran the script after the GPU resources were allocated, but it still fails with the error above.

Following this link, I also tried installing the CUDA build of mmcv-full, but I still get the same error. The command I used was: pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu112/torch1.10/index.html

Is there any way to solve this issue? Thanks!

Best regards

tomztyang commented 3 months ago

Hi,

It seems something is wrong with the mmcv installation. In my opinion, you should: (1) check that your CUDA version matches the CUDA version of the pre-built mmcv-full package; (2) rebuild or re-install the environment after you have been assigned GPU resources.

Best, Zetong
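
For check (1) above, a minimal Python sketch that compares the CUDA version PyTorch was built with against the one mmcv-full was compiled with. The helper names (cuda_major_minor, cuda_versions_match, collect_versions) are hypothetical and only for illustration. The library calls it relies on are torch.version.cuda, torch.cuda.is_available() and mmcv.ops.get_compiling_cuda_version(). It assumes that comparing at major.minor granularity is enough.

```python
from typing import Optional


def cuda_major_minor(version: Optional[str]) -> Optional[str]:
    """Reduce a CUDA version string such as '11.1.105' to '11.1'.

    Returns None when the value is missing or is not a dotted version
    (for example the 'not available' string seen in the error log).
    """
    if not version:
        return None
    parts = version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        return None
    return ".".join(parts[:2])


def cuda_versions_match(torch_cuda: Optional[str], mmcv_cuda: Optional[str]) -> bool:
    """True only if both versions are known and agree at major.minor level."""
    a = cuda_major_minor(torch_cuda)
    b = cuda_major_minor(mmcv_cuda)
    return a is not None and a == b


def collect_versions() -> dict:
    """Read the versions from the installed packages.

    Imports are local so the pure helpers above work without torch or mmcv.
    Assumption: mmcv-full is installed; if the compiled ops extension is
    missing, the mmcv.ops import is expected to fail with an ImportError.
    """
    import torch
    from mmcv.ops import get_compiling_cuda_version

    return {
        "torch": torch.__version__,
        "torch_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "mmcv_cuda": get_compiling_cuda_version(),
    }
```

Calling collect_versions() inside the GPU allocation (not on the login node) and passing its torch_cuda and mmcv_cuda values to cuda_versions_match() should show whether the two builds agree. A value of "not available" for mmcv_cuda corresponds to the "MMCV CUDA Compiler: not available" line in the log.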

HaoranZhuExplorer commented 3 months ago

Thank you for your prompt response! I found that my CUDA version did not match the CUDA version of the mmcv-full build. I solved the issue by explicitly specifying the matching CUDA version when installing mmcv-full: pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/index.html
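
The fix above amounts to picking the wheel index whose CUDA and PyTorch tags match the local install. As a rough sketch, the index URL can be derived from the two version strings. The function names below are hypothetical, the URL pattern is inferred from the command in this thread, and nothing here verifies that a wheel actually exists for a given combination.

```python
def mmcv_find_links(cuda_version: str, torch_version: str) -> str:
    """Build the mmcv-full wheel index URL for a CUDA / PyTorch pair.

    cuda_version:  e.g. '11.1' (as reported by torch.version.cuda)
    torch_version: e.g. '1.10.0' or '1.10.0+cu111' (torch.__version__)
    """
    # '11.1' -> 'cu111'; only major and minor are used in the tag.
    cu_tag = "cu" + "".join(cuda_version.split(".")[:2])
    # Drop a local build suffix such as '+cu111'.
    torch_tag = "torch" + torch_version.split("+")[0]
    return f"https://download.openmmlab.com/mmcv/dist/{cu_tag}/{torch_tag}/index.html"


def pip_command(mmcv_version: str, cuda_version: str, torch_version: str) -> str:
    """Assemble the pip command in the same form as the one used in this thread."""
    url = mmcv_find_links(cuda_version, torch_version)
    return f"pip install mmcv-full=={mmcv_version} -f {url}"


if __name__ == "__main__":
    # Reproduces the command that resolved this issue.
    print(pip_command("1.4.0", "11.1", "1.10.0+cu111"))
```

Feeding in the values of torch.version.cuda and torch.__version__ from the target environment keeps the index in step with the installed PyTorch build, rather than with whatever CUDA toolkit happens to be loaded on the cluster.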