Megvii-BaseDetection / BEVDepth

Official code for BEVDepth.
MIT License

voxel_pooling/voxel_pooling_ext.cpython-38-x86_64-linux-gnu.so: undefined symbol #110

Closed: qipengh closed this issue 1 year ago

qipengh commented 1 year ago

env: PyTorch 1.9, CUDA 11.3, mmcls 0.24.1, mmcv-full 1.6.0, mmdet 2.26.0, mmdet3d 1.0.0rc4 (from /home/huqiuchen/tmp_huangqipeng/pytorch/mmdetection3d), mmsegmentation 0.25.0
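An environment report like the one above is easier to reproduce when it is read from the installed packages rather than typed by hand. This is a minimal sketch that assumes Python 3.8+ for `importlib.metadata`. The package list mirrors the env line above, and `collect_versions` is a hypothetical helper name. PyTorch also ships `python -m torch.utils.collect_env` for a fuller report that includes the CUDA build.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names taken from the env line above (assumed pip names).
PACKAGES = ("torch", "mmcls", "mmcv-full", "mmdet", "mmdet3d", "mmsegmentation")


def collect_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```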

Run command (from the BEVDepth root):
python bevdepth/exps/mv/bev_depth_lss_r50_256x704_128x128_20e_cbgs_2key_da_ema.py --amp_backend native -b 8 --gpus 8

Error:

/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:115.)
  return torch._C._cuda_getDeviceCount() > 0
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.3'
Traceback (most recent call last):
  File "bevdepth/exps/mv/bev_depth_lss_r50_256x704_128x128_20e_cbgs_2key_da_ema.py", line 24, in <module>
    from bevdepth.exps.base_cli import run_cli
  File "/home/huqiuchen/tmp_huangqipeng/pytorch/BEVDepth/bevdepth/exps/base_cli.py", line 10, in <module>
    from .base_exp import BEVDepthLightningModel
  File "/home/huqiuchen/tmp_huangqipeng/pytorch/BEVDepth/bevdepth/exps/base_exp.py", line 18, in <module>
    from bevdepth.models.base_bev_depth import BaseBEVDepth
  File "/home/huqiuchen/tmp_huangqipeng/pytorch/BEVDepth/bevdepth/models/base_bev_depth.py", line 3, in <module>
    from bevdepth.layers.backbones.base_lss_fpn import BaseLSSFPN
  File "/home/huqiuchen/tmp_huangqipeng/pytorch/BEVDepth/bevdepth/layers/backbones/__init__.py", line 1, in <module>
    from .base_lss_fpn import BaseLSSFPN
  File "/home/huqiuchen/tmp_huangqipeng/pytorch/BEVDepth/bevdepth/layers/backbones/base_lss_fpn.py", line 10, in <module>
    from bevdepth.ops.voxel_pooling import voxel_pooling
  File "/home/huqiuchen/tmp_huangqipeng/pytorch/BEVDepth/bevdepth/ops/voxel_pooling/__init__.py", line 1, in <module>
    from .voxel_pooling import voxel_pooling
  File "/home/huqiuchen/tmp_huangqipeng/pytorch/BEVDepth/bevdepth/ops/voxel_pooling/voxel_pooling.py", line 5, in <module>
    from . import voxel_pooling_ext
ImportError: /home/huqiuchen/tmp_huangqipeng/pytorch/BEVDepth/bevdepth/ops/voxel_pooling/voxel_pooling_ext.cpython-38-x86_64-linux-gnu.so: undefined symbol: _Z37voxel_pooling_forward_kernel_launcheriiiiiiPKiPKfPfPiP11CUstream_st
root@bogon:/home/huqiuchen/tmp_huangqipeng/pytorch/BEVDepth# c++filt _Z37voxel_pooling_forward_kernel_launcheriiiiiiPKiPKfPfPiP11CUstream_st
voxel_pooling_forward_kernel_launcher(int, int, int, int, int, int, int const*, float const*, float*, int*, CUstream_st*)
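The demangled name shows that the extension references the CUDA launcher but nothing in the `.so` defines it. One way to confirm this is to list the extension's dynamic symbols with `nm -D` from GNU binutils, where an undefined symbol is flagged `U` and has no address. This is a minimal sketch that assumes `nm` is on PATH. `undefined_symbols` and `find_undefined` are hypothetical helpers, and the parser is kept separate so it can be run on captured `nm` output.

```python
import subprocess
from typing import List


def undefined_symbols(nm_output: str, needle: str = "") -> List[str]:
    """Return undefined ('U') symbols from `nm -D` output that contain `needle`."""
    result = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries carry no address, so they split into ["U", symbol].
        if len(parts) == 2 and parts[0] == "U" and needle in parts[1]:
            result.append(parts[1])
    return result


def find_undefined(so_path: str, needle: str = "voxel_pooling") -> List[str]:
    """Run `nm -D` on a shared object and keep its undefined symbols matching `needle`."""
    out = subprocess.run(["nm", "-D", so_path], capture_output=True,
                         text=True, check=True).stdout
    return undefined_symbols(out, needle)
```

Run against the `voxel_pooling_ext` build from this issue, `find_undefined` would be expected to list the `_Z37voxel_pooling_forward_kernel_launcher...` symbol. In a build that includes the CUDA sources, the launcher should be defined inside the `.so` and so should not appear.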

I think voxel_pooling_forward_kernel_launcher from voxel_pooling_forward_cuda.cu is not included with voxel_pooling_forward.cpp and is not compiled into voxel_pooling_ext.cpython-38-x86_64-linux-gnu.so. Can anyone help me?

yinchimaoliang commented 1 year ago

Hi there. Please check whether you can pass the voxel pooling unit test.
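Before running the unit test, a quicker check is whether the compiled extension imports at all. As the traceback above shows, an undefined symbol surfaces as an `ImportError` at import time. This is a minimal sketch. `check_import` is a hypothetical helper, and the module path `bevdepth.ops.voxel_pooling.voxel_pooling_ext` is taken from the traceback.

```python
import importlib
from typing import Optional, Tuple


def check_import(module_name: str) -> Tuple[bool, Optional[str]]:
    """Try to import a module and return (ok, error message or None)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:  # also catches ModuleNotFoundError
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    ok, err = check_import("bevdepth.ops.voxel_pooling.voxel_pooling_ext")
    if ok:
        print("voxel_pooling_ext imported")
    elif err and "undefined symbol" in err:
        print("extension likely built without its CUDA sources:", err)
    else:
        print("import failed:", err)
```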

qipengh commented 1 year ago

It has been solved. It was a problem with the GPU machine: the GPU device could not be detected, so the extension was not compiled correctly.
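This fits the log. CUDA initialization failed with error 802, and PyTorch then reported that no CUDA runtime was found. Setup scripts written in the mmcv/mmdet3d style typically build a `CUDAExtension` that includes the `.cu` files only when `torch.cuda.is_available()` is true or `FORCE_CUDA=1` is set. Otherwise they fall back to a CPU-only `CppExtension`. If the `.cpp` binding still calls `voxel_pooling_forward_kernel_launcher` unconditionally, that fallback would leave exactly this undefined symbol in the `.so`.

The sketch below models that selection rule without importing torch. `select_sources` and the file paths are hypothetical. Whether BEVDepth's `setup.py` applies the same rule is an assumption worth checking in the repository.

```python
import os
from typing import List, Mapping, Optional, Tuple


def select_sources(cuda_available: bool,
                   sources: List[str],
                   sources_cuda: List[str],
                   env: Optional[Mapping[str, str]] = None) -> Tuple[str, List[str]]:
    """Model an mmcv-style choice between CUDAExtension and CppExtension.

    Returns the extension kind and the source files that would be compiled.
    """
    env = os.environ if env is None else env
    force_cuda = env.get("FORCE_CUDA", "0") == "1"
    if cuda_available or force_cuda:
        return "CUDAExtension", sources + sources_cuda
    # CPU fallback: the .cu files are dropped, but the .cpp may still call into them.
    return "CppExtension", list(sources)


if __name__ == "__main__":
    # Hypothetical layout using the file names mentioned in this issue.
    kind, files = select_sources(False,
                                 ["src/voxel_pooling_forward.cpp"],
                                 ["src/voxel_pooling_forward_cuda.cu"],
                                 env={})
    print(kind, files)  # the .cu file is missing from the CPU-only build
```

On a machine where CUDA initializes correctly, a clean rebuild that first removes the stale `.so` and the `build/` directory should pick up the CUDA sources. Setting `FORCE_CUDA=1` helps only if the setup script actually reads that variable.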

BaronLeeLZP commented 8 months ago

Quoting qipengh: "It has been solved. It's a gpu machine problem. You can't get the GPU device, and it's not compiled correctly."

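Same problem here (identical envs, 8x A100; on another machine with a single A100 and the same environment, it runs fine). Could you tell me how you solved it?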