LittlePey / SFD

Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion (CVPR 2022, Oral)
Apache License 2.0

training error #60

Open squirreljj opened 8 months ago

squirreljj commented 8 months ago

Environment: I built a Docker container from the Voxel R-CNN image with: docker pull djiajun1206/pcdet-pytorch1.5
My GPU is a 3080 Ti.

Command: my training command is: python train.py --cfg_file cfgs/kitti_models/sfd.yaml
with batch_size = 1.

Error:

2023-10-20 12:14:19,784 INFO **Start training kitti_models/sfd(default)**
epochs: 0%| | 0/12 [00:10<?, ?it/s]
Traceback (most recent call last): | 0/3712 [00:00<?, ?it/s]
  File "train.py", line 200, in <module>
    main()
  File "train.py", line 155, in main
    train_model(
  File "/home/SFD/tools/train_utils/train_utils.py", line 86, in train_model
    accumulated_iter = train_one_epoch(
  File "/home/SFD/tools/train_utils/train_utils.py", line 38, in train_one_epoch
    loss, tb_dict, disp_dict = model_func(model, batch)
  File "/home/SFD/pcdet/models/__init__.py", line 30, in model_func
    ret_dict, tb_dict, disp_dict = model(batch_dict)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/SFD/pcdet/models/detectors/sfd.py", line 11, in forward
    batch_dict = cur_module(batch_dict)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/SFD/pcdet/models/backbones_3d/spconv_backbone.py", line 148, in forward
    x = self.conv_input(input_sp_tensor)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/spconv-1.2.1-py3.8-linux-x86_64.egg/spconv/modules.py", line 134, in forward
    input = module(input)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/spconv-1.2.1-py3.8-linux-x86_64.egg/spconv/conv.py", line 196, in forward
    out_features = Fsp.indice_subm_conv(features, self.weight,
  File "/usr/local/lib/python3.8/dist-packages/spconv-1.2.1-py3.8-linux-x86_64.egg/spconv/functional.py", line 87, in forward
    return ops.indice_conv(features,
  File "/usr/local/lib/python3.8/dist-packages/spconv-1.2.1-py3.8-linux-x86_64.egg/spconv/ops.py", line 118, in indice_conv
    return torch.ops.spconv.indice_conv(features, filters, indice_pairs,
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

squirreljj commented 8 months ago

To get the build to succeed, I ran export TORCH_CUDA_ARCH_LIST="7.5" before running python setup.py develop. The build then succeeded, but training fails with the error shown in my previous comment.
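
One thing that may be worth checking is whether the architecture list used at build time covers the GPU. The 3080 Ti reports compute capability 8.6, while the build above targeted 7.5 only. The sketch below is a simplified model of the usual CUDA compatibility rules (a compiled binary runs on the same major version with an equal or higher minor version; a "+PTX" entry can be JIT-compiled for any newer device). The helper names parse_arch_list and arch_list_covers are hypothetical and not part of PyTorch or spconv, and named architectures such as "Turing" are not handled. It does not prove that this mismatch causes the cuBLAS error; the CUDA toolkit inside a PyTorch 1.5 image may also predate this GPU generation.

```python
import re


def parse_arch_list(arch_list):
    """Parse a TORCH_CUDA_ARCH_LIST-style string into (major, minor, has_ptx) tuples.

    Hypothetical helper: only numeric entries such as "7.5" or "8.6+PTX" are handled.
    """
    entries = []
    for token in re.split(r"[;\s]+", arch_list.strip()):
        if not token:
            continue
        has_ptx = token.upper().endswith("+PTX")
        version = token[:-4] if has_ptx else token
        major, minor = version.split(".")
        entries.append((int(major), int(minor), has_ptx))
    return entries


def arch_list_covers(arch_list, capability):
    """Return True if code built for arch_list can be expected to run on the device.

    Simplified rules (an assumption of this sketch):
      * a binary entry X.Y runs on a device M.m when X == M and Y <= m
      * an entry with +PTX can be JIT-compiled for any device that is not older
    """
    major, minor = capability
    for a_major, a_minor, has_ptx in parse_arch_list(arch_list):
        if a_major == major and a_minor <= minor:
            return True
        if has_ptx and (a_major, a_minor) <= (major, minor):
            return True
    return False


RTX_3080_TI = (8, 6)  # compute capability of the GPU mentioned in this thread

print(arch_list_covers("7.5", RTX_3080_TI))      # False
print(arch_list_covers("7.5+PTX", RTX_3080_TI))  # True
print(arch_list_covers("8.6", RTX_3080_TI))      # True
```

On a machine with PyTorch installed, the capability tuple can be read with torch.cuda.get_device_capability(0) instead of being hard-coded.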