Open xiaotiancai899 opened 1 year ago
@ngoductuanlhp
You could check similar issues on the original spconv repo: https://github.com/traveller59/spconv/issues/406, https://github.com/mit-han-lab/bevfusion/issues/82.
Best.
Those two cannot solve my problem. Any other advice?
While training on the ScanNet200 dataset, an error occurred at epoch 55 of 120.
Traceback (most recent call last):
File "tools/train.py", line 332, in <module>
main()
File "tools/train.py", line 323, in main
train(epoch, model, optimizer, scheduler, scaler, train_loader, cfg, logger, writer)
File "tools/train.py", line 80, in train
loss, log_vars = model(batch, return_loss=True, epoch=epoch - 1) # could this epoch ever end up as -1 or something like that???
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/isbnet.py", line 219, in forward
return self.forward_train(batch, epoch=epoch)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/util/utils.py", line 172, in wrapper
return func(*new_args, **new_kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/isbnet.py", line 265, in forward_train
feats, coords_float, voxel_coords, spatial_shape, batch_size, p2v_map
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/isbnet.py", line 632, in forward_backbone
output = self.unet(output)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 250, in forward
output_decoder = self.u(output_decoder)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 250, in forward
output_decoder = self.u(output_decoder)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 250, in forward
output_decoder = self.u(output_decoder)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 250, in forward
output_decoder = self.u(output_decoder)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 250, in forward
output_decoder = self.u(output_decoder)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 249, in forward
output_decoder = self.conv(output)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/spconv/pytorch/modules.py", line 137, in forward
input = module(input)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/spconv/pytorch/conv.py", line 404, in forward
raise e
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/spconv/pytorch/conv.py", line 395, in forward
timer=input._timer)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/spconv/pytorch/ops.py", line 465, in get_indice_pairs_implicit_gemm
stream_int=stream)
RuntimeError: /tmp/pip-build-env-a41g0q_q/overlay/lib/python3.7/site-packages/cumm/include/tensorview/cuda/launch.h(53)
N > 0 assert faild. CUDA kernel launch blocks must be positive, but got N= 0
I used batch_size=1 and also avoided OOM by freezing all BatchNorm layers during training. Any ideas about what causes this? Thanks so much in advance!
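One way to narrow this down might be to find out which sample triggers the failure. The message says a CUDA kernel was launched with N = 0, which could mean that some scan reaches the sparse convolution with no active voxels, but that is only a guess. Below is a minimal debugging sketch, not part of ISBNet or spconv: the helper names `is_empty_batch` and `find_failing_batches` are hypothetical, and the batch keys `voxel_coords` and `scan_ids` are assumptions about what the data loader returns. `step_fn` stands for whatever runs one forward pass on a batch.

```python
def is_empty_batch(batch, coords_key="voxel_coords"):
    # True when the batch carries no voxel coordinates at all.
    # `coords_key` is an assumed key name; adjust it to the real batch layout.
    return len(batch[coords_key]) == 0


def find_failing_batches(batches, step_fn, id_key="scan_ids", coords_key="voxel_coords"):
    """Run step_fn on every batch and record the batches that cannot be processed.

    Returns a list of (batch_index, batch_ids, reason) tuples.
    """
    failures = []
    for index, batch in enumerate(batches):
        # Skip batches that are empty before they ever reach the model.
        if is_empty_batch(batch, coords_key):
            failures.append((index, batch.get(id_key), "empty " + coords_key))
            continue
        try:
            step_fn(batch)
        except RuntimeError as err:
            # Keep going so that every offending batch is reported, not just the first.
            failures.append((index, batch.get(id_key), str(err)))
    return failures
```

Running this once over the training loader, with `step_fn` wrapping the model call, should list the scans worth inspecting by hand. If the same scan fails every time, the data or augmentation for that scan is the likely place to look; if the failures move around, random cropping or augmentation may be producing an empty input only occasionally.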