VinAIResearch / ISBNet

ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling and Box-aware Dynamic Convolution (CVPR 2023)
Apache License 2.0

Training ScanNet200 dataset Error #19

Open xiaotiancai899 opened 1 year ago

xiaotiancai899 commented 1 year ago

While training on the ScanNet200 dataset, an error occurred at epoch 55 of 120.

Traceback (most recent call last):
  File "tools/train.py", line 332, in <module>
    main()
  File "tools/train.py", line 323, in main
    train(epoch, model, optimizer, scheduler, scaler, train_loader, cfg, logger, writer)
  File "tools/train.py", line 80, in train
    loss, log_vars = model(batch, return_loss=True, epoch=epoch - 1)  # could this epoch ever become -1 or something like that???
  File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/isbnet.py", line 219, in forward
    return self.forward_train(**batch, epoch=epoch)
  File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/util/utils.py", line 172, in wrapper
    return func(*new_args, **new_kwargs)
  File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/isbnet.py", line 265, in forward_train
    feats, coords_float, voxel_coords, spatial_shape, batch_size, p2v_map
  File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/isbnet.py", line 632, in forward_backbone
    output = self.unet(output)
  File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 250, in forward
    output_decoder = self.u(output_decoder)
  File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 250, in forward
    output_decoder = self.u(output_decoder)
  File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 250, in forward
    output_decoder = self.u(output_decoder)
  File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 250, in forward
    output_decoder = self.u(output_decoder)
  File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 250, in forward
    output_decoder = self.u(output_decoder)
  File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 249, in forward
    output_decoder = self.conv(output)
  File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/spconv/pytorch/modules.py", line 137, in forward
    input = module(input)
  File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/spconv/pytorch/conv.py", line 404, in forward
    raise e
  File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/spconv/pytorch/conv.py", line 395, in forward
    timer=input._timer)
  File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/spconv/pytorch/ops.py", line 465, in get_indice_pairs_implicit_gemm
    stream_int=stream)
RuntimeError: /tmp/pip-build-env-a41g0q_q/overlay/lib/python3.7/site-packages/cumm/include/tensorview/cuda/launch.h(53)
N > 0 assert faild. CUDA kernel launch blocks must be positive, but got N= 0

I used batch_size=1 and also avoided OOM by freezing all BatchNorm layers during training. Any ideas about what causes this? Thanks so much in advance!

xiaotiancai899 commented 1 year ago

@ngoductuanlhp

ngoductuanlhp commented 1 year ago

You could check similar issues on the original repo of spconv: https://github.com/traveller59/spconv/issues/406, https://github.com/mit-han-lab/bevfusion/issues/82.

Best.

xiaotiancai899 commented 1 year ago

Those two cannot solve my problem. Any other advice?
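
One thing that might be worth ruling out: "CUDA kernel launch blocks must be positive, but got N= 0" means a kernel was asked to run over zero elements. One possible cause (an assumption, not confirmed anywhere in this thread) is a sample that ends up with no voxels after cropping or augmentation, which is more likely to bite with batch_size=1. Below is a minimal sketch of a guard that skips such a batch and logs which scan it was. The batch keys "voxel_coords" and "scan_ids" and both helper names are hypothetical; adapt them to whatever the dataloader actually returns.

```python
def has_active_voxels(voxel_coords) -> bool:
    """Return True if the batch holds at least one voxel row.

    Works for anything that supports len(), e.g. a list or an (N, 4) tensor.
    """
    return voxel_coords is not None and len(voxel_coords) > 0


def safe_train_step(step_fn, batch, logger=print):
    """Run step_fn(batch) unless the batch is empty.

    step_fn is a stand-in for the real forward/backward call. The keys
    "voxel_coords" and "scan_ids" are assumed names, not taken from ISBNet.
    """
    if not has_active_voxels(batch.get("voxel_coords")):
        # Log the offending sample so it can be inspected offline.
        logger(f"skipping empty batch, scan_ids={batch.get('scan_ids')}")
        return None
    return step_fn(batch)
```

If the log line never fires, empty input is probably not the cause and the spconv version or the intermediate downsampling stages would be the next thing to look at.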
