Closed by whuhxb 2 years ago
Hi, you could comment out the "if i == 0" clause. In other words, if the mask is all False (no boundary detected), we return a loss of 0.
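For anyone who prefers an explicit guard, here is a minimal sketch of the "return 0 loss when the mask is all False" idea. It is not the code from heads.py: the function name `masked_loss_or_zero` and the toy tensors are made up for illustration, and it assumes the loss is a mean of per-point values selected by a boolean mask.

```python
import torch


def masked_loss_or_zero(per_point_loss: torch.Tensor, point_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: mean loss over masked points, or zero if none are selected."""
    if not point_mask.any():
        # No boundary point found: return a zero that is still attached to the
        # autograd graph, so a later .backward() call does not fail.
        return per_point_loss.sum() * 0.0
    return per_point_loss[point_mask].mean()


# Toy data (assumed, not taken from the repository).
loss_values = torch.tensor([0.5, 1.5, 2.0], requires_grad=True)
empty_mask = torch.zeros(3, dtype=torch.bool)      # all False: no boundary detected
some_mask = torch.tensor([True, False, True])

zero_loss = masked_loss_or_zero(loss_values, empty_mask)
mean_loss = masked_loss_or_zero(loss_values, some_mask)
```

Multiplying the sum by zero, rather than returning a fresh constant, is a design choice that keeps the result connected to the inputs. Whether that matters depends on how the rest of the training loop combines the losses.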
Hi @LiyaoTang Thank you for your timely reply. I haven't commented out the "if i == 0" clause. I just used the code you released, and the bug occurred. I will try commenting out the "if i == 0" clause and run it again.
Yes, that check is there because it is unlikely to find no boundary at all when using S3DIS and processing the whole room at once, as point-transformer does. With other data, the situation can be different.
Hi @LiyaoTang
Have you ever hit this bug, where training runs for more than 20 epochs and then suddenly crashes? Thanks.
  File "exp/mydata/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 462, in <module>
    main()
  File "exp/mydata/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 130, in main
    mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
  File "/export/home2/author/anaconda3/envs/pt_CBL/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/export/home2/author/anaconda3/envs/pt_CBL/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/export/home2/author/anaconda3/envs/pt_CBL/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/export/home2/author/anaconda3/envs/pt_CBL/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/export/home2/author/Documents/contrastBoundary/pytorch_mydata/exp/mydata/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 271, in main_worker
    loss_train, mIoU_train, mAcc_train, allAcc_train = train(train_loader, model, criterion, optimizer, epoch)
  File "/export/home2/author/Documents/contrastBoundary/pytorch_mydata/exp/mydata/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 333, in train
    loss = criterion(output, target, stage_list)
  File "/export/home2/author/anaconda3/envs/pt_CBL/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/export/home2/author/Documents/contrastBoundary/pytorch_mydata/model/pointtransformer_seg.py", line 24, in forward
    loss_list += self.contrast_head(output, target, stage_list)
  File "/export/home2/author/anaconda3/envs/pt_CBL/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/export/home2/author/Documents/contrastBoundary/pytorch_mydata/model/heads.py", line 251, in forward
    loss = self.main_contrast(n, i, stage_list, target)
  File "/export/home2/author/Documents/contrastBoundary/pytorch_mydata/model/heads.py", line 232, in point_contrast
    raise
RuntimeError: No active exception to reraise
/export/home2/author/anaconda3/envs/pt_CBL/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 59 leaked semaphores to clean up at shutdown
  len(cache))
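A note on reading the traceback above: the message "No active exception to reraise" is what Python reports when a bare `raise` statement runs outside of any `except` block. So the crash most likely comes from a deliberate `raise` on that line, not from a failing tensor operation. The snippet below reproduces the message in isolation; the function name is hypothetical and unrelated to the repository.

```python
def bare_raise_outside_except():
    # Hypothetical stand-in for a guard that uses a bare `raise`
    # while no exception is currently being handled.
    raise


try:
    bare_raise_outside_except()
except RuntimeError as err:
    message = str(err)
    print(message)  # No active exception to reraise
```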
How to deal with all False in point_mask? https://github.com/LiyaoTang/contrastBoundary/blob/master/pytorch/model/heads.py#L232