LiyaoTang / contrastBoundary

Contrastive Boundary Learning for Point Cloud Segmentation (CVPR2022)
MIT License

RuntimeError: No active exception to reraise #22

Closed whuhxb closed 2 years ago

whuhxb commented 2 years ago

Hi @LiyaoTang

Have you ever run into this bug? Training ran for more than 20 epochs and then suddenly crashed. Thanks.

      File "exp/mydata/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 462, in <module>
        main()
      File "exp/mydata/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 130, in main
        mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
      File "/export/home2/author/anaconda3/envs/pt_CBL/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
        return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
      File "/export/home2/author/anaconda3/envs/pt_CBL/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
        while not context.join():
      File "/export/home2/author/anaconda3/envs/pt_CBL/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
        raise ProcessRaisedException(msg, error_index, failed_process.pid)
    torch.multiprocessing.spawn.ProcessRaisedException:

    -- Process 1 terminated with the following error:
    Traceback (most recent call last):
      File "/export/home2/author/anaconda3/envs/pt_CBL/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
        fn(i, *args)
      File "/export/home2/author/Documents/contrastBoundary/pytorch_mydata/exp/mydata/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 271, in main_worker
        loss_train, mIoU_train, mAcc_train, allAcc_train = train(train_loader, model, criterion, optimizer, epoch)
      File "/export/home2/author/Documents/contrastBoundary/pytorch_mydata/exp/mydata/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 333, in train
        loss = criterion(output, target, stage_list)
      File "/export/home2/author/anaconda3/envs/pt_CBL/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/export/home2/author/Documents/contrastBoundary/pytorch_mydata/model/pointtransformer_seg.py", line 24, in forward
        loss_list += self.contrast_head(output, target, stage_list)
      File "/export/home2/author/anaconda3/envs/pt_CBL/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/export/home2/author/Documents/contrastBoundary/pytorch_mydata/model/heads.py", line 251, in forward
        loss = self.main_contrast(n, i, stage_list, target)
      File "/export/home2/author/Documents/contrastBoundary/pytorch_mydata/model/heads.py", line 232, in point_contrast
        raise
    RuntimeError: No active exception to reraise

    /export/home2/author/anaconda3/envs/pt_CBL/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 59 leaked semaphores to clean up at shutdown
      len(cache))

How should I handle the case where point_mask is all False? https://github.com/LiyaoTang/contrastBoundary/blob/master/pytorch/model/heads.py#L232

    # print(point_mask.shape)
    # print(torch.any(point_mask), point_mask[:10])
    if not torch.any(point_mask):
        if i == 0:
            o = torch.cat([torch.tensor([0]).to(o.device), o])
            for bi, (start, end) in enumerate(zip(o[:-1], o[1:])):
                print('bi / labelcnt - ', bi , ' / ', torch.unique(labels[start:end].argmax(dim=1)))
            print(point_mask.sum(0), len(point_mask))
            print(labels[0], neighbor_label[0])
            print(labels[100], neighbor_label[100])
            print(labels[900], neighbor_label[900])
            print(flush=True)
            raise
        return torch.tensor(.0)

LiyaoTang commented 2 years ago

Hi, you can comment out the "if i == 0" clause. That way, if point_mask is all False (no boundary detected), we simply return a loss of 0.
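
For context, the error text comes from Python itself rather than from PyTorch: a bare `raise` that runs outside an `except` block has nothing to re-raise, so the interpreter reports `RuntimeError: No active exception to reraise`. The pure-Python sketch below illustrates that behaviour next to the zero-loss fallback suggested above. The names `debug_guard`, `zero_loss_guard` and `describe_failure` are hypothetical, and plain lists stand in for the `point_mask` tensor, so this is an illustration rather than the repository's code.

```python
def debug_guard(point_mask):
    """Mimics the debug branch: a bare `raise` with no exception being handled."""
    if not any(point_mask):
        raise  # nothing to re-raise here, so Python raises RuntimeError
    return float(sum(point_mask))  # placeholder for the real contrastive loss


def zero_loss_guard(point_mask):
    """Mimics the suggested change: return a zero loss instead of raising."""
    if not any(point_mask):
        return 0.0
    return float(sum(point_mask))  # placeholder for the real contrastive loss


def describe_failure(point_mask):
    """Returns the error text produced by debug_guard, or None if it succeeds."""
    try:
        debug_guard(point_mask)
    except RuntimeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    all_false = [False, False, False]
    print(describe_failure(all_false))  # the message seen in the traceback
    print(zero_loss_guard(all_false))   # 0.0
```

With the debug branch disabled, a batch without any boundary point contributes nothing to the contrastive term and training continues.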

whuhxb commented 2 years ago

Hi @LiyaoTang Thank you for your timely reply. I haven't commented out the "if i == 0" clause; I am just using the code you released, and the bug occurs. I will try commenting out the "if i == 0" clause and run again.

LiyaoTang commented 2 years ago

Yes. The check is there because it is unlikely to find no boundary when using S3DIS and processing a whole room at once, as Point Transformer does. If you use other data, the situation can be different.
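
A small pure-Python sketch of why other data can hit the all-False case. It assumes a point counts as a boundary point when at least one of its neighbors carries a different label, which is how "no boundary detected" is read here. `boundary_mask` and the toy inputs are hypothetical and only illustrate the idea; they are not the repository's implementation.

```python
from typing import List


def boundary_mask(labels: List[int], neighbors: List[List[int]]) -> List[bool]:
    """Marks point i as boundary if any of its neighbor indices has a different label."""
    return [
        any(labels[j] != labels[i] for j in nbrs)
        for i, nbrs in enumerate(neighbors)
    ]


# A crop that contains a single class: no point has a differently labeled neighbor.
single_class = boundary_mask([2, 2, 2], [[1, 2], [0, 2], [0, 1]])

# A crop that contains two classes: points next to the class change are boundary points.
two_classes = boundary_mask([0, 0, 1], [[1], [0, 2], [1]])

if __name__ == "__main__":
    print(any(single_class))  # False, so the all-False branch would be taken
    print(two_classes)
```

A whole S3DIS room almost always mixes several classes, whereas a small crop or a sparsely labeled scene from another dataset may contain only one.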