Vegeta2020 / SE-SSD

SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud, CVPR 2021.
Apache License 2.0

CUDA error: an illegal memory access was encountered #68

Closed. gebawe closed this issue 2 years ago

gebawe commented 2 years ago

@Vegeta2020 While trying to train the model, I get the error message "CUDA error: an illegal memory access was encountered" and the training fails.

I am using PyTorch 1.7.1 on Tesla V100 GPUs.

Has anyone faced a similar problem?

    losses = model(example, is_ema=[False, output_ema], return_loss=True)
  File "/mnt/appl/software/PyTorch/1.7.1-fosscuda-2020b/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gebreawe/Code/SE-SSD/det3d/models/detectors/voxelnet_sessd.py", line 41, in forward
    return self.bbox_head.loss(example, preds, is_ema[1])
  File "/home/gebreawe/Code/SE-SSD/det3d/models/bbox_heads/mg_head_sessd.py", line 712, in loss
    consistency_loss = self.consistency_loss(preds_dicts, preds_ema, example)
  File "/home/gebreawe/Code/SE-SSD/det3d/models/bbox_heads/mg_head_sessd.py", line 682, in consistency_loss
    box_consistency_loss, idx1, idx2, mask1, mask2 = self.nn_distance(top_box_preds_stu, top_box_preds_tea)
  File "/home/gebreawe/Code/SE-SSD/det3d/models/bbox_heads/mg_head_sessd.py", line 586, in nn_distance
    ans_iou = ans_iou[mask1]
RuntimeError: CUDA error: an illegal memory access was encountered

Vegeta2020 commented 2 years ago

@awethaileslassie There are several closed issues about this problem. Try searching for them; you will find my replies there.
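
For anyone narrowing this down: CUDA kernels run asynchronously, so the Python line named in the traceback (`ans_iou = ans_iou[mask1]`) may not be the operation that actually faulted. Setting the `CUDA_LAUNCH_BLOCKING=1` environment variable makes launches synchronous, so the error surfaces at the real call site. The sketch below sets that variable and adds two sanity checks that could be run before a masked or indexed access. The helper names `check_bool_mask` and `check_index_range` are hypothetical and not part of SE-SSD or PyTorch, and this is only a debugging aid under those assumptions, not the fix given in the closed issues.

```python
import os

# Must be set before CUDA is initialized (ideally before importing torch),
# so kernel launches run synchronously and errors are reported at the
# call that caused them.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def check_bool_mask(tensor_shape, mask_shape):
    """Hypothetical helper: a boolean mask must match the leading
    dimensions of the tensor it indexes."""
    tensor_shape, mask_shape = tuple(tensor_shape), tuple(mask_shape)
    if mask_shape != tensor_shape[:len(mask_shape)]:
        raise ValueError(
            f"mask shape {mask_shape} does not match tensor shape {tensor_shape}"
        )


def check_index_range(indices, size):
    """Hypothetical helper: every integer index must lie in [-size, size)."""
    bad = [int(i) for i in indices if not -size <= int(i) < size]
    if bad:
        raise IndexError(f"indices out of range for size {size}: {bad[:5]}")
```

In `nn_distance`, one could call `check_bool_mask(ans_iou.shape, mask1.shape)` just before the failing line, and `check_index_range(idx.flatten().tolist(), size)` on any index tensor computed earlier, where `idx` and `size` stand for whichever index tensor and dimension length apply. Moving tensors to the CPU for these checks is slow, so they are meant for debugging runs only.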