Open yancie-yjr opened 2 years ago
Why does training trigger CUDA out of memory? We are using the default settings: 2 × 2080 Ti GPUs, batch size = 48.
Log output:

```
information: x 38 y 24 z 16 min_voxel_coord tensor([-19., -12., -8.]) voxel_size tensor([0.3000, 0.3000, 0.3000])
======>>>>> Online epoch: #0, lr=0.001000 <<<<<======
  0%|          | 0/1627 [00:10<?, ?it/s]
Traceback (most recent call last):
  File "train_tracking.py", line 161, in <module>
    loss_comp_part = criterion_completion(completion_points, data['completion_PC_part'], None)
  File "/home/yangjinrong/miniconda3/envs/v2b/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/V2B/loss/PCLosses.py", line 48, in forward
    P = batch_pairwise_dist(preds, gts, self.use_cuda)
  File "/data/V2B/loss/PCLosses.py", line 12, in batch_pairwise_dist
    yy = torch.bmm(y, y.transpose(2, 1))
RuntimeError: CUDA out of memory. Tried to allocate 768.00 MiB (GPU 0; 10.76 GiB total capacity; 9.31 GiB already allocated; 307.44 MiB free; 9.61 GiB reserved in total by PyTorch)
INFO[0976] Worker ws-cce98f8120e8e377-worker-dczdh Failed  agent= message="user command: exit status 1" reason=Failed
connection lost
```
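For context, the allocation that fails is in `batch_pairwise_dist`, which materializes full B×N×M distance matrices (and the B×N×N / B×M×M Gram matrices feeding them) on the GPU at once. A minimal sketch of computing the same squared pairwise distances in row chunks to cap peak memory; `chunked_pairwise_dist` and the chunk size are hypothetical here, not part of V2B's code:

```python
import torch

def chunked_pairwise_dist(x, y, chunk=256):
    """Squared pairwise distances between batched point clouds.
    x: (B, N, 3), y: (B, M, 3) -> (B, N, M), computed over row chunks
    so temporaries never exceed (B, chunk, M) instead of (B, N, M)."""
    out = torch.empty(x.size(0), x.size(1), y.size(1),
                      dtype=x.dtype, device=x.device)
    # Precompute the squared norms of y once: (B, 1, M)
    y_sq = y.pow(2).sum(-1).unsqueeze(1)
    for i in range(0, x.size(1), chunk):
        xi = x[:, i:i + chunk]                      # (B, c, 3)
        # ||xi - y||^2 = ||xi||^2 + ||y||^2 - 2 * xi . y
        out[:, i:i + chunk] = (
            xi.pow(2).sum(-1, keepdim=True)         # (B, c, 1)
            + y_sq                                  # (B, 1, M)
            - 2 * torch.bmm(xi, y.transpose(2, 1))  # (B, c, M)
        )
    return out

# Sanity check against torch.cdist on CPU
x = torch.randn(2, 500, 3)
y = torch.randn(2, 400, 3)
d = chunked_pairwise_dist(x, y)
print(torch.allclose(d, torch.cdist(x, y).pow(2), atol=1e-4))
```

Shrinking the chunk (or the batch size from 48) trades speed for peak memory; on an 11 GiB 2080 Ti that is usually the practical fix.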