VITA-Group / CADTransformer

[CVPR 2022]"CADTransformer: Panoptic Symbol Spotting Transformer for CAD Drawings", Zhiwen Fan, Tianlong Chen, Peihao Wang, Zhangyang Wang
MIT License

CUDA out of memory #8

Closed. bierdopje90 closed this issue 1 year ago.

bierdopje90 commented 1 year ago

I tried to train with the default dataset, but it keeps running out of CUDA memory at 43% of the first epoch. I tried lowering max_prim to 10000, but it still fails at exactly 43%. I'm running it on an RTX 3090.

Traceback (most recent call last):
  File "/home/maarten/CADTransformer/train_cad_ddp.py", line 246, in <module>
    main()
  File "/home/maarten/CADTransformer/train_cad_ddp.py", line 194, in main
    seg_pred = model(image, xy, rgb_info, nns)
  File "/home/maarten/anaconda3/envs/cad2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/maarten/anaconda3/envs/cad2/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 963, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/maarten/anaconda3/envs/cad2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/maarten/CADTransformer/models/model.py", line 122, in forward
    xy_embed_list = self.transformers([xy, xy_embed, nns])
  File "/home/maarten/anaconda3/envs/cad2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/maarten/CADTransformer/models/vit.py", line 248, in forward
    x_list = self.forward_features(feat=feat)
  File "/home/maarten/CADTransformer/models/vit.py", line 241, in forward_features
    _, xy_embed, _, xy_embed_list, attns = self.blocks([xy, xy_embed, nns, xy_embed_list, None])  # [1, 145, 384] -> [1, 145, 384]
  File "/home/maarten/anaconda3/envs/cad2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/maarten/anaconda3/envs/cad2/lib/python3.10/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/maarten/anaconda3/envs/cad2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/maarten/CADTransformer/models/vit.py", line 112, in forward
    x, attn = self.attn(xyz, self.norm1(xy_embed), nns)
  File "/home/maarten/anaconda3/envs/cad2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/maarten/CADTransformer/models/vit.py", line 149, in forward
    q, k, v = q_feat, index_points(k_feat, knn_idx), index_points(v_feat, knn_idx)  # q: b x n x h*f, kv: b x n x k x h*f
  File "/home/maarten/CADTransformer/models/vit.py", line 87, in index_points
    res = torch.gather(points.clone(), 1, idx[..., None].expand(-1, -1, points.size(-1)))
RuntimeError: CUDA out of memory. Tried to allocate 680.00 MiB (GPU 0; 23.70 GiB total capacity; 19.50 GiB already allocated; 308.00 MiB free; 21.55 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
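The last line of the error suggests trying max_split_size_mb to reduce allocator fragmentation. A minimal sketch of setting that before training starts is below; the 128 MiB value is only an illustrative guess, not a value from the repo, and it only helps when reserved memory is much larger than allocated memory, not when the inputs genuinely exceed the 24 GiB card.

```python
# Sketch: cap the caching allocator's split size to reduce fragmentation.
# The 128 MiB value is illustrative; tune it for your GPU and workload.
import os

# Must be set before any CUDA allocation happens for it to take effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import after setting the environment variable
```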

fredlumm commented 1 year ago

I had the same issue, and the code stops at exactly 43% as well...

zhiwenfan commented 1 year ago

See the FAQ; reduce args.max_prim according to your GPU memory.
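For anyone picking a lower args.max_prim later: a small sketch for checking how much headroom a given setting leaves on the card. It uses only standard torch.cuda statistics; where you call it inside train_cad_ddp.py (e.g. every N batches in the training loop) is up to you.

```python
# Sketch: log peak CUDA memory so you can judge the headroom left by a
# given args.max_prim on a ~24 GiB GPU such as an RTX 3090.
import torch

def log_peak_memory(tag: str) -> None:
    alloc = torch.cuda.max_memory_allocated() / 2**30     # peak allocated, GiB
    reserved = torch.cuda.max_memory_reserved() / 2**30   # peak reserved, GiB
    print(f"[{tag}] peak allocated {alloc:.2f} GiB, peak reserved {reserved:.2f} GiB")
    torch.cuda.reset_peak_memory_stats()  # start a fresh measurement window
```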