ayumiymk / aster.pytorch

ASTER in Pytorch
MIT License

CUDA out of memory problem #41

Closed Verazjy closed 4 years ago

Verazjy commented 4 years ago

I encountered a CUDA out-of-memory error while training the model (traceback below). Is the author loading all the data into GPU memory at once? Is there any other way to solve this problem? Thank you very much.

Traceback (most recent call last):
  File "main.py", line 229, in <module>
    main(args)
  File "main.py", line 213, in main
    test_dataset=test_dataset)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/trainers.py", line 59, in train
    output_dict = self._forward(input_dict)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/trainers.py", line 189, in _forward
    output_dict = self.model(input_dict)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/model_builder.py", line 87, in forward
    rec_pred = self.decoder([encoder_feats, rec_targets, rec_lengths])
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/attention_recognition_head.py", line 39, in forward
    output, state = self.decoder(x, state, y_prev)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/attention_recognition_head.py", line 255, in forward
    alpha = self.attention_unit(x, sPrev)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/attention_recognition_head.py", line 220, in forward
    sumTanh = torch.tanh(sProj + xProj)
RuntimeError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 10.91 GiB total capacity; 10.30 GiB already allocated; 5.38 MiB free; 35.66 MiB cached)
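As a side note for anyone debugging this, one quick way to see whether memory climbs in the data pipeline or during the model's forward pass is to log PyTorch's allocator counters around the call that fails. This is a generic sketch, not code from this repository; `model` and `input_dict` in the usage comment are placeholders for the objects used in lib/trainers.py.

```python
import torch

def report_cuda_memory(tag, device=0):
    # memory_allocated: memory held by live tensors right now;
    # max_memory_allocated: peak usage since the start of the process.
    allocated = torch.cuda.memory_allocated(device) / 2 ** 20
    peak = torch.cuda.max_memory_allocated(device) / 2 ** 20
    print(f"[{tag}] allocated={allocated:.1f} MiB, peak={peak:.1f} MiB")

# Usage around the forward pass (model / input_dict are placeholders):
# report_cuda_memory("before forward")
# output_dict = model(input_dict)
# report_cuda_memory("after forward")
```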

Verazjy commented 4 years ago

I solved this problem by reducing the batch size (setting batch_size=512), but further verification is needed to see whether it affects the final accuracy.
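If the smaller batch does turn out to hurt accuracy, gradient accumulation is one possible hedge: keep the per-step batch at 512 so it fits in memory, but only step the optimizer after several backward passes so the effective batch stays larger. Below is a minimal, generic PyTorch sketch of the idea; the linear model, Adadelta optimizer, class count, and tensor shapes are placeholders, not the actual ASTER training objects.

```python
import torch
import torch.nn as nn

# Dummy stand-ins so the snippet runs on its own (requires a CUDA device).
model = nn.Linear(256, 97).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adadelta(model.parameters())

accumulation_steps = 2              # 2 x 512 samples ~= one 1024-sample update
optimizer.zero_grad()
for step in range(accumulation_steps):
    feats = torch.randn(512, 256).cuda()           # stand-in for encoder features
    labels = torch.randint(0, 97, (512,)).cuda()   # stand-in for character targets
    loss = criterion(model(feats), labels) / accumulation_steps
    loss.backward()                 # gradients accumulate across the small batches
optimizer.step()                    # single update with the effective large batch
```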