Closed 1079863482 closed 3 years ago
2021-05-14 17:46:24,155 - torchocr - INFO - [0/10] - [38840/231229] - lr:0.001 - loss:0.7774 - acc:0.6250 - norm_edit_dis:0.9000 - time:0.6704
2021-05-14 17:46:24,814 - torchocr - INFO - [0/10] - [38850/231229] - lr:0.001 - loss:0.4049 - acc:0.6250 - norm_edit_dis:0.9375 - time:0.6584
2021-05-14 17:46:25,407 - torchocr - ERROR - Traceback (most recent call last):
  File "tools/rec_train.py", line 234, in train
    output = net.forward(batch_data['img'].to(to_use_device))
  File "/home/cai/project/PytorchOCR/torchocr/networks/architectures/RecModel.py", line 43, in forward
    x = self.head(x)
  File "/home/cai/anaconda3/envs/ppocr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cai/project/PytorchOCR/torchocr/networks/heads/RecCTCHead.py", line 28, in forward
    return self.fc(x)
  File "/home/cai/anaconda3/envs/ppocr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cai/anaconda3/envs/ppocr/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/cai/anaconda3/envs/ppocr/lib/python3.7/site-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 7.79 GiB total capacity; 5.24 GiB already allocated; 306.06 MiB free; 5.86 GiB reserved in total by PyTorch)
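At the start of training, GPU memory usage is only a little over 2 GB, but it slowly keeps growing until CUDA runs out of memory and the run crashes. At first I thought my batch size was too large, but with a smaller batch size it still crashes after a certain number of steps.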
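What data are you using? Is there a problem with the data?
I checked, and it is a data problem: the dataset contains quite a few samples with very long labels.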
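For anyone who hits the same thing, one way to check for the long labels mentioned above is to scan the label file before training and set aside the samples whose labels exceed some length. The following is only a minimal sketch, not part of PytorchOCR: it assumes each line of the label file has the form `image_path<TAB>label`, and `MAX_LABEL_LEN`, `parse_line`, and `split_by_label_length` are hypothetical names introduced here for illustration. The threshold of 25 is an arbitrary placeholder and should be chosen to suit your model and data.

```python
from typing import Iterable, List, Tuple

# Assumed threshold, not taken from the issue; tune it for your dataset.
MAX_LABEL_LEN = 25

Sample = Tuple[str, str]


def parse_line(line: str) -> Sample:
    """Split one 'image_path<TAB>label' line (assumed label-file format)."""
    path, _, label = line.rstrip("\n").partition("\t")
    return path, label


def split_by_label_length(
    lines: Iterable[str], max_len: int = MAX_LABEL_LEN
) -> Tuple[List[Sample], List[Sample]]:
    """Return (kept, dropped) samples.

    A sample is kept when its label is non-empty and at most max_len
    characters long; everything else is dropped. Blank lines are skipped.
    """
    kept: List[Sample] = []
    dropped: List[Sample] = []
    for line in lines:
        if not line.strip():
            continue
        path, label = parse_line(line)
        if 0 < len(label) <= max_len:
            kept.append((path, label))
        else:
            dropped.append((path, label))
    return kept, dropped


# Small in-memory example; in practice, pass an open label file instead.
sample_lines = [
    "img/0001.jpg\thello\n",
    "img/0002.jpg\t" + "x" * 80 + "\n",
    "img/0003.jpg\tworld\n",
]
kept, dropped = split_by_label_length(sample_lines, max_len=25)
print(len(kept), "kept,", len(dropped), "dropped")
for path, label in dropped:
    print("too long or empty:", path, "label length", len(label))
```

Inspecting the dropped list first, rather than deleting those samples outright, makes it easier to tell whether the long labels are annotation errors or legitimate long text lines that need a different treatment.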
In my case the out-of-memory error happens during validation. Is that also related to the data labels?