WuJie1010 / Facial-Expression-Recognition.Pytorch

A CNN-based PyTorch implementation of facial expression recognition (FER2013 and CK+), achieving 73.112% (state-of-the-art) on FER2013 and 94.64% on the CK+ dataset
MIT License

RuntimeError: CUDA error: out of memory #17

Closed dearhoper closed 5 years ago

dearhoper commented 5 years ago

When I run the training script, it raises a RuntimeError. The GPU in my Linux system is a Tesla K40c. Why does it report "out of memory"? Can you help me?

==> Preparing data..
==> Building model..
/usr/local/lib/python2.7/dist-packages/torch/cuda/__init__.py:116: UserWarning: 
    Found GPU1 Quadro K420 which is of cuda capability 3.0.
    PyTorch no longer supports this GPU because it is too old.

  warnings.warn(old_gpu_warn % (d, name, major, capability[1]))

Epoch: 0
learning_rate: 0.01
mainpro_FER.py:119: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  train_loss += loss.data[0]
 [=========== 225/225 =========>] | Loss: 1.836 | Acc: 26.000% (7531/28709)                                                                                                                                                                            
mainpro_FER.py:142: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  inputs, targets = Variable(inputs, volatile=True), Variable(targets)
mainpro_FER.py:146: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  PublicTest_loss += loss.data[0]
Traceback (most recent call last):| Loss: 1.628 | Acc: 28.000% (37/128)                                                                                                                                                                                
  File "mainpro_FER.py", line 215, in <module>
    PublicTest(epoch)
  File "mainpro_FER.py", line 143, in PublicTest
    outputs = net(inputs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/EmotionRecognition/Facial-Expression-Recognition.Pytorch/models/vgg.py", line 23, in forward
    out = self.features(x)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory
WuJie1010 commented 5 years ago

Maybe you can decrease your batch size.

dearhoper commented 5 years ago

Thanks, decreasing the batch size fixed the problem. Can you tell me what hardware you used for training?

WuJie1010 commented 5 years ago

parser.add_argument('--model', type=str, default='VGG19', help='CNN architecture')
parser.add_argument('--dataset', type=str, default='FER2013', help='CNN architecture')
parser.add_argument('--bs', default=128, type=int, help='batch size')
parser.add_argument('--lr', default=0.01, type=float, help='learning rate')
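
For readers following along, here is a minimal sketch of how the batch size could be lowered through these options. It assumes the script parses them with argparse exactly as quoted; the argument list passed to parse_args stands in for running `python mainpro_FER.py --bs 64`, and the value 64 is only an illustration, not a recommendation from the author.

```python
import argparse

# Options copied from the comment above; the description text is a placeholder.
parser = argparse.ArgumentParser(description='batch size override sketch')
parser.add_argument('--model', type=str, default='VGG19', help='CNN architecture')
parser.add_argument('--dataset', type=str, default='FER2013', help='CNN architecture')
parser.add_argument('--bs', default=128, type=int, help='batch size')
parser.add_argument('--lr', default=0.01, type=float, help='learning rate')

# Stands in for the command line: python mainpro_FER.py --bs 64
opt = parser.parse_args(['--bs', '64'])

# Only the batch size changes; the other options keep their defaults.
print(opt.model, opt.dataset, opt.bs, opt.lr)
```

Halving the batch size until the run fits in memory is a common way to find a workable value on a smaller GPU.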

dearhoper commented 5 years ago

I mean the GPU you used to train with a batch size of 128.

WuJie1010 commented 5 years ago

An NVIDIA 1080 Ti.

DanielXu123 commented 4 years ago

It's kind of weird: I can use a batch size of 64 for training, but not for testing. I'm pretty sure my GPU has enough memory.
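
One possible explanation, judging from the warnings in the log at the top of this issue: `volatile=True` no longer has any effect in that PyTorch version, so the test loop may still build an autograd graph and hold on to activations it never uses. If the test transform also expands each image into several crops (an assumption, not shown in this thread), the effective test batch would be larger than the training batch as well. The sketch below shows the replacement the warning itself suggests, `with torch.no_grad():`, together with `loss.item()` in place of `loss.data[0]`. The tiny network, the `evaluate` helper, and the random 48x48 inputs are hypothetical stand-ins, not the repository's code.

```python
import torch
import torch.nn as nn

# Tiny stand-in network (hypothetical); the model in this thread is VGG19.
net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 7),
)
criterion = nn.CrossEntropyLoss()


def evaluate(net, batches):
    """Run a test pass without recording an autograd graph."""
    net.eval()
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():  # replaces Variable(inputs, volatile=True)
        for inputs, targets in batches:
            outputs = net(inputs)
            loss = criterion(outputs, targets)
            total_loss += loss.item()  # replaces loss.data[0]
            correct += (outputs.argmax(dim=1) == targets).sum().item()
            total += targets.size(0)
    return total_loss / max(len(batches), 1), correct / max(total, 1)


# Random placeholder data: two batches of four grayscale 48x48 images, 7 classes.
batches = [(torch.randn(4, 1, 48, 48), torch.randint(0, 7, (4,))) for _ in range(2)]
avg_loss, acc = evaluate(net, batches)
print(avg_loss, acc)
```

If memory still runs out with gradients disabled, lowering the test batch size separately from the training batch size would be the next thing to try.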

imKeith commented 1 year ago

I ran into this problem too. Did you figure it out?