facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

CUDA out of memory in LSTM SQuAD model #85

Closed: ivancruzbht closed this issue 7 years ago

ivancruzbht commented 7 years ago

Does this example require too much VRAM? I am trying to run it on a box with an NVIDIA GTX 970M (3 GB VRAM), but I get this error:

```
05/16/2017 04:02:32 PM: [ Ok, let's go... ]
05/16/2017 04:02:32 PM: [ Training for 1000 iters... ]
05/16/2017 04:02:36 PM: [train] updates = 10 | train loss = 9.83 | exs = 310
05/16/2017 04:02:39 PM: [train] updates = 20 | train loss = 9.79 | exs = 623
05/16/2017 04:02:42 PM: [train] updates = 30 | train loss = 9.75 | exs = 938
THCudaCheck FAIL file=/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "examples/drqa/train.py", line 178, in <module>
    main(opt)
  File "examples/drqa/train.py", line 113, in main
    train_world.parley()
  File "/home/ivan/ParlAI/parlai/core/worlds.py", line 505, in parley
    batch_act = self.batch_act(index, batch_observations[index])
  File "/home/ivan/ParlAI/parlai/core/worlds.py", line 479, in batch_act
    batch_actions = a.batch_act(batch_observation)
  File "/home/ivan/ParlAI/parlai/agents/drqa/agents.py", line 192, in batch_act
    self.model.update(batch)
  File "/home/ivan/ParlAI/parlai/agents/drqa/model.py", line 113, in update
    self.optimizer.step()
  File "/home/ivan/anaconda3/lib/python3.6/site-packages/torch/optim/adamax.py", line 68, in step
    torch.max(norm_buf, 0, out=(exp_inf, exp_inf.new().long()))
RuntimeError: cuda runtime error (2) : out of memory at /py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/generic/THCStorage.cu:66
```

I haven't dived into the code yet, so I'm not sure whether this is a bug or I just need more VRAM. Thank you.
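For anyone debugging the same error: one way to confirm the process is genuinely exhausting the card is to compare allocated GPU memory against the device total between training updates. A hedged diagnostic sketch (not ParlAI code; assumes a CUDA GPU and a recent PyTorch, where `torch.cuda.memory_allocated` and `torch.cuda.get_device_properties` are available):

```python
import torch

def report_gpu_memory(tag=""):
    # Bytes currently held by tensors vs. total memory on device 0.
    used = torch.cuda.memory_allocated() / 1024**2
    total = torch.cuda.get_device_properties(0).total_memory / 1024**2
    print(f"[{tag}] {used:.0f} MiB of {total:.0f} MiB allocated")

# e.g. call report_gpu_memory("after update") between training updates
# to see how close you are to the 970M's 3 GB limit.
```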

ivancruzbht commented 7 years ago

I just lowered the batch size to make it work.
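Lowering the batch size works because activation memory in an LSTM grows linearly with the batch dimension, and the backward pass keeps those activations alive. A minimal, hypothetical sketch illustrating the effect (not ParlAI code; the layer sizes are made up, and it assumes a CUDA GPU and a recent PyTorch):

```python
import torch
import torch.nn as nn

# A stand-in bidirectional LSTM, roughly the shape of a reading-comprehension encoder.
lstm = nn.LSTM(input_size=300, hidden_size=256, num_layers=3,
               bidirectional=True).cuda()

for batch_size in (32, 8):  # e.g. an original vs. a reduced batch size
    lstm.zero_grad(set_to_none=True)          # drop old grads for a clean reading
    torch.cuda.reset_peak_memory_stats()      # reset the peak-memory counter
    x = torch.randn(400, batch_size, 300, device="cuda")  # (seq, batch, feat)
    out, _ = lstm(x)
    out.sum().backward()  # backward keeps activations alive, as in training
    peak = torch.cuda.max_memory_allocated() / 1024**2
    print(f"batch={batch_size}: peak GPU memory {peak:.0f} MiB")
```

On a 3 GB card, halving the batch size roughly halves peak activation memory, which is usually enough headroom to get past this kind of OOM.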