facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

[Hateful Memes] CUDA runs out of memory while training without actually utilising it #337

Closed agam-kashyap closed 4 years ago

agam-kashyap commented 4 years ago

❓ Questions and Help

Hi,

I have been facing this issue for quite a while now. It occurs when I run the training command for visual_bert pre-trained on the COCO dataset for the Hateful Memes challenge.

Command

mmf_run config=projects/hateful_memes/configs/visual_bert/from_coco.yaml model=visual_bert dataset=hateful_memes

Error

Traceback (most recent call last):
  File "/home/hdd1/fhmc/anaconda3/bin/mmf_run", line 30, in <module>
    sys.exit(load_entry_point('mmf', 'console_scripts', 'mmf_run')())
  File "/home/hdd1/fhmc/mmf/mmf_cli/run.py", line 107, in run
    nprocs=config.distributed.world_size,
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/hdd1/fhmc/mmf/mmf_cli/run.py", line 54, in distributed_main
    main(configuration, init_distributed=True, predict=predict)
  File "/home/hdd1/fhmc/mmf/mmf_cli/run.py", line 44, in main
    trainer.train()
  File "/home/hdd1/fhmc/mmf/mmf/trainers/base_trainer.py", line 250, in train
    report = self._forward_pass(batch)
  File "/home/hdd1/fhmc/mmf/mmf/trainers/base_trainer.py", line 274, in _forward_pass
    model_output = self.model(prepared_batch)
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 445, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/hdd1/fhmc/mmf/mmf/models/base_model.py", line 148, in __call__
    model_output = super().__call__(sample_list, *args, **kwargs)
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hdd1/fhmc/mmf/mmf/models/visual_bert.py", line 536, in forward
    sample_list.masked_lm_labels,
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hdd1/fhmc/mmf/mmf/models/visual_bert.py", line 344, in forward
    image_text_alignment,
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hdd1/fhmc/mmf/mmf/models/visual_bert.py", line 141, in forward
    embedding_output, extended_attention_mask, self.fixed_head_masks
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/transformers/modeling_bert.py", line 386, in forward
    layer_outputs = layer_module(hidden_states, attention_mask, head_mask[i], encoder_hidden_states, encoder_attention_mask)
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/transformers/modeling_bert.py", line 366, in forward
    intermediate_output = self.intermediate(attention_output)
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/transformers/modeling_bert.py", line 328, in forward
    hidden_states = self.intermediate_act_fn(hidden_states)
  File "/home/hdd1/fhmc/anaconda3/lib/python3.7/site-packages/transformers/modeling_bert.py", line 133, in gelu
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
RuntimeError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 1; 10.76 GiB total capacity; 9.73 GiB already allocated; 46.06 MiB free; 9.86 GiB reserved in total by PyTorch)

But as the nvidia-smi output below shows, memory usage on both GPUs is minimal and there is plenty of free space.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:3B:00.0  On |                  N/A |
| 41%   43C    P8    37W / 250W |    456MiB / 11018MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  On   | 00000000:AF:00.0 Off |                  N/A |
| 40%   41C    P8    14W / 250W |      1MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       303      G   /usr/bin/gnome-shell                         114MiB |
|    0      1862      G   /usr/lib/xorg/Xorg                            18MiB |
|    0      1898      G   /usr/bin/gnome-shell                          19MiB |
|    0      2816      G   /usr/lib/xorg/Xorg                            39MiB |
|    0      3000      G   /usr/bin/gnome-shell                         137MiB |
|    0     40765      G   /usr/lib/xorg/Xorg                           123MiB |

Any help would be appreciated, thanks!

apsdehal commented 4 years ago

Are you continuously watching the usage? I would suggest using nvtop to monitor CUDA usage.
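
For reference, a quick way to watch usage while the job is running (assuming nvidia-smi is on your PATH; nvtop can usually be installed from your distro's package manager) is something like:

watch -n 1 nvidia-smi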

Since it is clear that your GPUs are not able to fit the batch size, I would suggest decreasing the batch size and increasing the number of updates.
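
Both can be overridden straight from the command line; a sketch along these lines should work (the keys training.batch_size and training.max_updates follow MMF's standard training config, and the numbers are only illustrative):

mmf_run config=projects/hateful_memes/configs/visual_bert/from_coco.yaml model=visual_bert dataset=hateful_memes training.batch_size=16 training.max_updates=44000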

agam-kashyap commented 4 years ago

Sure @apsdehal. Decreasing the batch size does work. Thanks!