chrischoy / SpatioTemporalSegmentation

4D Spatio-Temporal Semantic Segmentation on a 3D video (a sequence of 3D scans)
MIT License

CUDA out of memory in training ScanNet #27

Closed y5wang closed 4 years ago

y5wang commented 4 years ago

I'm trying to train ScanNet scene segmentation, but I run into a CUDA out-of-memory error. My environment:

The training was started with `export BATCH_SIZE=8; ./scripts/train_scannet.sh 2 -default "--scannet_path ./data/scannet/train"` (I've also tried BATCH_SIZE=32 and BATCH_SIZE=16; both failed).

Here is the error dump:

```
...

microway-gpu-ubuntu 03/11 17:50:03 ===> Start training
/home/yang/.conda/envs/st-segmentation/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:224: UserWarning: To get the last learning rate computed by the scheduler, please use 'get_last_lr()'.
  warnings.warn("To get the last learning rate computed by the scheduler, "
microway-gpu-ubuntu 03/11 17:50:32 ===> Epoch[1](1/151): Loss 3.1041    LR: 1.000e-01   Score 4.961     Data time: 5.2196, Total iter time: 29.8803
Traceback (most recent call last):
  File "/home/yang/.conda/envs/st-segmentation/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/yang/.conda/envs/st-segmentation/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/yang/projects/SpatioTemporalSegmentation/main.py", line 156, in <module>
    main()
  File "/home/yang/projects/SpatioTemporalSegmentation/main.py", line 149, in main
    train(model, train_data_loader, val_data_loader, config)
  File "/home/yang/projects/SpatioTemporalSegmentation/lib/train.py", line 91, in train
    soutput = model(*inputs)
  File "/home/yang/.conda/envs/st-segmentation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yang/projects/SpatioTemporalSegmentation/models/res16unet.py", line 252, in forward
    out = self.block8(out)
  File "/home/yang/.conda/envs/st-segmentation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yang/.conda/envs/st-segmentation/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/yang/.conda/envs/st-segmentation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yang/projects/SpatioTemporalSegmentation/models/modules/resnet_block.py", line 47, in forward
    out = self.norm2(out)
  File "/home/yang/.conda/envs/st-segmentation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yang/.conda/envs/st-segmentation/lib/python3.7/site-packages/MinkowskiEngine/MinkowskiNormalization.py", line 58, in forward
    output = self.bn(input.F)
  File "/home/yang/.conda/envs/st-segmentation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yang/.conda/envs/st-segmentation/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 107, in forward
    exponential_average_factor, self.eps)
  File "/home/yang/.conda/envs/st-segmentation/lib/python3.7/site-packages/torch/nn/functional.py", line 1670, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 364.00 MiB (GPU 0; 11.17 GiB total capacity; 9.93 GiB already allocated; 247.81 MiB free; 10.26 GiB reserved in total by PyTorch)
```
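
For reference, a minimal sketch (assuming PyTorch >= 1.4; the helper and its call sites are hypothetical, not part of this repo) for logging the CUDA allocator state around the failing forward call:

```python
import torch

def log_cuda_memory(tag, device=0):
    """Print allocated/reserved/peak memory for one CUDA device, in MiB."""
    mib = 1024 ** 2
    print(f"[{tag}] allocated={torch.cuda.memory_allocated(device) / mib:.0f} MiB, "
          f"reserved={torch.cuda.memory_reserved(device) / mib:.0f} MiB, "
          f"peak={torch.cuda.max_memory_allocated(device) / mib:.0f} MiB")

# Hypothetical placement around the line that fails in lib/train.py:
# log_cuda_memory("before forward")
# soutput = model(*inputs)
# log_cuda_memory("after forward")
```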

Any help is appreciated.

-- Yang

fengziyue commented 4 years ago

I'm training on the S3DIS dataset. On a 2080 Ti (11 GB) I also hit the "CUDA out of memory" error, and it seems that changing the batch size doesn't reduce GPU memory usage.

But it runs well on a Tesla P100 (12 GB).

@chrischoy
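
One possible reason (a hypothesis, not something confirmed in this thread): with sparse convolutions the memory footprint tends to track the total number of input points per batch rather than the nominal batch size, so a batch of a few large scenes can use as much memory as many small ones. A hypothetical check (the `(coords, feats, labels)` loader interface is an assumption, not necessarily this repo's actual API):

```python
def log_points_per_batch(data_loader, num_batches=5):
    """Print how many points each of the first few batches actually contains."""
    for i, (coords, feats, labels) in enumerate(data_loader):
        if i >= num_batches:
            break
        print(f"batch {i}: {coords.shape[0]} points, {feats.shape[1]} feature channels")
```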

y5wang commented 4 years ago

I guess the Tesla K80 doesn't have enough memory. Each GPU has 11441 MiB, as reported by nvidia-smi:


```
(st-segmentation) yang@microway-gpu-ubuntu:~/projects/SpatioTemporalSegmentation$ nvidia-smi
Wed Mar 11 15:08:31 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:06:00.0 Off |                    0 |
| N/A   34C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 00000000:07:00.0 Off |                    0 |
| N/A   24C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 00000000:0A:00.0 Off |                    0 |
| N/A   30C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 00000000:0B:00.0 Off |                    0 |
| N/A   24C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           On   | 00000000:10:00.0 Off |                    0 |
| N/A   32C    P8    25W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           On   | 00000000:11:00.0 Off |                    0 |
| N/A   22C    P8    30W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           On   | 00000000:14:00.0 Off |                    0 |
| N/A   32C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           On   | 00000000:15:00.0 Off |                    0 |
| N/A   22C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

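A minimal cross-check from inside the training environment (plain PyTorch, not part of the repo) to confirm the per-GPU capacity that nvidia-smi reports:

```python
import torch

# Print the name and total memory of every visible CUDA device.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**2:.0f} MiB total")
```
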
chrischoy commented 4 years ago

This is not a bug. OOM means the model needs more memory than your GPU has available. Please lower the batch size so it fits in your environment.
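
If a smaller batch size hurts convergence, one generic PyTorch pattern is gradient accumulation, which keeps the effective batch size while lowering per-step memory. A hypothetical sketch (`model`, `criterion`, `optimizer`, and `data_loader` are placeholders, not this repo's training loop):

```python
accumulation_steps = 4  # effective batch = loader batch size * 4

optimizer.zero_grad()
for step, (inputs, target) in enumerate(data_loader):
    output = model(inputs)
    # Scale the loss so the accumulated gradients match a single large batch.
    loss = criterion(output, target) / accumulation_steps
    loss.backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```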