Duankaiwen / CenterNet

Code for our paper "CenterNet: Keypoint Triplets for Object Detection".

CUDA OOM on batch size 1 (batch norm) #116

Closed · avinashkaur93 closed this issue 4 years ago

avinashkaur93 commented 4 years ago

Hi @Duankaiwen, I replicated your code and ran several experiments successfully on the COCO dataset with the following environment: PyTorch 1.0.0, CUDA 10.1.168, gcc 5.4.0.

In the same environment, training on my own dataset hits a CUDA OOM (single GPU, batch size = 1). My input image size is the same, [511, 511]. Training runs for roughly 400 steps before it suddenly fails with OOM. There is no steady increase in GPU memory, so it does not look like a memory leak either.
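For anyone who wants to reproduce the memory check, something along these lines reports per-step allocation (a minimal sketch only; `model`, `optimizer`, and `train_loader` are placeholders for my setup, not names from this repo):

```python
# Minimal sketch -- `model`, `optimizer`, and `train_loader` are placeholders,
# not names from this repo. Prints the current and peak GPU memory after each
# step; a leak would show up as a steadily growing "allocated" number.
import torch

def train_with_memory_log(model, optimizer, train_loader):
    for step, (images, targets) in enumerate(train_loader):
        optimizer.zero_grad()
        loss = model(images, targets)   # hypothetical forward/loss call
        loss.backward()
        optimizer.step()

        allocated = torch.cuda.memory_allocated() / 1024 ** 3      # GiB currently in use
        peak = torch.cuda.max_memory_allocated() / 1024 ** 3       # GiB peak since start
        print("step %d: allocated %.2f GiB, peak %.2f GiB" % (step, allocated, peak))
```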

Here's the complete log trace and config: log.txt

Last few lines of the log:

    File "/mnt/dfs/avinashk/CenterNet/CenterNet-owndata-tensorboard/CenterNet/models/py_utils/utils.py", line 15, in forward
        bn = self.bn(conv)
    File "/home/avinashk/miniconda3/envs/CenterNet-PT10-TF/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
    File "/home/avinashk/miniconda3/envs/CenterNet-PT10-TF/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 76, in forward
        exponential_average_factor, self.eps)
    File "/home/avinashk/miniconda3/envs/CenterNet-PT10-TF/lib/python3.6/site-packages/torch/nn/functional.py", line 1623, in batch_norm
        training, momentum, eps, torch.backends.cudnn.enabled
    RuntimeError: CUDA out of memory. Tried to allocate 3.71 GiB (GPU 0; 10.92 GiB total capacity; 7.31 GiB already allocated; 2.67 GiB free; 25.91 MiB cached)

What mainly confuses me is that the error is raised inside batch norm. I'm a TensorFlow user and fairly new to PyTorch, so any help would be appreciated.
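From what I understand (please correct me if I'm wrong), the OOM can surface at whichever allocation happens to push memory over the limit, so the batch-norm line in the trace may not be the real culprit. A minimal sketch of how the memory counters could be dumped at the failure point; `model` and `batch` are placeholders, not names from this codebase:

```python
# Minimal sketch -- `model` and `batch` are placeholders, not names from this
# repo. Catches the OOM RuntimeError and prints the CUDA memory counters so
# the failing step can be compared with the steps that succeeded.
import torch

def forward_with_oom_report(model, batch):
    try:
        return model(batch)                     # hypothetical forward pass
    except RuntimeError as err:
        if "out of memory" in str(err):
            print("allocated: %.2f GiB" % (torch.cuda.memory_allocated() / 1024 ** 3))
            print("cached:    %.2f GiB" % (torch.cuda.memory_cached() / 1024 ** 3))
        raise
```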