CUDA out of memory on custom data

purijs commented 3 years ago

I'm trying to train on 256x256 tiles; both the train and test images are 256x256. I have a 16 GB GPU, but training still runs out of memory:

2020-10-19 17:45:53.661774: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
20-10-19 17:45:54.452 - INFO: Start training from epoch: 0, iter: 0
Traceback (most recent call last):
  File "train.py", line 100, in <module>
    main(config)
  File "train.py", line 79, in main
    trainer.train()
  File "/home/jovyan/EESRGAN/trainer/cowc_GAN_FRCNN_trainer.py", line 88, in train
    self.model.optimize_parameters(current_step)
  File "/home/jovyan/EESRGAN/model/ESRGAN_EESN_FRCNN_Model.py", line 167, in optimize_parameters
    self.fake_H, self.final_SR, self.x_learned_lap_fake, _ = self.netG(self.var_L)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jovyan/EESRGAN/model/model.py", line 569, in forward
    x_base = self.netRG(x) # add bicubic according to the implementation by author but not stated in the paper
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jovyan/EESRGAN/model/model.py", line 334, in forward
    fea = self.lrelu(self.upconv2(F.interpolate(fea, scale_factor=2, mode='nearest')))
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 14.73 GiB total capacity; 13.73 GiB already allocated; 135.88 MiB free; 13.75 GiB reserved in total by PyTorch)

Jakaria08 commented 3 years ago

Try a batch size of 3 or lower.
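
A rough way to see why a smaller batch helps: activation memory grows linearly with batch size. The sketch below estimates the size of one float32 feature map after upsampling. The 4x scale factor and the 64 feature channels are assumptions for illustration, not values confirmed in this thread, and a real training step keeps many such tensors (plus gradients) alive at once.

```python
def feature_map_mib(batch, channels, height, width, bytes_per_elem=4):
    """Size in MiB of one dense tensor of shape (batch, channels, height, width)."""
    return batch * channels * height * width * bytes_per_elem / 2**20


# Hypothetical values: 256x256 input tile, 4x upsampling, 64 channels, float32.
LR_SIZE = 256
SCALE = 4
CHANNELS = 64
hr_size = LR_SIZE * SCALE  # spatial size after upsampling

for batch in (1, 3, 8):
    size = feature_map_mib(batch, CHANNELS, hr_size, hr_size)
    print(f"batch={batch}: {size:.0f} MiB per feature map")
```

Under these assumptions a single feature map costs 256 MiB per image, so every extra image in the batch adds that much for each full-resolution activation the network has to keep.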

MrCrowbar commented 3 years ago

Hello, I'm having the same issue, but with the COWC dataset that is used as the reference on GitHub. Do you mean the batch size in config_GAN.json? Thanks!

Jakaria08 commented 3 years ago

Yes, change it in the config_GAN.json file. Thanks.
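
If you prefer to script the change, here is a minimal sketch. It assumes the batch size is stored under a key literally named "batch_size" somewhere in the JSON; the exact layout of config_GAN.json is not shown in this thread, so the helper searches the whole structure instead of relying on a fixed path. Both function names are made up for this example.

```python
import json


def set_batch_size(node, new_size, key="batch_size"):
    """Set every `key` entry in a nested JSON structure; return the number changed."""
    changed = 0
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                node[k] = new_size
                changed += 1
            else:
                changed += set_batch_size(v, new_size, key)
    elif isinstance(node, list):
        for item in node:
            changed += set_batch_size(item, new_size, key)
    return changed


def update_config(path, new_size):
    """Rewrite the JSON file at `path` with the new batch size."""
    with open(path) as f:
        config = json.load(f)
    if set_batch_size(config, new_size) == 0:
        raise KeyError(f"no 'batch_size' key found in {path}")
    with open(path, "w") as f:
        json.dump(config, f, indent=4)


# Demo on an in-memory dict; this layout is hypothetical, not the real config.
example = {"data_loader": {"args": {"batch_size": 8, "shuffle": True}}}
set_batch_size(example, 3)
print(example)
```

To apply it to the real file, call update_config("config_GAN.json", 3) from the repository root, and check the result by hand since the key name is an assumption.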

Jakaria08 / EESRGAN

CUDA out of memory on custom data #16