Lightning-AI / pytorch-lightning


Increase in GPU memory usage with Pytorch-Lightning #1376

Closed · VitorGuizilini closed this issue 4 years ago

VitorGuizilini commented 4 years ago

Over the last week I have been porting my monocular depth estimation code to PyTorch Lightning, and everything is working perfectly. However, my models seem to require more GPU memory than before, to the point where I need to significantly decrease the batch size at training time. These are the Trainer parameters I am using, along with the relevant versions:

FROM nvidia/cuda:10.1-devel-ubuntu18.04
ENV PYTORCH_VERSION=1.4.0
ENV TORCHVISION_VERSION=0.5.0
ENV CUDNN_VERSION=7.6.5.32-1+cuda10.1
ENV NCCL_VERSION=2.4.8-1+cuda10.1
ENV PYTORCH_LIGHTNING_VERSION=0.7.1
cfg.arch.gpus = 8
cfg.arch.num_nodes = 1
cfg.arch.num_workers = 8
cfg.arch.distributed_backend = 'ddp'
cfg.arch.amp_level = 'O0'
cfg.arch.precision = 32
cfg.arch.benchmark = True 
cfg.arch.min_epochs = 1
cfg.arch.max_epochs = 50
cfg.arch.checkpoint_callback = False
cfg.arch.callbacks = []
cfg.arch.gradient_clip_val = 0.0
cfg.arch.accumulate_grad_batches = 1
cfg.arch.val_check_interval = 1.0
cfg.arch.check_val_every_n_epoch = 1
cfg.arch.num_sanity_val_steps = 0
cfg.arch.progress_bar_refresh_rate = 1
cfg.arch.fast_dev_run = False
cfg.arch.overfit_pct = 0.0
cfg.arch.train_percent_check = 1.0
cfg.arch.val_percent_check = 1.0
cfg.arch.test_percent_check = 1.0
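
For context, a minimal sketch of how settings like these are typically forwarded to the 0.7.1 Trainer (assuming cfg.arch behaves like a plain namespace; model stands in for the LightningModule, which is defined elsewhere):

import pytorch_lightning as pl

# Sketch: forwarding the cfg.arch values above to the 0.7.x Trainer API.
# 'model' is assumed to be a LightningModule instance defined elsewhere.
trainer = pl.Trainer(
    gpus=cfg.arch.gpus,
    num_nodes=cfg.arch.num_nodes,
    distributed_backend=cfg.arch.distributed_backend,
    amp_level=cfg.arch.amp_level,
    precision=cfg.arch.precision,
    benchmark=cfg.arch.benchmark,
    min_epochs=cfg.arch.min_epochs,
    max_epochs=cfg.arch.max_epochs,
    checkpoint_callback=cfg.arch.checkpoint_callback,
    callbacks=cfg.arch.callbacks,
    gradient_clip_val=cfg.arch.gradient_clip_val,
    accumulate_grad_batches=cfg.arch.accumulate_grad_batches,
    val_check_interval=cfg.arch.val_check_interval,
    check_val_every_n_epoch=cfg.arch.check_val_every_n_epoch,
    num_sanity_val_steps=cfg.arch.num_sanity_val_steps,
    progress_bar_refresh_rate=cfg.arch.progress_bar_refresh_rate,
    fast_dev_run=cfg.arch.fast_dev_run,
    overfit_pct=cfg.arch.overfit_pct,
    train_percent_check=cfg.arch.train_percent_check,
    val_percent_check=cfg.arch.val_percent_check,
    test_percent_check=cfg.arch.test_percent_check,
)
trainer.fit(model)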

Probably because of that, I am having trouble replicating my results. Could you please advise on possible solutions? I will open-source the code as soon as I manage to replicate my current results.

github-actions[bot] commented 4 years ago

Hi! Thanks for your contribution! Great first issue!

Borda commented 4 years ago

Hi @vguizilini, could you be more specific about how much more memory is required?

williamFalcon commented 4 years ago

@jeremyjordan can we get that memory profiler? @vguizilini mind trying again from master?

jeremyjordan commented 4 years ago

I thought we already log GPU memory usage?

https://pytorch-lightning.readthedocs.io/en/0.7.1/debugging.html#log-gpu-usage

https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/logging.py#L55
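
For example, enabling it looks roughly like this (just a sketch; in 0.7.x the log_gpu_memory flag accepts 'min_max' or 'all'):

import pytorch_lightning as pl

# Sketch: turn on the built-in GPU memory logging referenced above.
# 'min_max' logs only the min/max used memory; 'all' logs every GPU.
trainer = pl.Trainer(
    gpus=8,
    distributed_backend='ddp',
    log_gpu_memory='all',
)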

VitorGuizilini commented 4 years ago

Memory usage for my original implementation (Horovod for distributed training):

[screenshot: GPU memory usage, Horovod run]

Memory usage for my PyTorch Lightning implementation (ddp):

[screenshot: GPU memory usage, PyTorch Lightning ddp run]

I'm loading the same configuration and the same networks in both. I'm still learning to use PyTorch Lightning; what should I profile next?
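
A simple next step could be to print the CUDA allocator statistics around the forward pass in both implementations and compare (a sketch with a hypothetical log_cuda_memory helper, not something already in my code):

import torch

def log_cuda_memory(tag):
    # Hypothetical helper: memory currently held by tensors vs. the peak
    # observed so far on the current device, reported in MiB.
    allocated = torch.cuda.memory_allocated() / 2 ** 20
    peak = torch.cuda.max_memory_allocated() / 2 ** 20
    print(f'{tag}: allocated={allocated:.0f} MiB, peak={peak:.0f} MiB')

# e.g. inside LightningModule.training_step:
#     log_cuda_memory('before forward')
#     loss = self.forward(batch)   # or however the step computes the loss
#     log_cuda_memory('after forward')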

jeremyjordan commented 4 years ago

@neggert or @williamFalcon, any ideas why GPU memory usage isn't consistent across the nodes?

VitorGuizilini commented 4 years ago

Following up on this issue, is there anything else I should provide to facilitate debugging?