NVlabs / denoising-diffusion-gan

Tackling the Generative Learning Trilemma with Denoising Diffusion GANs https://arxiv.org/abs/2112.07804

CelebA-HQ 256x256 Training #23

Open KomputerMaster64 opened 2 years ago

KomputerMaster64 commented 2 years ago

The training is done on CelebA-HQ 256x256, pre-processed as per the instructions given in the NVAE repository.
This DDGAN run uses 4 NVIDIA GTX 1080 Ti GPUs with a total batch size of 32 for training on the CelebA-HQ 256x256 dataset
(--batch_size 8 and --num_process_per_node 4).
I am using the following command for training: !python3 train_ddgan.py --dataset celeba_256 --image_size 256 --exp ddgan_celebahq_exp1 --num_channels 3 --num_channels_dae 64 --ch_mult 1 1 2 2 4 4 --num_timesteps 2 --num_res_blocks 2 --batch_size 8 --num_epoch 800 --ngf 64 --embedding_type positional --use_ema --r1_gamma 2. --z_emb_dim 256 --lr_d 1e-4 --lr_g 2e-4 --lazy_reg 10 --num_process_per_node 4 --save_content
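As a sanity check of the batch-size arithmetic (a minimal sketch; the values are taken directly from the flags in the command above), the global batch size is the per-process batch size times the number of processes:

```python
# Effective global batch size = per-process batch size x number of processes.
batch_size_per_gpu = 8   # --batch_size
num_processes = 4        # --num_process_per_node (one process per GPU)

global_batch_size = batch_size_per_gpu * num_processes
print(global_batch_size)  # → 32
```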



I am getting the following output and would appreciate any guidance.

Node rank 0, local proc 0, global proc 0
Node rank 0, local proc 1, global proc 1
Node rank 0, local proc 2, global proc 2
Node rank 0, local proc 3, global proc 3
<memory at 0x7f5d253c7040>
... (hundreds of similar "<memory at 0x...>" lines omitted)
Process Process-3:
Traceback (most recent call last):
  File "/usr/local/apps/python-3.8.3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/apps/python-3.8.3/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "train_ddgan.py", line 482, in init_processes
    fn(rank, gpu, args)
  File "train_ddgan.py", line 390, in train
    x_0_predict = netG(x_tp1.detach(), t, latent_z)
  File "/home/manisha.padala/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/manisha.padala/venv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/manisha.padala/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/manisha.padala/gan/denoising-diffusion-gan/score_sde/models/ncsnpp_generator_adagn.py", line 367, in forward
    h = modules[m_idx](torch.cat([h, hs.pop()], dim=1), temb, zemb)
  File "/home/manisha.padala/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/manisha.padala/gan/denoising-diffusion-gan/score_sde/models/layerspp.py", line 279, in forward
    h = self.act(self.GroupNorm_0(x, zemb))
  File "/home/manisha.padala/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/manisha.padala/gan/denoising-diffusion-gan/score_sde/models/layerspp.py", line 60, in forward
    out = self.norm(input)
  File "/home/manisha.padala/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/manisha.padala/venv/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 245, in forward
    return F.group_norm(
  File "/home/manisha.padala/venv/lib/python3.8/site-packages/torch/nn/functional.py", line 2111, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps,
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 2; 10.92 GiB total capacity; 9.71 GiB already allocated; 151.50 MiB free; 10.06 GiB reserved in total by PyTorch)
Process Process-4:
Process Process-1:
Process Process-2:
(identical tracebacks to the one above, differing only in the GPU index)
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 3; 10.92 GiB total capacity; 9.71 GiB already allocated; 151.50 MiB free; 10.06 GiB reserved in total by PyTorch)
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 10.92 GiB total capacity; 9.71 GiB already allocated; 151.50 MiB free; 10.06 GiB reserved in total by PyTorch)
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 1; 10.92 GiB total capacity; 9.71 GiB already allocated; 151.50 MiB free; 10.06 GiB reserved in total by PyTorch)
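For context, all four 1080 Ti GPUs (~11 GiB each) run out of memory at the same point in the generator forward pass. One common workaround on smaller cards (an untested sketch; every flag here is one already used in the command above, only --batch_size is lowered) is to reduce the per-GPU batch size:

```shell
# Same run as above, but with a smaller per-GPU batch size to fit in ~11 GiB.
# Global batch size drops from 32 to 16 (4 GPUs x 4 samples each).
python3 train_ddgan.py --dataset celeba_256 --image_size 256 \
  --exp ddgan_celebahq_exp1 --num_channels 3 --num_channels_dae 64 \
  --ch_mult 1 1 2 2 4 4 --num_timesteps 2 --num_res_blocks 2 \
  --batch_size 4 --num_epoch 800 --ngf 64 --embedding_type positional \
  --use_ema --r1_gamma 2. --z_emb_dim 256 --lr_d 1e-4 --lr_g 2e-4 \
  --lazy_reg 10 --num_process_per_node 4 --save_content
```

Note that halving the global batch size may also change training dynamics relative to the paper's configuration.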