NVlabs / stylegan3

Official PyTorch implementation of StyleGAN3

Incorrect gradient accumulation? #184

Open IFeelBloated opened 2 years ago

IFeelBloated commented 2 years ago

If batch_gpu < batch_size // num_gpus, the accumulated gradient should be normalized by (num_gpus * batch_gpu) / batch_size. The current accumulation implementation does not seem to be normalized, which effectively enlarges the learning rate by a factor of batch_size / (num_gpus * batch_gpu).
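For concreteness, a small worked example of the claimed scaling (the values below are hypothetical, not the repository's defaults):

```python
# Hypothetical values for illustration only.
batch_size, num_gpus, batch_gpu = 64, 1, 16

rounds = batch_size // (num_gpus * batch_gpu)        # 4 accumulation rounds per optimizer step
# Summing 4 unnormalized per-round gradients scales the step by ~4x:
lr_inflation = batch_size / (num_gpus * batch_gpu)   # 4.0
# The proposed normalization multiplies the gain by the reciprocal:
gain_factor = num_gpus * batch_gpu / batch_size      # 0.25
```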

IFeelBloated commented 2 years ago

https://github.com/NVlabs/stylegan3/blob/407db86e6fe432540a22515310188288687858fa/training/training_loop.py#L278

should be

loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval * num_gpus * batch_gpu / batch_size, cur_nimg=cur_nimg)

I think
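For reference, a minimal generic gradient-accumulation sketch (not the repository's training loop; the model, optimizer, and data below are hypothetical) showing why each micro-batch loss is typically scaled by num_gpus * batch_gpu / batch_size, i.e. 1 / rounds, so that the summed gradients match a single full-batch pass:

```python
import torch

# Generic sketch, not StyleGAN3 code.
model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

batch_size, num_gpus, batch_gpu = 64, 1, 16
rounds = batch_size // (num_gpus * batch_gpu)   # gradient-accumulation rounds per step

opt.zero_grad(set_to_none=True)
for _ in range(rounds):
    x, y = torch.randn(batch_gpu, 8), torch.randn(batch_gpu, 1)
    loss = loss_fn(model(x), y)
    # Scale by num_gpus * batch_gpu / batch_size (= 1 / rounds) so the summed
    # gradients equal the mean gradient over the full batch.
    (loss * num_gpus * batch_gpu / batch_size).backward()
opt.step()
```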

PDillis commented 2 years ago

Your first inequality cannot happen according to the following check, so the learning rate isn't enlarged:

https://github.com/NVlabs/stylegan3/blob/407db86e6fe432540a22515310188288687858fa/train.py#L226

Another option would be to remove that check and add your correction, but is it really necessary?

IFeelBloated commented 2 years ago

The inequality can definitely happen. Suppose --gpus=1 --batch=64 --batch-gpu=16: that sanity check reduces to checking whether 64 is a multiple of 16, which is true, so it passes.
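A minimal sketch of that arithmetic, assuming the check at train.py#L226 amounts to requiring batch_size to be divisible by num_gpus * batch_gpu (as the discussion above implies):

```python
# --gpus=1 --batch=64 --batch-gpu=16
num_gpus, batch_size, batch_gpu = 1, 64, 16

passes_check = batch_size % (num_gpus * batch_gpu) == 0   # True: 64 is a multiple of 16
accumulates = batch_gpu < batch_size // num_gpus          # True: 16 < 64
# Both are True, so the check does not rule out gradient accumulation.
print(passes_check, accumulates)
```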