IFeelBloated opened this issue 2 years ago (status: Open)
Original issue (IFeelBloated):

If batch_gpu < batch_size // num_gpus, the accumulated gradient should be normalized by (num_gpus * batch_gpu) / batch_size. The current accumulation implementation does not appear to apply this normalization, which effectively enlarges the learning rate by a factor of batch_size / (num_gpus * batch_gpu).

The call should be:

loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval * num_gpus * batch_gpu / batch_size, cur_nimg=cur_nimg)

Reply:

I think your first inequality cannot happen according to the following check, so the learning rate isn't enlarged:
https://github.com/NVlabs/stylegan3/blob/407db86e6fe432540a22515310188288687858fa/train.py#L226
Another option is to remove that check and add your correction, but is it really necessary?

Reply (IFeelBloated):

The inequality can definitely happen. Suppose --gpus=1 --batch=64 --batch-gpu=16: that sanity check reduces to asking whether 64 is a multiple of 16, which is true, so the check passes.
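A minimal numeric sketch of the claim (this is not stylegan3's actual training loop; `accumulate` below is a hypothetical stand-in that assumes each backward pass contributes gain times the micro-batch mean gradient, with accumulation rounds summed on each GPU and the per-GPU results averaged):

```python
import numpy as np

def accumulate(per_sample, num_gpus, batch_gpu, gain):
    """Simulate gradient accumulation: each of the
    batch_size // (num_gpus * batch_gpu) backward passes on a GPU adds
    gain * mean(per-sample gradients over its micro-batch); the per-GPU
    totals are then averaged (mimicking an all-reduce)."""
    batch_size = per_sample.size
    rounds = batch_size // (num_gpus * batch_gpu)
    gpu_grads = []
    for gpu in range(num_gpus):
        g = 0.0
        for r in range(rounds):
            start = (gpu * rounds + r) * batch_gpu
            g += gain * per_sample[start:start + batch_gpu].mean()
        gpu_grads.append(g)
    return float(np.mean(gpu_grads))

rng = np.random.default_rng(0)
grads = rng.normal(size=64)      # one scalar "gradient" per sample, --batch=64
want = grads.mean()              # gradient of the mean loss over the full batch

# --gpus=1 --batch-gpu=16: four accumulation rounds per step.
naive = accumulate(grads, num_gpus=1, batch_gpu=16, gain=1.0)
fixed = accumulate(grads, num_gpus=1, batch_gpu=16, gain=1 * 16 / 64)

print(naive / want)   # ≈ 4.0 == batch_size / (num_gpus * batch_gpu)
print(fixed / want)   # ≈ 1.0
```

Under these assumptions, the unscaled accumulation overshoots the full-batch mean gradient by exactly batch_size / (num_gpus * batch_gpu), the factor named in the report, and multiplying gain by num_gpus * batch_gpu / batch_size cancels it.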