Closed · Sirius-Li closed this 4 months ago
I find the learning rate in the train script is 1.5e-4, different from "4e-5 for U-Net and 3e-5 for the text encoder" in the paper. I also tried some different learning rates and found that any LR between 5e-5 and 1.5e-4 is OK; the loss plots are similar. What should the learning rate be set to?

For the 512x512 and 800x456 experiments, the default LR for the UNet and the text encoder is 4e-5 and 3e-5 respectively, with a batch size of 64. For the 256x256 experiments, the decreased GPU memory requirement lets us use a batch size of 32x8=256 with 8 V100 GPUs, and the linearly scaled LR for the UNet is (4e-5)x(32x8/64)=1.6e-4; empirically, LR=1.5e-4 achieves slightly better results with 8 GPUs. The `--learning_rate` argument controls the LR for the UNet, while the `--lr_text_ratio` argument sets the ratio of the text encoder LR to the UNet LR: for example, `--learning_rate=4e-5` and `--lr_text_ratio=0.75` mean the UNet LR is 4e-5 and the text encoder LR is 4e-5x0.75=3e-5.

Alright, thanks a lot for your help!!!
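For reference, a minimal sketch of the linear LR-scaling arithmetic described in the answer above; the batch sizes and GPU count are just the numbers quoted in this thread, and the text-encoder line assumes the same 0.75 ratio carries over unchanged.

```python
# Sketch of the linear LR scaling rule quoted above (not the repo's code).
base_unet_lr = 4e-5      # default UNet LR tuned for the 512x512 / 800x456 runs
base_batch_size = 64     # batch size those defaults assume

per_gpu_batch = 32       # 256x256 setup from the thread
num_gpus = 8             # 8 V100 GPUs
effective_batch = per_gpu_batch * num_gpus        # 256

scaled_unet_lr = base_unet_lr * effective_batch / base_batch_size
print(scaled_unet_lr)    # 1.6e-4, i.e. (4e-5) x (32x8/64)

# Assumption: the text encoder keeps the same 0.75 ratio to the UNet LR.
lr_text_ratio = 0.75
print(scaled_unet_lr * lr_text_ratio)             # 1.2e-4
```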
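And a hypothetical sketch of how the two flags map to the two learning rates; the real train script's argument handling may differ, and the default values shown are only the ones mentioned in this thread.

```python
# Hypothetical illustration of the --learning_rate / --lr_text_ratio relationship;
# not the actual training script.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float, default=1.5e-4,
                    help="LR for the UNet")
parser.add_argument("--lr_text_ratio", type=float, default=0.75,
                    help="text-encoder LR as a fraction of the UNet LR")

# The paper's setting: 4e-5 for the UNet, 3e-5 for the text encoder.
args = parser.parse_args(["--learning_rate", "4e-5", "--lr_text_ratio", "0.75"])
unet_lr = args.learning_rate
text_encoder_lr = args.learning_rate * args.lr_text_ratio
print(unet_lr, text_encoder_lr)   # 4e-05 3e-05
```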