Closed · Sirius-Li closed this 4 months ago
I find the learning rate in the train script is 1.5e-4, different from "4e-5 for U-Net and 3e-5 for the text encoder" in the paper. I also tried some different learning rates and found that any LR between 5e-5 and 1.5e-4 is OK; the loss plots are similar. What should the learning rate be set to?

For the 512x512 and 800x456 experiments, the default LR for the UNet and the text encoder is 4e-5 and 3e-5 respectively, with a batch size of 64. For the 256x256 experiments, the decreased GPU memory requirement lets us use a batch size of 32x8=256 with 8 V100 GPUs, and the linearly scaled LR for the UNet is (4e-5)x(32x8/64)=1.6e-4; empirically, LR=1.5e-4 achieves slightly better results with 8 GPUs. The `--learning_rate` argument controls the LR for the UNet, while the `--lr_text_ratio` argument sets the ratio of the text encoder LR to the UNet LR: for example, `--learning_rate=4e-5` and `--lr_text_ratio=0.75` mean the UNet LR is 4e-5 and the text encoder LR is 4e-5x0.75=3e-5.

Alright, thanks a lot for your help!!!
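For reference, a minimal sketch of the linear LR-scaling arithmetic described in the answer above; the batch sizes and GPU count are just the numbers quoted in this thread, and the text-encoder line assumes the same 0.75 ratio carries over unchanged.

```python
# Sketch of the linear LR scaling rule quoted above (not the repo's code).
base_unet_lr = 4e-5      # default UNet LR tuned for the 512x512 / 800x456 runs
base_batch_size = 64     # batch size those defaults assume

per_gpu_batch = 32       # 256x256 setup from the thread
num_gpus = 8             # 8 V100 GPUs
effective_batch = per_gpu_batch * num_gpus        # 256

scaled_unet_lr = base_unet_lr * effective_batch / base_batch_size
print(scaled_unet_lr)    # 1.6e-4, i.e. (4e-5) x (32x8/64)

# Assumption: the text encoder keeps the same 0.75 ratio to the UNet LR.
lr_text_ratio = 0.75
print(scaled_unet_lr * lr_text_ratio)             # 1.2e-4
```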
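And a hypothetical sketch of how the two flags map to the two learning rates; the real train script's argument handling may differ, and the default values shown are only the ones mentioned in this thread.

```python
# Hypothetical illustration of the --learning_rate / --lr_text_ratio relationship;
# not the actual training script.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float, default=1.5e-4,
                    help="LR for the UNet")
parser.add_argument("--lr_text_ratio", type=float, default=0.75,
                    help="text-encoder LR as a fraction of the UNet LR")

# The paper's setting: 4e-5 for the UNet, 3e-5 for the text encoder.
args = parser.parse_args(["--learning_rate", "4e-5", "--lr_text_ratio", "0.75"])
unet_lr = args.learning_rate
text_encoder_lr = args.learning_rate * args.lr_text_ratio
print(unet_lr, text_encoder_lr)   # 4e-05 3e-05
```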