minkyujeon closed this issue 1 year ago
Thanks for your attention!
We tried 256x256, but not extensively. In my experience, the "filling gap loss" is very sensitive to the image size.
For 32x32, the loss is still stable even if we don't use the proposed loss weighting schedule.
For 64x64 and 128x128, the loss becomes extremely unstable without the proposed loss weighting schedule, although the model can still slowly converge.
For 256x256, you may use a new loss weighting schedule to stabilize the loss (i.e., up-weighting low-SNR timesteps and down-weighting high-SNR ones), or even optimize only over the last 80% of timesteps.
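In case it is useful, here is a minimal sketch of what such an SNR-based weighting could look like; the helper names, the `gamma` exponent, and the linear beta schedule are illustrative assumptions, not the actual code in this repo:

```python
import torch

def snr_from_alphas_cumprod(alphas_cumprod: torch.Tensor) -> torch.Tensor:
    # SNR(t) = alpha_bar_t / (1 - alpha_bar_t) for a variance-preserving schedule.
    return alphas_cumprod / (1.0 - alphas_cumprod)

def snr_loss_weight(snr: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    # Weight ~ 1 / (1 + SNR)^gamma: close to 1 at low SNR (noisy timesteps),
    # decays toward 0 at high SNR (nearly clean timesteps).
    return 1.0 / (1.0 + snr) ** gamma

# Example with a linear beta schedule over T = 1000 steps.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

snr = snr_from_alphas_cumprod(alphas_cumprod)
weights = snr_loss_weight(snr, gamma=1.0)

# During training: weighted_loss = (weights[t] * per_timestep_mse).mean()
# One reading of "optimize only the last 80% of timesteps" (assuming larger t = noisier)
# is to sample t from that range, e.g. t = torch.randint(int(0.2 * T), T, (batch_size,))
```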
I think the reason is that the same noise schedule behaves differently at different image resolutions. See the recent works On the Importance of Noise Scheduling for Diffusion Models and simple diffusion for details. Maybe we can draw inspiration from them and design a resolution-dependent function for a universal loss weighting schedule.
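For that resolution-dependent direction, a rough sketch (in the spirit of the SNR shift described in simple diffusion; the 64x64 reference resolution and function name are assumptions) could look like:

```python
import torch

def shifted_snr(snr: torch.Tensor, res: int, base_res: int = 64) -> torch.Tensor:
    # Shift log-SNR by 2 * log(base_res / res): larger images effectively see
    # more noise at every timestep, which a loss weighting schedule could build on.
    return snr * (base_res / res) ** 2

snr_64 = torch.tensor([100.0, 10.0, 1.0, 0.1])  # example SNR values at 64x64
snr_256 = shifted_snr(snr_64, res=256)          # 16x lower SNR at 256x256
```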
Thank you for sharing this information!
Dear authors,
Thanks for the nice work! I was just wondering whether you have tried training and testing on 256x256 images.
Thanks :)