PixArt-alpha / PixArt-sigma

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
https://pixart-alpha.github.io/PixArt-sigma-project/
GNU Affero General Public License v3.0

add_noise training details #91

Open kelisiya opened 1 month ago

kelisiya commented 1 month ago

I noticed that when training PixArt-Σ, you add noise using q_sample, which differs from how SDXL is trained. When I switched to the DDPMScheduler method, noise_scheduler.add_noise(x_start, noise, t), the model degraded significantly. What are the advantages of the q_sample method?
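For reference, q_sample in IDDPM-style diffusion code and DDPMScheduler.add_noise are both generally understood to apply the same closed-form forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. The following is a minimal pure-Python sketch of that formula, not the repository's or the library's actual implementation; the helper names `alphas_cumprod` and `forward_noise` and the toy inputs are hypothetical. If the assumption holds, the two calls differ only through the beta schedule that produces alpha_bar_t.

```python
import math
import random


def alphas_cumprod(betas):
    """Return alpha_bar_t, the running product of (1 - beta_t), for each timestep."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def forward_noise(x_start, noise, t, betas):
    """Closed-form forward process: sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * noise."""
    a_bar = alphas_cumprod(betas)[t]
    return [
        math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * n
        for x, n in zip(x_start, noise)
    ]


# Hypothetical toy inputs: a 4-element "latent" and a 10-step linear beta schedule.
betas = [0.0001 + i * (0.02 - 0.0001) / 9 for i in range(10)]
random.seed(0)
x0 = [1.0, -1.0, 0.5, 0.0]
eps = [random.gauss(0.0, 1.0) for _ in x0]
x_t = forward_noise(x0, eps, t=5, betas=betas)
```

Swapping only the `betas` list changes how much signal survives at each timestep, which is where two training setups would diverge under this assumption.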

kelisiya commented 1 month ago

I also noticed that the beta_start and beta_end parameters are different. If I want to fine-tune with the hyperparameters used by SDXL, would that be more challenging?

kelisiya commented 1 month ago

To elaborate, when I fully aligned the settings with SDXL for fine-tuning, I observed a decrease in image quality. Can this quality issue be resolved with further training?

Prompt: "A photo of beautiful mountain with realistic sunset and blue lake, highly detailed, masterpiece"

lawrence-cj commented 1 month ago

The noise schedules of SDXL and Transformer-based methods (PixArt, DiT) are different. The difference comes from the benefit of zero SNR. Refer to: https://arxiv.org/pdf/2301.10972

Besides, after changing the noise schedule, the model may need some training time to converge. I haven't tested this myself. If you have any results, feel free to contact me. I'm looking forward to hearing from you.
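To make the schedule difference concrete, here is a rough sketch that compares the signal-to-noise ratio, SNR(t) = alpha_bar_t / (1 - alpha_bar_t), of two beta schedules. The values are assumptions based on commonly published defaults, not settings confirmed in this thread: a linear schedule from 0.0001 to 0.02 for the PixArt/DiT side and a "scaled linear" schedule from 0.00085 to 0.012 for the SDXL side, both over 1000 steps. The names `pixart_like` and `sdxl_like` are hypothetical.

```python
import math


def linear_betas(beta_start, beta_end, n):
    """Betas spaced evenly between beta_start and beta_end."""
    step = (beta_end - beta_start) / (n - 1)
    return [beta_start + i * step for i in range(n)]


def scaled_linear_betas(beta_start, beta_end, n):
    """Betas spaced evenly in sqrt(beta), then squared."""
    s, e = math.sqrt(beta_start), math.sqrt(beta_end)
    step = (e - s) / (n - 1)
    return [(s + i * step) ** 2 for i in range(n)]


def snr(betas):
    """SNR(t) = alpha_bar_t / (1 - alpha_bar_t) for every timestep."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod / (1.0 - prod))
    return out


# Assumed defaults (see the note above); adjust to the real training configs.
pixart_like = linear_betas(0.0001, 0.02, 1000)
sdxl_like = scaled_linear_betas(0.00085, 0.012, 1000)

snr_pixart = snr(pixart_like)
snr_sdxl = snr(sdxl_like)

print("terminal SNR, linear schedule:       ", snr_pixart[-1])
print("terminal SNR, scaled linear schedule:", snr_sdxl[-1])
```

Under these assumed values the linear schedule ends at a much lower SNR than the scaled linear one, although neither is exactly zero. A model fine-tuned across the two schedules therefore sees noise levels it was not trained on, which is consistent with needing extra training time to converge.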

kelisiya commented 1 month ago

Prompt: "dog,realistic,8k,hdr"
Prompt: "Self-portrait oil painting, a beautiful cyborg with golden hair, realistic,8k,hdr"
Prompt: "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

It seems that the training has achieved some results so far, but I am not sure when I will get results I am satisfied with.