samedii opened 1 month ago
Hi there. Thanks for caring. The instability comes from fp16 attention in Transformer2DModel when we train on 1024px images or larger. Do you have any insight into this one?
Thanks for the info! I've only trained at 512x512 so far, for iteration speed. I'll try 1024x1024 then.
I noticed that you do the attention calculations in fp32 in your code, though; that might be related.
Correct.
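For reference, the upcasting idea can be sketched roughly like this (a minimal sketch, not the repo's actual code; the helper name `attention_fp32` is made up). The point is that the softmax logits are computed in fp32, where fp16 would overflow at high resolution, and only the final output is cast back:

```python
import torch

def attention_fp32(q, k, v):
    # Hypothetical helper: upcast Q/K/V to fp32 for the matmul + softmax,
    # then cast the result back to the caller's dtype. All the numerically
    # sensitive math happens in fp32.
    dtype = q.dtype
    q, k, v = q.float(), k.float(), v.float()
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return (attn @ v).to(dtype)
```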
I also encountered training instability, especially when training on more tokens. Have you tried training in bf16?
Also https://github.com/PixArt-alpha/PixArt-sigma/issues/72
What types of issues have you seen with diffusers? Would be helpful to know what to look for.
My training has gone fine with diffusers after I made sure that the text encoder keeps some parts in fp32.
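In case it's useful to others: one common way to do this is to half the encoder but keep the LayerNorms in fp32, since norms are a frequent source of fp16 instability. A minimal sketch (the helper name is mine, and which submodules you keep in fp32 may differ for your encoder):

```python
import torch

def cast_with_fp32_norms(text_encoder):
    # Hypothetical helper: cast the whole encoder to fp16, then walk the
    # module tree and restore LayerNorm parameters to fp32. Note the
    # forward pass then needs matching-dtype handling (e.g. autocast).
    text_encoder.half()
    for module in text_encoder.modules():
        if isinstance(module, torch.nn.LayerNorm):
            module.float()
    return text_encoder
```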