samedii opened 1 month ago
Hi there. Thanks for caring. The instability comes from fp16 attention in Transformer2DModel when we train on 1024px images or larger. Do you have any insight into this one?
Thanks for the info! I've only trained at 512x512 so far, for iteration speed. I'll try 1024x1024 then.
I noticed that you do the attention calculations in fp32 in your code, though; that might be related.
Correct.
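For reference, the upcasting idea can be sketched roughly like this (a minimal sketch, not the repo's actual code; the helper name `attention_fp32` is made up). The point is that the softmax logits are computed in fp32, where fp16 would overflow at high resolution, and only the final output is cast back:

```python
import torch

def attention_fp32(q, k, v):
    # Hypothetical helper: upcast Q/K/V to fp32 for the matmul + softmax,
    # then cast the result back to the caller's dtype. All the numerically
    # sensitive math happens in fp32.
    dtype = q.dtype
    q, k, v = q.float(), k.float(), v.float()
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return (attn @ v).to(dtype)
```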
I also encountered training instability, especially when training on more tokens. Have you tried training in bf16?
Also https://github.com/PixArt-alpha/PixArt-sigma/issues/72
What types of issues have you seen with diffusers? Would be helpful to know what to look for.
My training has gone fine with diffusers after I made sure that the text encoder keeps some parts in fp32.
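In case it's useful to others: one common way to do this is to half the encoder but keep the LayerNorms in fp32, since norms are a frequent source of fp16 instability. A minimal sketch (the helper name is mine, and which submodules you keep in fp32 may differ for your encoder):

```python
import torch

def cast_with_fp32_norms(text_encoder):
    # Hypothetical helper: cast the whole encoder to fp16, then walk the
    # module tree and restore LayerNorm parameters to fp32. Note the
    # forward pass then needs matching-dtype handling (e.g. autocast).
    text_encoder.half()
    for module in text_encoder.modules():
        if isinstance(module, torch.nn.LayerNorm):
            module.float()
    return text_encoder
```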