PixArt-alpha / PixArt-sigma

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

https://pixart-alpha.github.io/PixArt-sigma-project/

GNU Affero General Public License v3.0

Training failed? #142

Closed: alfredplpl closed this issue 1 month ago.

alfredplpl commented 2 months ago

Thank you for your great code!

Currently, I'm trying to train a 0.6B PixArt-Sigma model from scratch on 30M images. After several hundred thousand training steps, cats still don't hold their shape. What could be the cause? Possible reasons include insufficient data, too few parameters, or an insufficiently capable text encoder. Which do you think is most likely?

[attached image]

Feynman1999 commented 2 months ago

Hello, may I ask which training script you are using? I tried training from scratch before, but gave up due to limited resources.

alfredplpl commented 2 months ago

@Feynman1999 I modified the training code because of my limited resources. For example, I train on 32 L4 GPUs because I don't have access to A100s or H100s. The key settings are:

Model: bfloat16
Optimizer: AdamW 8bit (bfloat16)
Text Encoder: Llama-based LLM 7B 8bit (bfloat16)

With these settings, the model fits on an L4, which has 24 GB of VRAM.
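As a rough sanity check on why this fits in 24 GB, here is a back-of-envelope estimate of static memory. It is a sketch under stated assumptions, not a measurement: 2 bytes per bfloat16 value, 4 bytes per float32 value, gradients stored in the same dtype as the weights, AdamW keeping two 8-bit states (1 byte each) per parameter, and 1 byte per weight for the 8-bit 7B text encoder. The helper name `static_memory_gb` is hypothetical. Activations, the VAE, quantization scales, and framework overhead are ignored, so real usage will be higher. The float32 model variant mentioned later in the thread is included for comparison.

```python
# Hypothetical back-of-envelope estimate of static GPU memory.
# Counts only weights, gradients, optimizer states and the text encoder;
# activations, VAE, quantization constants and overhead are ignored.

GB = 1e9  # decimal gigabytes, to keep the arithmetic simple


def static_memory_gb(
    dit_params: float,
    dit_weight_bytes: float,
    dit_grad_bytes: float,
    optim_state_bytes: float,
    text_encoder_params: float,
    text_encoder_bytes: float,
) -> dict[str, float]:
    """Return an approximate per-component memory breakdown in GB."""
    breakdown = {
        "dit_weights": dit_params * dit_weight_bytes / GB,
        "dit_grads": dit_params * dit_grad_bytes / GB,
        # AdamW keeps two states per parameter (first and second moments).
        "adamw_states": dit_params * 2 * optim_state_bytes / GB,
        "text_encoder": text_encoder_params * text_encoder_bytes / GB,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# bfloat16 model, 8-bit AdamW, 8-bit 7B text encoder (assumed byte sizes).
first = static_memory_gb(0.6e9, 2, 2, 1, 7e9, 1)
# Same setup with float32 model weights and gradients.
second = static_memory_gb(0.6e9, 4, 4, 1, 7e9, 1)

for name, cfg in [("bf16 model", first), ("fp32 model", second)]:
    print(name, {k: round(v, 1) for k, v in cfg.items()})
```

Under these assumptions the static footprint is roughly 10.6 GB for the bfloat16 model and 13.0 GB for the float32 model, leaving the remainder of the 24 GB for activations, which is why batch size and resolution usually decide whether training actually fits.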

alfredplpl commented 2 months ago

I'm continuing to train the model. The cat in the inference result now seems to have an eye.

Feynman1999 commented 2 months ago

Perhaps you could try fine-tuning the official model on your dataset instead. I think the initial results were good, so if the results clearly broke down later on, I would suspect a bug in the training code (for example, numerical overflow causing gradient anomalies, or data-loading errors).
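One cheap way to catch the kind of gradient anomaly described above is to compute the global gradient norm every step and skip the update when it is non-finite or suspiciously large. The sketch below is framework-agnostic and uses plain Python lists as stand-ins for per-parameter gradients; the function names and the `max_norm` threshold are hypothetical choices, not part of the PixArt training code.

```python
import math
from typing import Iterable, Sequence


def global_grad_norm(grads: Iterable[Sequence[float]]) -> float:
    """L2 norm over all gradient values; inf or nan if any value is non-finite."""
    return math.sqrt(sum(x * x for g in grads for x in g))


def check_step(step: int, grads: Iterable[Sequence[float]], max_norm: float = 1e3) -> bool:
    """Return True if the step looks healthy; print a warning otherwise.

    max_norm is a hypothetical threshold for a "suspiciously large" gradient.
    """
    norm = global_grad_norm(grads)
    if not math.isfinite(norm):
        print(f"step {step}: non-finite gradient norm ({norm}); skip the update")
        return False
    if norm > max_norm:
        print(f"step {step}: gradient norm {norm:.1f} exceeds {max_norm}")
        return False
    return True


# Usage with toy gradients for two parameters.
print(check_step(0, [[3.0], [4.0]]))          # healthy: norm 5.0
print(check_step(1, [[float("nan"), 1.0]]))   # flagged: non-finite
```

In a PyTorch loop the same idea applies to the gradients after `loss.backward()`: `torch.nn.utils.clip_grad_norm_` already returns the total norm (and accepts `error_if_nonfinite=True`), so logging that value each step makes a sudden spike or NaN easy to spot before the samples visibly degrade.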

alfredplpl commented 2 months ago

I increased the precision, and then I got high-quality images:

Model: float32
Optimizer: AdamW 8bit (bfloat16, mixed precision)
Text Encoder: Llama-based LLM 7B 8bit (bfloat16)
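One plausible reason the switch to float32 weights helped (the thread does not confirm it): bfloat16 keeps only 8 significant bits (7 explicit mantissa bits), so just below 1.0 adjacent representable values are 2^-8 ≈ 0.0039 apart. A per-step update such as 1e-4 is then rounded away entirely when the weights themselves are stored in bfloat16, and training stalls. Mixed-precision training conventionally keeps a float32 copy of the weights for this reason. The pure-Python sketch below emulates both formats with `struct`; the helper names and the 1e-4 step size are illustrative assumptions.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest IEEE-754 float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bfloat16(x: float) -> float:
    """Round to the nearest bfloat16 value (round-to-nearest-even).

    bfloat16 is the top 16 bits of a float32: 1 sign, 8 exponent and
    7 explicit mantissa bits. NaN inputs are not handled specially.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bias before truncating so the dropped 16 bits round to nearest, ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def train_weight(round_fn, w0: float, update: float, steps: int) -> float:
    """Apply `steps` updates w <- w - update, rounding to the format after each."""
    w = round_fn(w0)
    for _ in range(steps):
        w = round_fn(w - update)
    return w


# Illustrative numbers: a weight of 1.0 and a per-step update (lr * grad) of 1e-4.
bf16_w = train_weight(to_bfloat16, 1.0, 1e-4, 1000)
fp32_w = train_weight(to_float32, 1.0, 1e-4, 1000)
print(f"bf16 weight after 1000 steps: {bf16_w}")   # stays at 1.0
print(f"fp32 weight after 1000 steps: {fp32_w:.4f}")  # close to 0.9
```

The bfloat16 weight never moves, while the float32 weight drifts to about 0.9 as expected. Keeping bfloat16 for activations and optimizer states but float32 for the master weights, as in the second configuration, avoids this loss at a modest memory cost.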

alfredplpl commented 1 month ago

The training succeeded. I'll share the details of the training. Thank you.