TencentQQGYLab / ELLA

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
https://ella-diffusion.github.io/
Apache License 2.0

The effect of training data? #10

Closed XiaoBuL closed 6 months ago

XiaoBuL commented 8 months ago

Hello, thanks for your great work.

I'm curious about the effect of the training data. Did you ever directly fine-tune the full SD-1.5 or SD-XL model on the training data?

I guess fine-tuning SD-1.5 would also benefit from the training data, e.g. on T2I-CompBench performance.

Could you report the performance of SD-1.5 or SD-XL fine-tuned on your training data?

Thanks!

fangyixiao18 commented 8 months ago

Yes, a fine-tuned SD-1.5 or SD-XL would probably gain some improvement from our training data. However, we did not fine-tune SD-1.5 or SD-XL, for the following two reasons:

  1. our data contains some prompts that are longer than 77 tokens, the CLIP text-encoder limit;
  2. we want our model to be easy to combine with community models and downstream tools.

XiaoBuL commented 8 months ago

Thanks for your reply!

You could simply truncate prompts to 77 tokens when fine-tuning SD-1.5 or SD-XL.
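
For concreteness, a minimal sketch of that truncation, assuming the standard `transformers` CLIP tokenizer used by SD-1.5 (illustrative only, not code from this repo):

```python
# Illustrative only: prompts longer than CLIP's 77-token limit are simply cut off
# before being fed to SD-1.5's frozen text encoder.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

long_prompt = "an extremely detailed caption ..."  # hypothetical prompt > 77 tokens

tokens = tokenizer(
    long_prompt,
    padding="max_length",
    max_length=tokenizer.model_max_length,  # 77 for CLIP
    truncation=True,                        # everything past 77 tokens is dropped
    return_tensors="pt",
)
print(tokens.input_ids.shape)  # torch.Size([1, 77])
```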

I'm still curious whether the improvement comes from the LLM or from the training data.

budui commented 8 months ago

Actually, we're quite curious about this too. We'll try to gather enough GPUs to fine-tune SD 1.5 ~or SDXL~

XiaoBuL commented 8 months ago

Thanks! Looking forward to your results!

budui commented 6 months ago

We fine-tuned the whole U-Net of SD v1.5 on the proposed datasets, using the same training hyperparameters as ELLA-SD1.5 (which incorporates T5-XL and TSC). Both models were trained for 140,000 optimization steps, corresponding to approximately one epoch:

[image attached: results comparison]
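
For anyone wanting to reproduce this baseline, a rough sketch of full U-Net fine-tuning with `diffusers` is below; the model id, learning rate, and loop structure are assumptions for illustration, not the actual ELLA training code:

```python
# Rough sketch of full U-Net fine-tuning for SD v1.5 (hyperparameters are assumed,
# not the ELLA training setup; the T5-XL/TSC path of ELLA is not involved here).
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder").eval()
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").eval()
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

unet.requires_grad_(True)           # the whole U-Net is trainable
vae.requires_grad_(False)           # VAE and text encoder stay frozen
text_encoder.requires_grad_(False)

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

def training_step(pixel_values, captions):
    # Encode images to latents and captions to CLIP embeddings (frozen encoders).
    with torch.no_grad():
        latents = vae.encode(pixel_values).latent_dist.sample() * vae.config.scaling_factor
        ids = tokenizer(captions, padding="max_length",
                        max_length=tokenizer.model_max_length,
                        truncation=True, return_tensors="pt").input_ids
        text_emb = text_encoder(ids)[0]

    # Standard denoising objective: predict the noise added at a random timestep.
    noise = torch.randn_like(latents)
    timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                              (latents.shape[0],), device=latents.device)
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
    pred = unet(noisy_latents, timesteps, encoder_hidden_states=text_emb).sample

    loss = F.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss
```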