TencentQQGYLab / ELLA

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
https://ella-diffusion.github.io/
Apache License 2.0

Text-to-Image Alignment Performance of the ELLA-SDXL Model #24

Open xiexiaoshinick opened 7 months ago

xiexiaoshinick commented 7 months ago

First of all, I would like to express my sincere gratitude for your open-source model ELLA, which is truly remarkable. I have been closely following your team's work, and I tested the model as soon as it was released.

I evaluated the text-to-image alignment of ELLA-SD1.5 using GenEval. Compared to the original Stable Diffusion 1.5, ELLA-SD1.5 improved the Overall score by about 7.6 percentage points (42.34 to 49.94). That gain is also much larger than the one Salesforce's Diffusion-DPO achieves over the same baseline (42.34 to 43.00).

I noticed that Stable Diffusion 3 has adopted GenEval to evaluate its text-to-image alignment. Does your team plan to release GenEval scores for ELLA-SDXL? That would let us compare ELLA-SDXL with SD3 on a common scale.

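| Model | Overall | single | two | counting | colors | position | color_attr |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SD1.5 | 42.34 | 95.62 | 37.63 | 37.81 | 74.73 | 3.50 | 4.75 |
| SD1.5-DPO | 43.00 | 96.88 | 39.90 | 38.75 | 75.53 | 3.25 | 3.75 |
| ELLA-SD1.5 | 49.94 | 94.69 | 55.81 | 36.56 | 77.32 | 14.75 | 20.50 |
| SDXL | 55.63 | 98.12 | 75.25 | 43.75 | 89.63 | 11.25 | 15.75 |
| SDXL-DPO | 58.02 | 99.38 | 82.58 | 49.06 | 85.11 | 13.50 | 18.50 |
| ELLA-SDXL | | | | | | | |
| DALL-E 3 | 67.00 | 96.00 | 87.00 | 47.00 | 83.00 | 43.00 | 45.00 |
| SD3 best | 74.00 | 99.00 | 94.00 | 72.00 | 89.00 | 33.00 | 60.00 |
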
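
For anyone who wants to check the numbers above: in the rows I filled in, the Overall column appears to be the unweighted mean of the six task scores, to within rounding. This is an observation from the table, not something taken from the GenEval code. The sketch below assumes that relationship, recomputes Overall, and breaks the ELLA-SD1.5 gain down by task. The names `TASKS`, `SCORES`, `overall`, and `per_task_gain` are only illustrative.

```python
# GenEval scores (percent) copied from the table above.
TASKS = ["single", "two", "counting", "colors", "position", "color_attr"]
SCORES = {
    "SD1.5":      [95.62, 37.63, 37.81, 74.73, 3.50, 4.75],
    "SD1.5-DPO":  [96.88, 39.90, 38.75, 75.53, 3.25, 3.75],
    "ELLA-SD1.5": [94.69, 55.81, 36.56, 77.32, 14.75, 20.50],
    "SDXL":       [98.12, 75.25, 43.75, 89.63, 11.25, 15.75],
    "SDXL-DPO":   [99.38, 82.58, 49.06, 85.11, 13.50, 18.50],
}


def overall(task_scores):
    """Unweighted mean of the task scores (assumed definition of Overall)."""
    return sum(task_scores) / len(task_scores)


def per_task_gain(model, baseline):
    """Difference in percentage points between two models, task by task."""
    return {
        task: m - b
        for task, m, b in zip(TASKS, SCORES[model], SCORES[baseline])
    }


if __name__ == "__main__":
    for name, scores in SCORES.items():
        print(f"{name:<11} overall = {overall(scores):.2f}")

    gain = overall(SCORES["ELLA-SD1.5"]) - overall(SCORES["SD1.5"])
    print(f"ELLA-SD1.5 vs SD1.5: {gain:+.2f} points overall")
    for task, delta in per_task_gain("ELLA-SD1.5", "SD1.5").items():
        print(f"  {task:<10} {delta:+.2f}")
```

The per-task breakdown shows that most of the ELLA-SD1.5 gain comes from the `two`, `position`, and `color_attr` tasks, while `single` and `counting` are slightly lower than the SD1.5 baseline.
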
Pauweltje commented 7 months ago

I can't wait! Thanks for the research!

DthdZK commented 6 months ago

Thanks a lot for your work and research! @xiexiaoshinick