Text-to-Image Alignment Performance of the ELLA-SDXL Model

xiexiaoshinick commented 7 months ago

First of all, I would like to express my sincere gratitude for your open-source model ELLA, which is truly remarkable. I have been closely following your team's work, and as soon as the model was released, I couldn't wait to test it. I evaluated the text-to-image alignment performance of the ELLA-SD1.5 model using GenEval. Compared to the original Stable Diffusion 1.5, ELLA-SD1.5 improved the overall GenEval score by about 7.6 percentage points (49.94 vs. 42.34), a much larger gain than Salesforce's Diffusion-DPO method achieved over the same baseline (43.00).

I noticed that Stable Diffusion 3 has adopted GenEval to evaluate its text-to-image alignment performance. Does your team plan to release GenEval scores for ELLA-SDXL? This would let us compare ELLA-SDXL with SD3 on a unified scale. The scores I have collected so far are below (all values in percent; the ELLA-SDXL row is still empty):

model	Overall	single	two	counting	colors	position	color_attr
SD1.5	42.34	95.62	37.63	37.81	74.73	3.50	4.75
SD1.5-DPO	43.00	96.88	39.90	38.75	75.53	3.25	3.75
ELLA-SD1.5	49.94	94.69	55.81	36.56	77.32	14.75	20.50
SDXL	55.63	98.12	75.25	43.75	89.63	11.25	15.75
SDXL-DPO	58.02	99.38	82.58	49.06	85.11	13.50	18.50
ELLA-SDXL
DALL-E 3	67.00	96.00	87.00	47.00	83.00	43.00	45.00
SD3 best	74.00	99.00	94.00	72.00	89.00	33.00	60.00
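
The gains quoted above can be reproduced from the Overall column. The snippet below is only a minimal sketch: the scores are copied from the table, while the `overall` dictionary and the `gain` helper are illustrative names, not part of GenEval or ELLA.

```python
# Overall GenEval scores copied from the table above (percent).
# ELLA-SDXL is omitted because no score has been reported for it yet.
overall = {
    "SD1.5": 42.34,
    "SD1.5-DPO": 43.00,
    "ELLA-SD1.5": 49.94,
    "SDXL": 55.63,
    "SDXL-DPO": 58.02,
    "DALL-E 3": 67.00,
    "SD3 best": 74.00,
}


def gain(model: str, baseline: str) -> float:
    """Percentage-point difference in Overall score (hypothetical helper)."""
    return round(overall[model] - overall[baseline], 2)


# Compare each fine-tuned or augmented model against its own base model.
pairs = [("ELLA-SD1.5", "SD1.5"), ("SD1.5-DPO", "SD1.5"), ("SDXL-DPO", "SDXL")]
for model, baseline in pairs:
    print(f"{model} vs {baseline}: {gain(model, baseline):+.2f} points")
```

Once an Overall score for ELLA-SDXL is available, adding it to the dictionary would allow the same comparison against SDXL, SDXL-DPO, and SD3.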

Pauweltje commented 7 months ago

I can't wait! Thanks for the research!

DthdZK commented 6 months ago

Thanks a lot for your work and research! @xiexiaoshinick
