YangLing0818 / RealCompo

RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models
https://arxiv.org/abs/2402.12908

Results of T2I-Compbench #3

Open AdventureStory opened 5 months ago

AdventureStory commented 5 months ago

Hello, I'm interested in your excellent work. I evaluated your method on T2I-Compbench, but the results are far from what you showed in the paper. I wonder if something has gone wrong?

Here are the implementation details:

  1. First, I got the layouts from GPT-4.
  2. I evaluated on color_val.txt of T2I-Compbench, which contains 300 prompts (using the BLIP-vqa method and the default --np_num 8).
  3. I only got 39.84 on attributes, but the result in your paper is 93.

Could you please share the layout file you used for T2I-Compbench, or tell me if something is wrong with my setup?

Cominclip commented 5 months ago

Thank you for your interest. Could you upload the images you generated to cloud storage and share the link with us so that we can test them again?

AdventureStory commented 5 months ago

Thanks very much! Please wait a moment.

AdventureStory commented 5 months ago

https://drive.google.com/file/d/1IEi-aQ_WkpP7SQcLeQiuOgxHE68hzh1A/view?usp=drive_link

Here are the generated images. Could you check that you can download and open the file? The zip file contains the following subdirectories:

  1. raw_image/: the images evaluated on the color split of T2I-Compbench
  2. vis_layout/: the images with bbox layouts drawn for visualization
  3. annotation_blip/: the evaluation details for the color split of T2I-Compbench

Cominclip commented 5 months ago

The link indicates that I need to have access permission. I have already requested permission from you using my email. If you are unable to grant access, you can send me a compressed file to my email: sam.xchen.zhang@gmail.com

AdventureStory commented 5 months ago

I have granted the access. Thanks!

Cominclip commented 5 months ago

Thank you for your question. On this benchmark, the random seed has a significant impact on the scores of the generated images. In the original T2I-Compbench code, ten images are generated per prompt for complex prompts. We believe the same approach should also be applied to simple prompts, so for these results we ran 10 repeated experiments for each prompt. You can change the seed for each prompt to run multiple experiments and obtain the results.
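
The protocol described above can be sketched roughly as follows. This is only an illustration, not the authors' evaluation code: `generate` and `score` are hypothetical callables standing in for the image generator and the BLIP-VQA scorer, and averaging over seeds is an assumption, since the thread does not state how the 10 runs per prompt are aggregated.

```python
from statistics import mean
from typing import Any, Callable, List, Sequence


def per_prompt_scores(
    prompts: Sequence[str],
    seeds: Sequence[int],
    generate: Callable[[str, int], Any],  # hypothetical: (prompt, seed) -> image
    score: Callable[[str, Any], float],   # hypothetical: (prompt, image) -> score
) -> List[float]:
    """Return one score per prompt, averaged over all seeds (assumed aggregation)."""
    results = []
    for prompt in prompts:
        # One image per seed for the same prompt, each scored independently.
        seed_scores = [score(prompt, generate(prompt, seed)) for seed in seeds]
        results.append(mean(seed_scores))
    return results


def benchmark_score(prompt_scores: Sequence[float]) -> float:
    """Average the per-prompt scores into a single benchmark number."""
    return mean(prompt_scores)
```

With the 300 prompts of color_val.txt and 10 seeds per prompt, a loop like this would generate and score 3,000 images.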

AdventureStory commented 5 months ago

Thanks for your answer! Do you mean I need to generate 10 images for each prompt using RealCompo with different random seeds?

Cominclip commented 5 months ago

Yes, that's right.

alphacoder01 commented 3 weeks ago

Hi @Cominclip,

I was testing the model on the benchmark and found the same discrepancy as @AdventureStory. I tried different seeds, as suggested in your last comment, and the discrepancy remains. The number quoted in the paper for the Color category is 0.774. I used GPT-4 to generate the layouts, and my results are:

| Seed | Score |
| --- | --- |
| 0 | 0.451 |
| 117 | 0.383 |
| 393 | 0.348 |
| 423 | 0.434 |
| 486 | 0.391 |
| 700 | 0.404 |
| 717 | 0.360 |

The average over the 7 runs is about 0.396, which is way off the reported number.
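
For reference, the per-seed scores listed above can be summarized with a few lines of Python. This is just a sketch for checking the arithmetic; the variable names are made up for illustration, and 0.774 is the paper's Color score as quoted in this comment.

```python
from statistics import mean, stdev

# Per-seed Color scores as listed above (seed -> score).
scores_by_seed = {
    0: 0.451,
    117: 0.383,
    393: 0.348,
    423: 0.434,
    486: 0.391,
    700: 0.404,
    717: 0.360,
}
reported = 0.774  # Color score quoted from the paper in this comment

values = list(scores_by_seed.values())
avg = mean(values)
spread = stdev(values)  # sample standard deviation across seeds
gap = reported - avg

print(f"runs={len(values)} mean={avg:.3f} stdev={spread:.3f} gap={gap:.3f}")
```

If the seed-to-seed spread is small compared with the gap to the reported score, seed variance alone is unlikely to account for the difference.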

Maybe the authors could share their images or the approach they used to calculate these numbers.