Karine-Huang / T2I-CompBench

[NeurIPS 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
https://arxiv.org/pdf/2307.06350.pdf
MIT License

How to use this benchmark to evaluate other models, such as SDXL and SD3-medium? #18

Open DthdZK opened 2 weeks ago

Karine-Huang commented 1 week ago

To use this benchmark to evaluate other models, such as SDXL and SD3-medium, follow these steps:

  1. Generate Images:
     color/samples/
         ├── a green bench and a blue bowl_000000.png
         ├── a green bench and a blue bowl_000001.png
         └──...
  2. Evaluation:
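
The file layout shown in the "Generate Images" step can be sketched in Python as below. This is only an illustration inferred from the two file names in the listing: it assumes each image is saved as the prompt text, an underscore, and a six-digit zero-padded index, inside a `samples/` folder under the category directory. `sample_path` is a hypothetical helper, not a function from the repository, and the thread does not say whether the index counts per prompt or across the whole category, so the caller chooses it here.

```python
from pathlib import Path


def sample_path(category_dir: str, prompt: str, index: int) -> Path:
    """Build <category_dir>/samples/<prompt>_<index as 6 digits>.png.

    Assumption: the naming pattern is inferred from the example listing
    (e.g. "a green bench and a blue bowl_000000.png").
    """
    return Path(category_dir) / "samples" / f"{prompt}_{index:06d}.png"


if __name__ == "__main__":
    # Reproduce the two file names shown in the listing.
    for i in range(2):
        print(sample_path("color", "a green bench and a blue bowl", i).as_posix())
```

Whatever model produces the images (SDXL, SD3-medium, or another), the generated image would then be saved to the path returned by this helper before running the evaluation.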

I hope this helps! Let me know if you need further assistance.

YuehengLuo commented 6 days ago

Hi, I would like to know how to evaluate the various metrics for a generative model such as sd1.5. Should I use color_val.txt to generate 3000 images and then run bash BLIPvqa_eval/test.sh to get the attribute (color) score? And for the attribute (shape) score, should the test images be generated from Shape_val.txt? In other words, to reproduce each metric, I should generate the test images from the corresponding val.txt, right?

Karine-Huang commented 5 days ago

Yes, you are correct. To evaluate the metrics for a generative model, use the corresponding val.txt file to generate the test images for each category.
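
As a rough illustration of pairing each category with its own prompt file, here is a minimal Python sketch. It assumes each val file holds one prompt per line and follows the `<category>_val.txt` naming seen in the file names mentioned in this thread. `val_file_for` and `load_prompts` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path


def val_file_for(category: str, dataset_dir: str = ".") -> Path:
    """Return the prompt file for a category, e.g. color -> color_val.txt.

    Assumption: files are named <category>_val.txt inside dataset_dir.
    """
    return Path(dataset_dir) / f"{category}_val.txt"


def load_prompts(val_file) -> list:
    """Read prompts from a val file, assuming one prompt per line.

    Blank lines are skipped and surrounding whitespace is trimmed.
    """
    text = Path(val_file).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

The prompts returned for one category would be used to generate that category's test images only, so that the score reported for a category comes from images generated with its own prompts.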

YuehengLuo commented 5 days ago

Thank you. I get it!