Making sure to run the benchmark correctly

Hello,

First of all, thanks for the interesting work and the useful benchmark!

I have a new text-to-image generative model, and I want to make sure I evaluate and compare it properly, since I plan to report the results in a paper. First, I generated the 300 samples from texture_val.txt along with their index. Next, I ran the BLIP_vqa script.

I noticed that running with and without the index number in the filename (caption_index.png) significantly changes the results. Is this expected? If so, should the index follow the order of the prompts in the txt file?
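To make the question concrete, here is a minimal sketch of the naming scheme I am asking about. It assumes the index is the zero-based position of the prompt in the txt file and that it is zero-padded to six digits; both are my assumptions, not something I have confirmed from the repo. The helper names `load_prompts` and `indexed_filenames` are hypothetical.

```python
from pathlib import Path


def load_prompts(txt_path):
    """Read one prompt per line, skipping blank lines and keeping file order."""
    lines = Path(txt_path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def indexed_filenames(prompts, pad=6):
    """Build one '<caption>_<index>.png' name per prompt.

    Assumption: the index is the zero-based position in the prompt list,
    zero-padded to `pad` digits. The benchmark may expect something else.
    """
    return [
        f"{prompt.strip()}_{idx:0{pad}d}.png"
        for idx, prompt in enumerate(prompts)
    ]
```

With this, `indexed_filenames(load_prompts("texture_val.txt"))` would give one filename per prompt in file order, which is the ordering I would like to confirm.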

Thanks a lot!

Karine-Huang / T2I-CompBench

Making sure to run the benchmark correctly #17