Karine-Huang / T2I-CompBench

[NeurIPS 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
https://arxiv.org/pdf/2307.06350.pdf
MIT License

BLIP VQAeval default 8 noun phrases #11

Closed by pbevan1 7 months ago

pbevan1 commented 7 months ago

I noticed that the default number of noun phrases for the BLIP eval is 8, but from 2 onwards all or most of the questions are empty strings. So is the final calculation averaging across a lot of invalid responses? Am I correct here, or am I missing something? Shouldn't this default to 2?

Karine-Huang commented 7 months ago

Hello! Please refer to #9 for the explanation of the number of noun phrases. If the question is an empty string, the BLIP score is set to 1 (L#99-100 in BLIPvqa_eval/BLIP_vqa.py). The final score is the product of the BLIP scores for the individual noun phrases, so empty strings do not affect it: each one contributes a factor of 1.
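
To illustrate the point in this reply, here is a minimal sketch of why padding with empty questions leaves a product unchanged. It is not the code from BLIP_vqa.py: the function name `combine_scores` and the example questions and score values are hypothetical, chosen only for illustration.

```python
from math import prod


def combine_scores(questions, scores):
    """Multiply per-noun-phrase scores (hypothetical helper, not the repo's code).

    An empty question contributes a factor of 1, the multiplicative identity,
    so it cannot change the product.
    """
    factors = [1.0 if q.strip() == "" else s for q, s in zip(questions, scores)]
    return prod(factors)


# Illustrative values only: two real noun-phrase questions padded out to 8 slots.
questions = ["a red apple?", "a blue bench?"] + [""] * 6
scores = [0.9, 0.5] + [0.0] * 6  # values in the empty slots are ignored

padded = combine_scores(questions, scores)
unpadded = combine_scores(questions[:2], scores[:2])
print(padded == unpadded)
```

Under these assumptions the padded and unpadded results match, which would not hold if the scores were averaged over all 8 slots.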

pbevan1 commented 7 months ago

Great, that explains it, thanks. Sorry I forgot to check the closed issues!