AILab-CVC / SEED-Bench

(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.

Easy way to probe result examples? #11

Open · chancharikmitra opened this issue 1 year ago

chancharikmitra commented 1 year ago

Hello! This is some really interesting work!

This is more of a question. Do you have any detailed, per-example results for InstructBLIP and InstructBLIP Vicuna? If those results were available, I could examine the dataset directly instead of rerunning the models on the benchmark. I'd like to probe the success and failure cases in more detail (the example, the model's response, etc.).

Thanks!

Bohao-Lee commented 11 months ago

Thank you for your interest in our work, and we apologize for the delayed response. We have released the GPT-4V evaluation results for SEED-Bench-1 and SEED-Bench-2, which can be found at GPT-4V for SEED-Bench-1 and GPT-4V for SEED-Bench-2. If you're interested, please feel free to explore these results.
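For anyone landing here with the same goal: once per-question predictions are available (whether from the released GPT-4V files or from rerunning a model), probing failure cases only takes a short script. Below is a minimal sketch, assuming the benchmark annotations and the result file are JSON and share a `question_id` key, with a `prediction` field holding the model's choice letter and an `answer` field holding the ground truth. The file names and exact field names here are assumptions; check the released files and adjust accordingly.

```python
import json

# Hypothetical file names -- adjust to the actual released files.
# Assumed layout: the benchmark JSON has a top-level "questions" list,
# each entry carrying "question_id", "question", "answer", and choices.
with open("SEED-Bench.json") as f:
    questions = {q["question_id"]: q for q in json.load(f)["questions"]}

# Assumed layout: the results file is a list of
# {"question_id": ..., "prediction": "A"/"B"/"C"/"D"} records.
with open("gpt4v_seedbench1_results.json") as f:
    predictions = json.load(f)

failures = []
for pred in predictions:
    q = questions.get(pred["question_id"])
    if q is None:
        continue  # skip predictions with no matching annotation
    # Compare the predicted choice letter against the ground-truth answer.
    if pred["prediction"] != q["answer"]:
        failures.append((q, pred["prediction"]))

print(f"{len(failures)} incorrect out of {len(predictions)} predictions")

# Inspect a few failure cases: question text, gold answer, model output.
for q, choice in failures[:5]:
    print(q["question"], "| gold:", q["answer"], "| predicted:", choice)
```

From there you can group failures by question type or data source (if the annotations carry such a field) to see where a model breaks down, which is essentially the analysis asked for above.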