OpenGVLab / Multi-Modality-Arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

LLaVA evaluation on Flickr30k #12

Open devaansh100 opened 11 months ago

devaansh100 commented 11 months ago

Hello, thanks for the great work! I was looking at this script for LLaVA evaluation on Flickr30k, but I am running into some issues, detailed here.

Could you please share the exact generation settings and model checkpoint used for this evaluation? Thanks!