Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
First, I really appreciate your great contributions to the LVLM field.
Do you have any plans to release the visual commonsense reasoning (VCR) evaluation code?
There is some explanation of how to locate and download the dataset, but I couldn't find the corresponding evaluation code.
Thanks again for your work.