jeffhernandez1995 opened this issue 9 months ago
I've conducted additional evaluations for the various model configurations available in the Robin repository. My intention is to provide these results to the community for further insights and potential improvements.
Methodology: The evaluations were conducted on three benchmarks: MM-Vet, SEED-Bench v1, and MMBench, using the original LLaVA evaluation scripts. Below is a summary of the evaluated models and their corresponding results:
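For reference on the setup, here is a minimal sketch of how an MM-Vet run typically looks with the stock LLaVA scripts. The checkpoint name and file paths are placeholders only; I am assuming the LLaVA repo is installed and its MM-Vet data layout is in place, `--conv-mode` should be set to the model's actual conversation template, and LoRA checkpoints additionally require `--model-base`:

```python
# Minimal sketch of the MM-Vet evaluation flow with the stock LLaVA scripts.
# Assumptions: the LLaVA repo is installed and its MM-Vet data layout
# (playground/data/eval/mm-vet) is present; the checkpoint and paths below
# are placeholders only.
import subprocess

MODEL_PATH = "agi-collective/mistral-7b-siglip-so400m-finetune-lora"
EVAL_DIR = "./playground/data/eval/mm-vet"
ANSWERS = f"{EVAL_DIR}/answers/robin.jsonl"

# Step 1: generate answers with LLaVA's VQA driver (greedy decoding).
# Note: LoRA checkpoints also need "--model-base <base LM>", and
# "--conv-mode" must match the model's conversation template.
subprocess.run(
    [
        "python", "-m", "llava.eval.model_vqa",
        "--model-path", MODEL_PATH,
        "--question-file", f"{EVAL_DIR}/llava-mm-vet.jsonl",
        "--image-folder", f"{EVAL_DIR}/images",
        "--answers-file", ANSWERS,
        "--temperature", "0",
        "--conv-mode", "vicuna_v1",
    ],
    check=True,
)

# Step 2: convert the answers into the JSON format consumed by the
# official MM-Vet GPT-4 grader (this converter ships with LLaVA).
subprocess.run(
    [
        "python", "scripts/convert_mmvet_for_eval.py",
        "--src", ANSWERS,
        "--dst", f"{EVAL_DIR}/results/robin.json",
    ],
    check=True,
)
```

The SEED-Bench and MMBench runs follow the same pattern, using the corresponding question files and the submission converters that ship with the LLaVA repo.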
Observations and Considerations:
I hope these results are helpful for the ongoing development and refinement of the models in the Robin repository. Your work in creating and maintaining these models is highly appreciated by the community.
Thank you for your dedication to advancing the field of AI.

Maintainer reply:

Hi, thank you very much for taking the time to perform these evaluations! We are currently working on more comprehensive evaluations for the models, as well as on better models. We actually got results that differ quite a bit from the ones you show here; for instance, agi-collective/mistral-7b-siglip-so400m-finetune-lora scored 30.6 on MM-Vet. Would you mind sharing your experimental setup so that we can verify ours? Thanks!