CERC-AAI / Robin


Evaluation Results of Different Model Configurations #15

Open jeffhernandez1995 opened 9 months ago

jeffhernandez1995 commented 9 months ago

I've conducted additional evaluations of the various model configurations available in the Robin repository. I'm sharing these results so the community can draw further insights and identify potential improvements.

Methodology: The evaluations were conducted on three benchmarks: MM-Vet, SEED-Bench v1, and MMBench, using the original LLaVA evaluation scripts (a reproduction sketch follows the table). Below is a summary of the models evaluated along with their corresponding results:

| Model Name | Image Model | Text Model | MM-Vet | SEED-Bench v1 | MMBench |
| --- | --- | --- | --- | --- | --- |
| liuhaotian/llava-v1.5-7b | CLIP-ViT-L/14 336 | lmsys/vicuna-7b-v1.5 | 31.1 | 58.60 | 64.3 |
| liuhaotian/llava-v1-7b | CLIP-ViT-L/14 336 | lmsys/vicuna-7b-v1.3 | 28.1 | 33.52 | 59.2 |
| liuhaotian/llava-v1.5-7b | CLIP-ViT-L/14 336 | meta-llama/Llama-2-7b-chat-hf | 30.1 | 54.68 | 56.78 |
| agi-collective/mistral-7b-siglip-so400m-finetune-lora | SigLIP-ViT-L/14 384 | mistralai/Mistral-7B-v0.1 | 25.7 | 53.33 | 57.47 |
| agi-collective/mistral-7b-oh-siglip-so400m-frozen-ve-finetune-lora | SigLIP-ViT-L/14 384 | teknium/OpenHermes-2.5-Mistral-7B | 35.8 | 57.39 | 63.8 |
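For reference, a run along these lines would typically follow the upstream LLaVA MM-Vet recipe: generate answers with `llava.eval.model_vqa`, convert them with `scripts/convert_mmvet_for_eval.py`, and submit the resulting JSON to the MM-Vet grader. The sketch below only illustrates that flow; the exact paths, the `--conv-mode` value, and the `--model-base` pairing for the LoRA checkpoints are my assumptions, not details confirmed in this thread.

```python
# Hypothetical sketch of the MM-Vet leg of the evaluation, following the
# upstream LLaVA recipe. Checkpoint names come from the table above; all
# paths, the conversation template, and the --model-base pairing are assumed.
import subprocess

MODEL_PATH = "agi-collective/mistral-7b-oh-siglip-so400m-frozen-ve-finetune-lora"
MODEL_BASE = "teknium/OpenHermes-2.5-Mistral-7B"  # base LM for the LoRA adapter (assumed)
ANSWERS = "playground/data/eval/mm-vet/answers/robin-oh-siglip.jsonl"

# 1) Generate answers for the MM-Vet questions (greedy decoding).
subprocess.run([
    "python", "-m", "llava.eval.model_vqa",
    "--model-path", MODEL_PATH,
    "--model-base", MODEL_BASE,
    "--question-file", "playground/data/eval/mm-vet/llava-mm-vet.jsonl",
    "--image-folder", "playground/data/eval/mm-vet/images",
    "--answers-file", ANSWERS,
    "--temperature", "0",
    "--conv-mode", "vicuna_v1",  # assumed; Robin checkpoints may expect a different template
], check=True)

# 2) Convert the JSONL answers into the JSON format the MM-Vet grader expects.
subprocess.run([
    "python", "scripts/convert_mmvet_for_eval.py",
    "--src", ANSWERS,
    "--dst", "playground/data/eval/mm-vet/results/robin-oh-siglip.json",
], check=True)
```

Since these checkpoints are LoRA adapters, a mismatched `--model-base` or conversation template is exactly the kind of setup detail that could account for score differences between runs, so it is worth stating explicitly when comparing numbers.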

Observations and Considerations:

I hope these results are helpful for the ongoing development and refinement of the models in the Robin repository. Your work in creating and maintaining these models is highly appreciated by the community.

Thank you for your dedication to advancing the field of AI.

Alexis-BX commented 9 months ago

Hi, thank you very much for taking the time to run these evaluations! We are currently working on more comprehensive evaluations for the models, as well as on better models. We actually got results that differ quite a bit from the ones you show here; for instance, agi-collective/mistral-7b-siglip-so400m-finetune-lora scored 30.6 on MM-Vet in our runs. Would you mind sharing your experimental setup so that we can verify ours? Thanks!