CERC-AAI / Robin

Evaluation Results of Different Model Configurations #15

Open jeffhernandez1995 opened 11 months ago

jeffhernandez1995 commented 11 months ago

I've conducted additional evaluations of the various model configurations available in the Robin repository. I'm sharing these results with the community for further insight and potential improvements.

Methodology: The evaluations were conducted on three benchmarks: MM-Vet, SEED-Bench v1, and MMBench, using the original LLaVA evaluation scripts. Below is a summary of the models evaluated along with their corresponding results (a sketch of the evaluation invocation follows the table):

| Model Name | Image Model | Text Model | MM-Vet | SEED-Bench v1 | MMBench |
|---|---|---|---|---|---|
| liuhaotian/llava-v1.5-7b | CLIP-ViT-L/14 336 | lmsys/vicuna-7b-v1.5 | 31.1 | 58.60 | 64.3 |
| liuhaotian/llava-v1-7b | CLIP-ViT-L/14 336 | lmsys/vicuna-7b-v1.3 | 28.1 | 33.52 | 59.2 |
| liuhaotian/llava-v1.5-7b | CLIP-ViT-L/14 336 | meta-llama/Llama-2-7b-chat-hf | 30.1 | 54.68 | 56.78 |
| agi-collective/mistral-7b-siglip-so400m-finetune-lora | SigLIP-ViT-L/14 384 | mistralai/Mistral-7B-v0.1 | 25.7 | 53.33 | 57.47 |
| agi-collective/mistral-7b-oh-siglip-so400m-frozen-ve-finetune-lora | SigLIP-ViT-L/14 384 | teknium/OpenHermes-2.5-Mistral-7B | 35.8 | 57.39 | 63.8 |
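For reference, this is roughly how the original LLaVA evaluation scripts were invoked for MM-Vet. The eval-data layout, the conversation template, and the exact checkpoints below are illustrative assumptions and may differ from your local setup; the LoRA checkpoints may also need a `--model-base` flag pointing at the corresponding base LLM.

```python
# Minimal sketch of an MM-Vet answer-generation run with LLaVA's model_vqa entry point.
# Paths, conv-mode, and the example checkpoint are placeholders (assumptions); adapt
# them to the model configuration being evaluated.
import subprocess
from pathlib import Path

EVAL_ROOT = Path("playground/data/eval/mm-vet")  # assumed LLaVA eval data layout


def run_mmvet(model_path: str, conv_mode: str) -> Path:
    """Generate MM-Vet answers for one checkpoint via llava.eval.model_vqa."""
    answers = EVAL_ROOT / "answers" / f"{Path(model_path).name}.jsonl"
    answers.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "python", "-m", "llava.eval.model_vqa",
            "--model-path", model_path,
            "--question-file", str(EVAL_ROOT / "llava-mm-vet.jsonl"),
            "--image-folder", str(EVAL_ROOT / "images"),
            "--answers-file", str(answers),
            "--temperature", "0",
            "--conv-mode", conv_mode,
        ],
        check=True,
    )
    return answers


if __name__ == "__main__":
    # Example: one of the Robin checkpoints from the table above (conv-mode assumed).
    run_mmvet("agi-collective/mistral-7b-siglip-so400m-finetune-lora", "vicuna_v1")
```

The generated answers file is then scored with each benchmark's own grader (MM-Vet, for example, uses a GPT-4-based grader), so the final numbers also depend on that scoring step.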

Observations and Considerations:

I hope these results are helpful for the ongoing development and refinement of the models in the Robin repository. Your work in creating and maintaining these models is highly appreciated by the community.

Thank you for your dedication to advancing the field of AI.

Alexis-BX commented 10 months ago

Hi, thank you very much for taking the time to perform these evaluations! We are currently working on more comprehensive evaluations for the models, as well as better models. We actually got results that differ quite a bit from the ones you show here; for instance, agi-collective/mistral-7b-siglip-so400m-finetune-lora scored 30.6 on MM-Vet. Would you mind sharing your experimental setup so that we can verify ours? Thanks!