Wang-ML-Lab / multimodal-needle-in-a-haystack

Code and data for the benchmark "Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models"

Evaluation script for open-source models #2

Open joyolee opened 1 month ago

joyolee commented 1 month ago

Dear authors,

Thank you for the great work on long-context multimodal evaluation. In the codebase I only see evaluation code for Azure, OpenAI, Gemini, and Anthropic. Could you also provide an evaluation script for open-source models?

Thank you!

AaronWhy commented 1 month ago

Thank you for your interest! We plan to release the code for open-source models in an upcoming update.
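In the meantime, one way to prepare for such an extension is to put API-backed and local models behind a common interface, so the existing evaluation loop needs no changes when an open-source model is plugged in. The sketch below is a hypothetical illustration, not the repository's actual code: the `MultimodalModel` protocol, the `build_prompt` helper, and the `<image>` placeholder convention are all assumptions (open-source MLLMs such as LLaVA each use their own model-specific image token).

```python
from dataclasses import dataclass
from typing import List, Protocol


class MultimodalModel(Protocol):
    """Hypothetical common interface so hosted-API and local open-source
    models can be swapped inside the same evaluation loop."""

    def answer(self, image_paths: List[str], question: str) -> str: ...


def build_prompt(num_images: int, question: str) -> str:
    """Interleave one placeholder per image with the question text.

    The '<image>' token here is an assumption for illustration; a real
    open-source backend would substitute its own image placeholder.
    """
    placeholders = "\n".join(f"Image {i + 1}: <image>" for i in range(num_images))
    return f"{placeholders}\n{question}"


@dataclass
class LocalModelStub:
    """Stand-in for a local open-source model (e.g. one loaded via
    HuggingFace transformers); here it only echoes the built prompt so
    the interface can be exercised without downloading weights."""

    def answer(self, image_paths: List[str], question: str) -> str:
        return build_prompt(len(image_paths), question)
```

A real backend would implement `answer` by encoding the images and prompt with the model's processor and generating a response, while the API-based scripts would wrap their existing request logic in the same interface.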