Closed IsakZhang closed 1 year ago
Hi, thanks for curating this awesome list! There is a recent paper that evaluates several MLLMs (InstructBLIP, BLIP2, etc.) on more challenging human exam questions that require images:
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models
arXiv: https://arxiv.org/pdf/2306.05179.pdf
GitHub: https://github.com/DAMO-NLP-SG/M3Exam
Maybe you could consider adding it, since it is closely related to this repo. Thanks!
Thanks for sharing! We've updated our repo.