Added a new text-to-image modality, with support for models such as Stable Diffusion and Chameleon. Benchmarks now include ImageRewardDB and HPSv2.
Optimized benchmarks to match the metric criteria of lm-eval and lmms-eval, covering datasets such as MMLU, CMMLU, Belebele, GSM8K, MME, MMBench, and MM-Vet.
Fixed several bugs in evaluation.
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
[x] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds core functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
[ ] Documentation (update in the documentation)
Checklist
Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!