EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval
https://lmms-lab.github.io/

Adding MobileCaptureVQA to the benchmark #127

Open arnaudstiegler opened 4 months ago

arnaudstiegler commented 4 months ago

Hi team, I'd be interested to see whether we could add the MobileCaptureVQA dataset to this benchmark.

This VQA dataset focuses on mobile capture (i.e. images taken with a phone) and aims to assess models' extraction capabilities specifically in that setting. Unlike existing VQA benchmarks (DocVQA, ChartQA), it emphasizes mobile-capture-specific noise such as poor lighting and document skew, and offers much higher variability of text in the wild (a receipt, a bottle of wine, food packaging, etc.). Like other VQA datasets, it is meant to be purely extractive, i.e. the answer to the question is written somewhere in the image, which allows for easy scoring.
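Since the answers are purely extractive, scoring can be as simple as a normalized exact match against the reference answers. A minimal sketch (the helper names here are illustrative, not part of the dataset or lmms-eval):

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def extractive_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the prediction matches any reference answer after normalization."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)
```

For example, `extractive_match(" Merlot. ", ["merlot"])` returns `True`. A fuzzier metric like ANLS (as used for DocVQA) could be substituted to tolerate OCR-level noise in predictions.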

The dataset is already available on HuggingFace: https://huggingface.co/datasets/arnaudstiegler/mobile_capture_vqa. It contains ~850 questions for ~120 unique images.

I'd be happy to contribute the code to add the dataset if there's any interest!
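For reference, the integration would presumably follow the repo's existing task layout: a task YAML plus a `utils.py` with the `doc_to_*` functions. A rough sketch of what the config might look like, modeled on existing tasks such as DocVQA (the path, function names, split, and metric choice here are guesses and would need to be checked against the dataset schema and lmms-eval conventions):

```yaml
# lmms_eval/tasks/mobile_capture_vqa/mobile_capture_vqa.yaml  (hypothetical path)
dataset_path: arnaudstiegler/mobile_capture_vqa
task: "mobile_capture_vqa"
test_split: test                  # assumption: verify the actual split name on the Hub
output_type: generate_until
doc_to_visual: !function utils.mobile_capture_vqa_doc_to_visual
doc_to_text: !function utils.mobile_capture_vqa_doc_to_text
doc_to_target: "answers"
generation_kwargs:
  max_new_tokens: 32
  do_sample: false
metric_list:
  - metric: anls                  # extractive answers fit ANLS / exact match
    aggregation: mean
    higher_is_better: true
```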

Here's one sample from the dataset (question/answers at the top): [screenshot attached: Screenshot 2024-06-19 at 3.58.53 PM]

kcz358 commented 4 months ago

Hi, feel free to contribute the dataset and benchmark to our pipeline. Once you create a PR, we will review your code and work with you on merging it.