We (ByteDance + Huazhong University of Science and Technology) have recently constructed a benchmark (MTVQA) to measure the multilingual text comprehension ability of LMMs. It covers 9 widely used but low-resource languages and shows that LMMs still have considerable room to improve in multilingual text perception and comprehension. With MTVQA, we hope to draw the multimodal research community's attention to a wider range of visual texts. Would it be possible to add MTVQA to your collection?
MTVQA: https://github.com/bytedance/MTVQA