OpenGVLab / LAMM

[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
https://openlamm.github.io/

Some benchmarks are missing from the leaderboards #64

Closed zhimin-z closed 6 months ago

zhimin-z commented 7 months ago

The claimed supported tasks fail to appear in the leaderboards.

zhimin-z commented 7 months ago

Are the benchmarks supported by LAMM and ChEF different by design? @shepnerd @orashi @double125 @yinzhenfei

Coach257 commented 6 months ago

Sorry for the late response. ChEF does not include an evaluation pipeline for 3D benchmarks. Please refer to the LAMM paper for the 3D leaderboard. We will update the 3D leaderboard on the website soon. Thanks for your feedback.

zhimin-z commented 6 months ago

> Sorry for the late response. ChEF does not include an evaluation pipeline for 3D benchmarks. Please refer to the LAMM paper for the 3D leaderboard. We will update the 3D leaderboard on the website soon. Thanks for your feedback.

Thanks. Which datasets are supported in ChEF? Only those appearing in the official leaderboards?

Coach257 commented 6 months ago

Yes, and you can also add other datasets for evaluation in the dataset class.
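The thread does not show ChEF's actual dataset base class or registration mechanism, but the general pattern of "adding a dataset in the dataset class" usually means wrapping your data so the evaluation pipeline can iterate over indexed samples. The sketch below is purely illustrative: `Sample`, `CustomEvalDataset`, and the returned dict keys are all hypothetical names, not ChEF's real API.

```python
# Hypothetical sketch of a custom evaluation dataset. ChEF's real base class
# and sample schema may differ; adapt the field names to the actual codebase.
from dataclasses import dataclass


@dataclass
class Sample:
    image_path: str
    question: str
    answer: str


class CustomEvalDataset:
    """Minimal index-based dataset: the shape most evaluation loops expect."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        s = self.samples[idx]
        # Evaluation pipelines typically consume per-sample dicts like this.
        return {"image": s.image_path, "query": s.question, "gt": s.answer}


if __name__ == "__main__":
    ds = CustomEvalDataset([Sample("img_0.jpg", "What is shown?", "a cat")])
    print(len(ds))      # 1
    print(ds[0]["gt"])  # a cat
```

In practice you would also register the new class wherever ChEF maps dataset names to classes, so the evaluation config can refer to it by name.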

zhimin-z commented 6 months ago

> Yes, and you can also add other datasets for evaluation in the dataset class.

Is there a comprehensive list of the benchmarks supported in ChEF?

Coach257 commented 6 months ago

As shown on the leaderboard, the supported benchmarks are MME, SeedBench, and MMBench. The supported datasets are CIFAR, Flickr, VOC, Omnibenchmark, FSC147, and ScienceQA.

zhimin-z commented 6 months ago

> As shown on the leaderboard, the supported benchmarks are MME, SeedBench, and MMBench. The supported datasets are CIFAR, Flickr, VOC, Omnibenchmark, FSC147, and ScienceQA.

Thanks for your reply. Does the ChEF benchmark support MSCOCO as well? @Coach257

Coach257 commented 6 months ago

ChEF supports the POPE benchmark, which is built on the MSCOCO dataset. You can refer to "Evaluating Object Hallucination in Large Vision-Language Models" for more details.