OpenGVLab / LAMM

[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
https://openlamm.github.io/

What are the metrics of the six recipes of Desiderata? #55

Closed zhimin-z closed 10 months ago

zhimin-z commented 10 months ago

[image] I can't find any of those recipes in the original paper...

zhimin-z commented 10 months ago

BTW, are those metrics all accuracy?

Coach257 commented 10 months ago

Desiderata is a newly proposed set of evaluation metrics in ChEF that focuses on capability dimensions beyond the visual abilities of MLLMs. These metrics cover trustworthiness and interactivity. Please refer to our ChEF paper for more details. Thanks for your interest.

zhimin-z commented 10 months ago

[image] But I still can't find the exact evaluation metrics for this one. Is the evaluation result accuracy or not?

zhimin-z commented 9 months ago

Any update? @Coach257