FudanDISC / ReForm-Eval

A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
Apache License 2.0

Discrepancy between the paper and listed benchmarks in the repo #8

Closed zhimin-z closed 8 months ago

zhimin-z commented 8 months ago

[screenshots of a table from the paper] Flowers102 is not found in the paper...

IMNearth commented 8 months ago

Hi zhimin, Actually we have included Flowers102 in the paper.

The table you listed is Table 6 on Page 18, which reports the statistics of the datasets for visual cognition tasks. However, Flowers102 is a dataset used to evaluate the coarse-grained perception ability of models, so it is included in Table 5 on Page 17. You just have to scroll up to find it.

Just for your convenience, I cropped the table here.

[screenshot of the cropped table, 2024-01-25]