OpenGVLab / LAMM

[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
https://openlamm.github.io/

What are the differences among instruct_98K, instruct_140K, and instruct_186K? #38

Closed: peiliu0408 closed this issue 10 months ago

peiliu0408 commented 11 months ago

As mentioned in the title, the 2D_instruct folder (on OpenDataLab) contains three merged instruction files. What are the differences among these files?

wangjiongw commented 11 months ago

Thanks for your interest. As described in our paper, the 2D part of the LAMM dataset consists of four subsets: daily dialogue, detailed description, factual knowledge dialogue, and visual task dialogue, with 49K, 49K, 42K, and 46K samples, respectively. Thus, instruct_98K consists of daily dialogue and detailed description (49K + 49K); instruct_140K is instruct_98K plus factual knowledge dialogue (+42K); and instruct_186K adds visual task dialogue (+46K), covering all four subsets, i.e., the whole set of LAMM.
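The sketch below checks the arithmetic in the reply: each merged file is a superset of the previous one, and its size is the sum of its subsets. The sample counts are the approximate figures (in thousands) quoted above; the subset keys are hypothetical labels chosen for illustration, not field names from the actual LAMM files.

```python
# Approximate sample counts (in thousands) for the four 2D LAMM subsets,
# taken from the reply. The dictionary keys are hypothetical labels.
SUBSET_SIZES_K = {
    "daily_dialogue": 49,
    "detailed_description": 49,
    "factual_knowledge_dialogue": 42,
    "visual_task_dialogue": 46,
}

# Which subsets each merged instruction file is described as containing.
MERGED_FILES = {
    "instruct_98K": ["daily_dialogue", "detailed_description"],
    "instruct_140K": [
        "daily_dialogue",
        "detailed_description",
        "factual_knowledge_dialogue",
    ],
    "instruct_186K": list(SUBSET_SIZES_K),  # all four subsets
}


def merged_size_k(name: str) -> int:
    """Sum the approximate subset sizes (in thousands) that make up a merged file."""
    return sum(SUBSET_SIZES_K[subset] for subset in MERGED_FILES[name])


if __name__ == "__main__":
    for name in MERGED_FILES:
        print(f"{name}: ~{merged_size_k(name)}K samples")
```

Running it prints roughly 98K, 140K, and 186K, which matches the suffix in each file name.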