OpenMOSS / AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Is there any evaluation on VQA datasets? #1

Closed — jzhang38 closed this issue 8 months ago

JunZhan2000 commented 9 months ago

We evaluate our pre-trained models on multimodal captioning and generation, since these tasks were set as pre-training objectives. Because our instruction dataset, AnyInstruct, does not incorporate a general VQA dataset, we have not evaluated the model on VQA tasks. AnyInstruct instead focuses on general dialogue with arbitrary modality combinations, in order to demonstrate that multiple modalities can be compatible within a single model. The corresponding capabilities are showcased in the demo at https://junzhan2000.github.io/AnyGPT.github.io/.