I want to reproduce LLaMA-Adapter V2, but I don't know how to collect the training data.
https://github.com/OpenGVLab/LLaMA-Adapter
From the repo above, I understand that the model (LLaMA-Adapter V2.1 multimodal) uses Image-Text-V1 during pretraining, and GPT4LLM, LLaVA, and VQAv2 during fine-tuning.
But how can I get this data? Do I have to assemble it myself?
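In case it helps clarify what I'm after, this is the kind of download script I have in mind for the fine-tuning sets. It is only a sketch: the URLs and filenames are my assumptions based on each project's own public release (the GPT-4-LLM repo, the LLaVA-Instruct-150K dataset on Hugging Face, and the VQAv2 download page), not anything confirmed by the LLaMA-Adapter repo.

```python
# Sketch: fetch the publicly released fine-tuning data for LLaMA-Adapter V2.1.
# URLs below are my best guesses at each project's official hosting; please verify.
import urllib.request
from pathlib import Path

FILES = {
    # GPT4LLM: GPT-4-generated instruction data from the GPT-4-LLM repo
    "alpaca_gpt4_data.json":
        "https://raw.githubusercontent.com/Instruction-Tuning-with-GPT-4/"
        "GPT-4-LLM/main/data/alpaca_gpt4_data.json",
    # LLaVA: 150k visual instruction conversations (the images themselves
    # come from MS COCO and are downloaded separately)
    "llava_instruct_150k.json":
        "https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/"
        "resolve/main/llava_instruct_150k.json",
    # VQAv2: training questions/annotations, per https://visualqa.org/download.html
    "v2_Questions_Train_mscoco.zip":
        "https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Questions_Train_mscoco.zip",
    "v2_Annotations_Train_mscoco.zip":
        "https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Annotations_Train_mscoco.zip",
}

out_dir = Path("data")
out_dir.mkdir(exist_ok=True)
for name, url in FILES.items():
    dest = out_dir / name
    if not dest.exists():
        print(f"downloading {name} ...")
        urllib.request.urlretrieve(url, dest)
```

Note that both LLaVA and VQAv2 are built on MS COCO images, so those would still need to be fetched from the COCO site on top of the annotation files above. For Image-Text-V1 I have not found any direct download at all, which is the main part I'm stuck on.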
I really appreciate any help you can provide.