training data format

Hi, your work is really impressive. After reading your paper, I have some questions. It seems you built your own dataset to train the VLM and used a CoT-style method to do so. Usually, VLM training data has a format like the one below:

{ "id": "32357477_ijms-21-03049-f006", "image": "32357477_ijms-21-03049-f006.jpg", "conversations": [ { "from": "human", "value": "Illustrate the image through a descriptive explanation\n" }, { "from": "gpt", "value": "Determination of metaphase II entry in oocytes pre-exposed to ferrocenyl 4-(alkylamino)-1,4-dihydroquinolines. After incubation or not with compounds 9, 6, 10, 7, 11, 8 for 24 h, oocytes were rinsed four times in ND96 for 30 min, before progesterone stimulation. White spot appearance was scored after 15 h. N refers to the number of females and n to the number of oocytes (N = 2 and n = 60)." } ] }
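To make the structure of that record explicit, here is a minimal Python sketch that checks such a record and pulls out the question/answer pairs. It assumes the key is spelled `conversations` and that turns strictly alternate between `human` and `gpt`. The function name `extract_pairs` is hypothetical, and the answer text in the sample is a shortened placeholder, not the full caption.

```python
def extract_pairs(record):
    """Validate a conversation-style record and return (question, answer) pairs.

    Assumes the layout shown above: "id", "image", and a "conversations"
    list whose turns alternate human -> gpt.
    """
    for key in ("id", "image", "conversations"):
        if key not in record:
            raise ValueError(f"missing key: {key}")

    turns = record["conversations"]
    if len(turns) % 2 != 0:
        raise ValueError("expected an even number of turns (human/gpt pairs)")

    pairs = []
    # Walk the turns two at a time: even indices are questions, odd are answers.
    for human, gpt in zip(turns[0::2], turns[1::2]):
        if human["from"] != "human" or gpt["from"] != "gpt":
            raise ValueError("turns must alternate human -> gpt")
        pairs.append((human["value"], gpt["value"]))
    return pairs


# Sample record; the answer is a placeholder standing in for the long caption.
sample = {
    "id": "32357477_ijms-21-03049-f006",
    "image": "32357477_ijms-21-03049-f006.jpg",
    "conversations": [
        {"from": "human", "value": "Illustrate the image through a descriptive explanation\n"},
        {"from": "gpt", "value": "<caption text>"},
    ],
}

pairs = extract_pairs(sample)
```

Each pair is one supervised example: the image plus the question is the input, and the answer is the target.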

So I am curious what your format looks like. You separate your answers into 3 parts; do you train on them one by one, or as a whole?
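To make the question concrete, here is a rough Python sketch of the two options I have in mind. This is only my guess at what the data could look like, not the actual DriveVLM format: the function names, prompts, and part texts are all hypothetical placeholders, and the record layout is borrowed from the conversation-style format above.

```python
def as_single_turn(image, question, parts):
    """Option A (hypothetical): all three parts in one answer, trained as a whole."""
    return {
        "image": image,
        "conversations": [
            {"from": "human", "value": question},
            {"from": "gpt", "value": "\n".join(parts)},
        ],
    }


def as_multi_turn(image, prompts, parts):
    """Option B (hypothetical): one human/gpt turn per part, trained one by one."""
    if len(prompts) != len(parts):
        raise ValueError("need exactly one prompt per part")
    turns = []
    for prompt, part in zip(prompts, parts):
        turns.append({"from": "human", "value": prompt})
        turns.append({"from": "gpt", "value": part})
    return {"image": image, "conversations": turns}


# Placeholder content; real prompts and answers would come from the dataset.
parts = ["<part 1>", "<part 2>", "<part 3>"]
prompts = ["<prompt 1>", "<prompt 2>", "<prompt 3>"]

single = as_single_turn("frame.jpg", "<full prompt>", parts)
multi = as_multi_turn("frame.jpg", prompts, parts)
```

In option A the model sees one long target per image; in option B each part is a separate target conditioned on the earlier turns.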

I would really appreciate it if you could give me some hints. Thank you so much.

Best,

Tsinghua-MARS-Lab / DriveVLM

training data format #2