Closed: Yuxin916 closed this issue 3 months ago
[
  {
    'id': 'xxx',
    'image': ["images/rgb.png", "images/depth.png"],
    'conversations': [
      {'from': 'human', 'value': 'xxx'},
      {'from': 'gpt', 'value': 'xxx'},
      {'from': 'human', ...},
      ...
    ]
    # The system prompt is not included in the data json; it is supplied by the conv templates:
    # "A chat between a curious user and an artificial intelligence assistant. The assistant
    # gives helpful, detailed, and polite answers to the user's questions."
  }
]
The data format is almost the same as Bunny-695K, with two differences: (1) "image" is a list of two paths (RGB and depth) rather than a single image, and (2) the system prompt is not stored in the data json but comes from the conversation template.
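For illustration, here is a minimal Python sketch that writes one entry in this format. The output path, the `<image 1>`/`<image 2>` placeholders, and the example question/answer text are assumptions for demonstration only, not actual SpatialQA data:

```python
import json
from pathlib import Path

# Assumed output location; adjust to wherever your finetune data lives.
out_path = Path("data/finetune/spatial_qa_sample.json")

# One SpatialQA-style entry: "image" is a list (RGB + depth image), and
# "conversations" alternates human/gpt turns. The system prompt is not
# stored here; the conversation template adds it at training time.
sample = [
    {
        "id": "sample_000",
        "image": ["images/rgb.png", "images/depth.png"],
        "conversations": [
            # Placeholder turns; confirm the exact image-token format
            # against the real SpatialQA sample data.
            {"from": "human", "value": "<image 1>\n<image 2>\nHow far is the chair from the camera?"},
            {"from": "gpt", "value": "About 1.5 meters."},
        ],
    }
]

out_path.parent.mkdir(parents=True, exist_ok=True)
with out_path.open("w") as f:
    json.dump(sample, f, indent=2)
print(f"Wrote {len(sample)} entry to {out_path}")
```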
Please drop me an email (wxcai@stanford.edu) and I will send you some sample data of SpatialQA.
Noted with thanks! I have sent the email. Looking forward to your reply.
Hi! Hope you are doing well :)
May I know the data structure of the data/finetune folder in more detail? I would like to put in just a very small set of sensor data I obtained first and make the finetuning pipeline runnable :)
I noticed that, similar to the datasets in Bunny, the json file may look like this:
{ "image": "existence/sa_783586.jpg", "question": "Is there a tree at the edge of the paved area in the lower-left corner of the image?", "answer": "No", "id": "existence_sa_783586_neg" }
Will the Spatial_QA.json look similar to the example below?
[{ "image": [ "images/rgb.png", "images/depth.png"], "conversations": [ { "value": "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image 1>\\n<image 2>\\n{prompt} ASSISTANT:" } ], "answer": "xxxx" }
However, this format does not seem to work with preprocess_multimodal.
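For reference, this is the quick sanity check I plan to run on my json. I am assuming the file sits at data/finetune/Spatial_QA.json and that every conversation turn needs "from" and "value" keys as in Bunny; please correct me if that is off:

```python
import json

# Assumed path and schema: a list of entries, each with an "image" list
# and "conversations" turns carrying "from"/"value" keys (Bunny-style).
with open("data/finetune/Spatial_QA.json") as f:
    entries = json.load(f)

for entry in entries:
    assert isinstance(entry["image"], list), f"'image' should be a list in {entry.get('id')}"
    for turn in entry["conversations"]:
        assert "from" in turn and "value" in turn, f"bad turn in {entry.get('id')}"

print(f"Checked {len(entries)} entries; format looks consistent.")
```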
Best Regards