BAAI-DCAI / SpatialBot

The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models".
MIT License

Structure for Finetune Dataset #9

Closed. Yuxin916 closed this issue 2 weeks ago

Yuxin916 commented 2 weeks ago

Hi! Hope you are doing well :)

May I know the structure of the data/finetune folder in more detail? I would like to start with a very small set of sensor data I have collected, just to get the finetune pipeline running :)

I noticed that, similar to the dataset in Bunny, the JSON file may look like this: { "image": "existence/sa_783586.jpg", "question": "Is there a tree at the edge of the paved area in the lower-left corner of the image?", "answer": "No", "id": "existence_sa_783586_neg" }.

Will Spatial_QA.json look like the example below?

[{ "image": [ "images/rgb.png", "images/depth.png"], "conversations": [ { "value": "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image 1>\\n<image 2>\\n{prompt} ASSISTANT:" } ], "answer": "xxxx" }]

This format does not seem to work with preprocess_multimodal.

Best Regards

RussRobin commented 2 weeks ago

[ { 'id': 'xxx', 'image': ["images/rgb.png", "images/depth.png"], 'conversations': [ {'from': 'human', 'value': 'xxx'}, {'from': 'gpt', 'value': 'xxx'}, {'from': 'human', ...}, ... ] } ]

We do not include the system prompt in the data JSON. It is already part of the conv templates: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."
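To make the layout above concrete, here is a minimal sketch that builds one record and serializes it. The helper `make_record`, the sample id, and the placeholder questions are hypothetical, and placing the image tokens only in the first human turn is an assumption carried over from the Bunny convention, not something confirmed in this thread.

```python
import json


def make_record(sample_id, rgb_path, depth_path, turns):
    """Build one record; `turns` is a list of (question, answer) pairs."""
    conversations = []
    for i, (question, answer) in enumerate(turns):
        # Assumption: the image placeholders appear only in the first human turn.
        prefix = "<image 1>\n<image 2>\n" if i == 0 else ""
        conversations.append({"from": "human", "value": prefix + question})
        conversations.append({"from": "gpt", "value": answer})
    # 'image' is a list: RGB first, depth second, as in the example above.
    return {
        "id": sample_id,
        "image": [rgb_path, depth_path],
        "conversations": conversations,
    }


# Placeholder content only; replace with your own questions and answers.
record = make_record(
    "sample_0",
    "images/rgb.png",
    "images/depth.png",
    [("first question", "first answer"), ("second question", "second answer")],
)

# The dataset file is a JSON list of such records.
dataset_json = json.dumps([record], indent=2)
print(dataset_json)
```

Saving `dataset_json` as Spatial_QA.json should give a one-sample file to try the pipeline with. Note that the system prompt is deliberately absent from the record.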

The data format is almost the same as bunny695k, with 2 differences:

  1. 'image' is a list in SpatialBot
  2. SpatialBot uses <image 1>\n<image 2>\n, while Bunny uses <image>\n
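The two differences can be illustrated with a small conversion sketch. It assumes a Bunny-style record has a single `image` string and uses `<image>\n` as its placeholder; the function name `bunny_to_spatialbot` and the depth path are hypothetical, so treat this as an illustration rather than the repository's own tooling.

```python
import copy

BUNNY_TOKEN = "<image>\n"  # assumed Bunny placeholder
SPATIALBOT_TOKENS = "<image 1>\n<image 2>\n"


def bunny_to_spatialbot(record, depth_path):
    """Return a copy of a Bunny-style record in the SpatialBot layout."""
    out = copy.deepcopy(record)
    # Difference 1: 'image' becomes a list (RGB first, then depth).
    out["image"] = [record["image"], depth_path]
    # Difference 2: the single image token becomes two numbered tokens.
    for turn in out["conversations"]:
        if turn["from"] == "human":
            turn["value"] = turn["value"].replace(BUNNY_TOKEN, SPATIALBOT_TOKENS)
    return out


# Placeholder Bunny-style record for illustration.
bunny_record = {
    "id": "xxx",
    "image": "images/rgb.png",
    "conversations": [
        {"from": "human", "value": "<image>\nquestion"},
        {"from": "gpt", "value": "answer"},
    ],
}

converted = bunny_to_spatialbot(bunny_record, "images/depth.png")
print(converted)
```

The input record is left unmodified because the function works on a deep copy.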

Please drop me an email (wxcai@stanford.edu) and I will send you some sample data from SpatialQA.

Yuxin916 commented 2 weeks ago

Noted with thanks! I have sent the email. Looking forward to your reply.