Closed: Yuxin916 closed this issue 1 month ago
Hi @Yuxin916 , thank you for your interest in SpatialBot.
SpatialBot-Phi2-3B-RGBD is trained with RGBD input and tested also in RGBD? Yes. In SpatialQA, images from COCO and VG are RGB-D. The remaining images, e.g. OCR images, are RGB (it doesn't make sense to do OCR tasks on a depth image).
For figure 5 in the paper, how does the training include RGB and depth alignment? The RGB and depth maps are fed into the image encoder separately. For SigLIP, the image token size is 384*2 (two image inputs). See Sec. III A for how we encode depth values.
Is there a way to output RGBD features separately to visualize?
Yes. In our quickstart:
image_tensor = model.process_images([image1,image2], model.config).to(dtype=model.dtype, device=device)
The RGB and depth image are encoded separately, so you can easily find the encoded feature.
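Since the two inputs are passed as a list and encoded separately, the two feature maps can be pulled apart by indexing along the leading axis. A minimal sketch, assuming `process_images` stacks the RGB image and depth map along a leading batch dimension (NumPy arrays stand in for the real torch tensors here, and the shapes are illustrative, not SpatialBot's actual ones):

```python
import numpy as np

# Stand-ins for the preprocessed RGB image and depth map.
# In the real quickstart this is:
#   image_tensor = model.process_images([image1, image2], model.config)
rgb = np.random.rand(3, 384, 384)
depth = np.random.rand(3, 384, 384)

# Assumption: the two inputs are stacked along a leading batch axis,
# so the encoder output keeps the same ordering.
image_tensor = np.stack([rgb, depth])  # shape (2, 3, 384, 384)

# Index 0 then holds the RGB features, index 1 the depth features,
# which can be visualized independently.
rgb_feat, depth_feat = image_tensor[0], image_tensor[1]
print(rgb_feat.shape, depth_feat.shape)
```

The same indexing applies to the encoder's output tensor: because the images are encoded separately, feature rows keep the input order.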
Hope it helps!
Regards
Thank you for your answer. It makes things much clearer!
I have another question, about debugging. I would like to step into the `modeling_bunny_phi.py` and `configuration_bunny_phi.py` scripts. I notice that every time the model runs, it downloads these two files into `.cache` and executes them from there, instead of from the folder that contains everything from Hugging Face. I also tried changing `model_name` to my local directory path, but it still doesn't work. Do you have any suggestions? Or do I need to change the `config.json` entry:
`"auto_map": { "AutoConfig": "configuration_bunny_phi.BunnyPhiConfig", "AutoModelForCausalLM": "modeling_bunny_phi.BunnyPhiForCausalLM" }`
?
Thank you!
Just to clarify: this is an hf/git issue, not a problem with our model. If you want to make local changes to downloaded hf files, you may want to:

1. Copy the hf `.cache` files to another folder, e.g. `cp .../.cache .../your_folder`.
2. Make your edits in `your_folder` and load the model with `model_name = '.../your_folder'`.
3. Or copy the edited files back, `cp .../your_folder .../.cache`, and run the code from `.cache`, so your changes in `.../your_folder` will take effect.
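As a runnable sketch of the copy-out/copy-back workflow, using throwaway placeholder directories instead of the real hf cache layout (only standard `cp`/`mktemp` semantics are assumed):

```shell
# Placeholder directories; your real cache lives under .../.cache (hf hub).
CACHE=$(mktemp -d)                    # stands in for the downloaded hf snapshot
LOCAL=$(mktemp -d)/spatialbot_local   # stands in for .../your_folder
echo "dummy" > "$CACHE/modeling_bunny_phi.py"   # pretend downloaded model file

# 1. Copy the snapshot out of the cache so you can edit it freely:
cp -r "$CACHE" "$LOCAL"

# 2. Edit modeling_bunny_phi.py in $LOCAL, then load with model_name="$LOCAL",
#    or
# 3. copy the edited files back so the cached copy picks up your changes:
cp -r "$LOCAL"/. "$CACHE"
ls "$CACHE/modeling_bunny_phi.py"
```

Either variant works; the key point is that the files being executed (cache or local folder) are the ones carrying your edits.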
Thank you! The problem is resolved.
Best Regards
Hi, Thank you for this impressive work.
After reading the paper and trying the demo provided, I do see great potential in the model for spatial relationship understanding. I want to ask about SpatialBot-Phi2-3B-RGBD, which is trained on SpatialQA with RGB & RGBD images and tested with RGB-Depth images. So is this model trained with RGBD input and tested also in RGBD? If so, for figure 5 in the paper, how does the training include RGB and depth alignment? Because for the vision processing, I noticed the SigLIP encoder is used to encode both the RGB and depth images. In addition, is there a way to output RGBD features separately to visualize?
Thank you and looking forward to your reply.