Closed: Yuxin916 closed this issue 1 month ago
Hi @Yuxin916 , thank you for your interest in SpatialBot.
SpatialBot-Phi2-3B-RGBD is trained with RGBD input and tested also in RGBD? Yes. In SpatialQA, images from COCO and VG are RGB-D. The remaining images, e.g. OCR images, are RGB (it doesn't make sense to do OCR tasks on a depth image).
For figure 5 in the paper, how does the training include RGB and depth alignment? The RGB and depth maps are fed into the image encoder separately. For SigLIP, the image token size is 384*2 (two image inputs). See Sec. III A for how we encode depth values.
Is there a way to output RGBD features separately to visualize?
Yes. In our quickstart:
image_tensor = model.process_images([image1,image2], model.config).to(dtype=model.dtype, device=device)
The RGB and depth image are encoded separately, so you can easily find the encoded feature.
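Since the two inputs are passed as a list and encoded separately, the two feature maps can be pulled apart by indexing along the leading axis. A minimal sketch, assuming `process_images` stacks the RGB image and depth map along a leading batch dimension (NumPy arrays stand in for the real torch tensors here, and the shapes are illustrative, not SpatialBot's actual ones):

```python
import numpy as np

# Stand-ins for the preprocessed RGB image and depth map.
# In the real quickstart this is:
#   image_tensor = model.process_images([image1, image2], model.config)
rgb = np.random.rand(3, 384, 384)
depth = np.random.rand(3, 384, 384)

# Assumption: the two inputs are stacked along a leading batch axis,
# so the encoder output keeps the same ordering.
image_tensor = np.stack([rgb, depth])  # shape (2, 3, 384, 384)

# Index 0 then holds the RGB features, index 1 the depth features,
# which can be visualized independently.
rgb_feat, depth_feat = image_tensor[0], image_tensor[1]
print(rgb_feat.shape, depth_feat.shape)
```

The same indexing applies to the encoder's output tensor: because the images are encoded separately, feature rows keep the input order.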
Hope it helps!
Regards
Thank you for your answer. It makes things much clearer!
I have another question, about debugging. I would like to step into the `modeling_bunny_phi.py` and `configuration_bunny_phi.py` scripts. I notice that every time the model runs, it downloads these two files into `.cache` and executes them from there, instead of from the folder that contains everything from Hugging Face. I also tried changing `model_name` to my local directory path, but it still doesn't work. Do you have any suggestions? Or do I need to change the `config.json` entry:
`"auto_map": { "AutoConfig": "configuration_bunny_phi.BunnyPhiConfig", "AutoModelForCausalLM": "modeling_bunny_phi.BunnyPhiForCausalLM" }`
?
Thank you!
Just to clarify: this is an hf/git issue, not a problem with our model. If you want to make local changes to downloaded hf files, you may want to:

1. Copy the hf `.cache` files to another folder, e.g. `cp .../.cache .../your_folder`.
2. Make your edits in `your_folder` and load the model with `model_name = '.../your_folder'`.
3. Or copy the edited files back, `cp .../your_folder .../.cache`, and run the code from `.cache`, so your changes in `.../your_folder` will take effect.
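As a runnable sketch of the copy-out/copy-back workflow, using throwaway placeholder directories instead of the real hf cache layout (only standard `cp`/`mktemp` semantics are assumed):

```shell
# Placeholder directories; your real cache lives under .../.cache (hf hub).
CACHE=$(mktemp -d)                    # stands in for the downloaded hf snapshot
LOCAL=$(mktemp -d)/spatialbot_local   # stands in for .../your_folder
echo "dummy" > "$CACHE/modeling_bunny_phi.py"   # pretend downloaded model file

# 1. Copy the snapshot out of the cache so you can edit it freely:
cp -r "$CACHE" "$LOCAL"

# 2. Edit modeling_bunny_phi.py in $LOCAL, then load with model_name="$LOCAL",
#    or
# 3. copy the edited files back so the cached copy picks up your changes:
cp -r "$LOCAL"/. "$CACHE"
ls "$CACHE/modeling_bunny_phi.py"
```

Either variant works; the key point is that the files being executed (cache or local folder) are the ones carrying your edits.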
Thank you! The problem is resolved.
Best Regards
Hi, Thank you for this impressive work.
After reading the paper and trying the demo provided, I do see great potential in the model for spatial relationship understanding. I want to ask about SpatialBot-Phi2-3B-RGBD, which is trained on SpatialQA with RGB & RGBD images and tested with RGB-Depth images. So is this model trained with RGBD input and tested also in RGBD? If so, for figure 5 in the paper, how does the training include RGB and depth alignment? Because for the vision processing, I noticed the SigLIP encoder is used to encode both the RGB and depth images. In addition, is there a way to output RGBD features separately to visualize?
Thank you and looking forward to your reply.