csuhan / OneLLM

[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

Some confusion about the modalities of depth/normal maps. #9

Closed: laion101 closed this issue 7 months ago

laion101 commented 8 months ago

Thank you for your outstanding work.

I noticed that when running the demo you provided, QA inference in the depth/normal-map modalities seems to require both the RGB image and the depth/normal map together to obtain accurate answers. If only the depth/normal information is provided, the system appears unable to answer questions.

Could you clarify whether this behavior in the depth/normal mode aligns with the paper, which suggests that QA inference can be performed from depth/normal information alone?

[Screenshot 2024-01-05 21:16:08]


laion101 commented 8 months ago
[Screenshot 2024-01-05 22:01:59]
csuhan commented 8 months ago

Hi @laion101, currently we need a paired depth map and RGB image as input. In general, depth/normal maps serve as auxiliary information for image perception, e.g., in RGB-D classification/detection. We therefore use both the RGB image and the depth map as input, where the depth map helps the model understand the RGB image.
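
In practice, "paired input" just means the RGB image and its depth map are preprocessed the same way and fed to the model together with one question. Below is a minimal sketch; the file names, the preprocessing pipeline, and the commented-out `model.generate(...)` call are assumptions for illustration, not OneLLM's actual interface (see the repo's demo scripts for the real entry point).

```python
# A minimal sketch of "paired input": the RGB image and the depth map are
# preprocessed identically and passed together with a single question.
# NOTE: the paths, preprocessing, and the commented-out generate call are
# illustrative assumptions, not OneLLM's actual API.
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Primary content: the RGB image.
rgb = preprocess(Image.open("scene.jpg").convert("RGB"))          # (3, 224, 224)
# Auxiliary cue: the depth map, rendered as a 3-channel image.
depth = preprocess(Image.open("scene_depth.png").convert("RGB"))  # (3, 224, 224)

# Stack the pair so both views reach the model for one question.
pair = torch.stack([rgb, depth]).unsqueeze(0)  # (1, 2, 3, 224, 224)

# Hypothetical call -- replace with the demo's real inference function:
# answer = model.generate(pair, modality="depth", prompt="What is in the scene?")
```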

laion101 commented 7 months ago

Got it, thanks for your reply!