Closed laion101 closed 10 months ago
Hi @laion101, currently we need a paired depth map and image as input. In general, depth/normal maps serve as auxiliary information for image perception, for example in RGBD classification/detection. Therefore, we use both RGB images and depth maps as input, where the depth maps help the model understand the RGB images.
Got it, thanks for your reply!
Thank you for your outstanding work.
I noticed that when running the demo you provided, QA inference in the depth/normal map modalities seems to require both the RGB image and the depth/normal map together to obtain accurate answers. If only the depth/normal information is provided, the system appears unable to answer questions.
Could you clarify whether the intended functionality of this system in the depth/normal mode aligns with the paper, which suggests that QA inference can be accomplished solely based on depth/normal information?