GengzeZhou / NavGPT

[AAAI 2024] Official implementation of NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
MIT License

Depth information from Matterport3D simulator #8

Closed Breezewrf closed 5 months ago

Breezewrf commented 5 months ago

Hi

Thank you for your open-source work.

I have a doubt about the depth information mentioned in your paper, which states: "We also extract the depth information of the center pixel of the object provided by the Matterport3D simulator." The depth information is used to address a phenomenon you observed, in which agents failed to reach the destination because they did not know how close they were to it: once the target viewpoint is visible in sight, they tend to stop immediately.
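For reference, a minimal sketch of what this center-pixel depth extraction might look like, assuming the MatterSim Python bindings with depth rendering enabled (the scan/viewpoint IDs, pixel coordinates, and depth units below are illustrative assumptions and should be checked against the simulator documentation):

```python
import math
import MatterSim
import numpy as np

sim = MatterSim.Simulator()
sim.setCameraResolution(640, 480)
sim.setCameraVFOV(math.radians(60))
sim.setDepthEnabled(True)   # needs the undistorted depth images from the Matterport3D dataset
sim.setDiscretizedViewingAngles(True)
sim.initialize()

# Placeholder scan / viewpoint / heading / elevation -- substitute real IDs.
sim.newEpisode(['<scan_id>'], ['<viewpoint_id>'], [0.0], [0.0])

state = sim.getState()[0]
depth = np.array(state.depth, copy=False)   # H x W uint16 depth image

# (cx, cy): centre pixel of a detected object's bounding box (illustrative values).
cx, cy = 320, 240
raw = int(depth[cy, cx])
# The Matterport3D depth convention is commonly 0.25 mm per unit
# (i.e. value / 4000 = metres); verify this against the simulator docs.
distance_m = raw / 4000.0
print(f"Object centre pixel is roughly {distance_m:.2f} m from the agent")
```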

The key point of my question is: isn't it a trick to directly extract depth information from the Matterport3D simulator (similar to ground-truth leaking)? I do not think that is reasonable.

Thank you for your further explanation.

GengzeZhou commented 5 months ago

We introduce NavGPT as an initiative aimed at exploring the extent to which Large Language Models (LLMs) can comprehend the perception of the world and the consequences of interactions within an environment, all through textual information. Understanding depth information is crucial for developing effective navigation agents. Therefore, our objective is to evaluate whether GPT-4 possesses the reasoning capabilities necessary to utilize depth information effectively.
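For illustration, the kind of textual observation this evaluation relies on could look roughly like the following hypothetical snippet (the helper name and phrasing are assumptions, not the repository's actual prompt format):

```python
# Hypothetical helper: verbalize detected objects and their centre-pixel
# depths so an LLM agent can reason about distances in plain text.
def verbalize_objects(objects):
    """objects: list of (object_name, distance_in_metres) tuples."""
    parts = [f"a {name} about {dist:.1f} metres away" for name, dist in objects]
    return "In the current view there is " + ", ".join(parts) + "."

print(verbalize_objects([("sofa", 2.3), ("doorway", 5.1), ("staircase", 7.8)]))
```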

Integrating a depth sensor into real robots is a cost-effective solution, and it has become a standard practice in the field of visual navigation and Sim2Real transfer. This includes directly incorporating depth as input or utilizing it to construct bird’s-eye view (BEV) maps. By examining GPT-4's ability to process and reason with depth information, we aim to advance the development of sophisticated navigation systems.