GengzeZhou / NavGPT

[AAAI 2024] Official implementation of NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
MIT License

Depth information from Matterport3D simulator #8

Closed Breezewrf closed 5 months ago

Breezewrf commented 5 months ago

Hi

Thank you for your open-source work.

I have a doubt about the depth information mentioned in your paper, which states: "We also extract the depth information of the center pixel of the object provided by the Matterport3D simulator." The depth information is used to address a phenomenon you observed, in which agents failed to reach the destination because they did not know how close they were to it: once the target viewpoint is visible in sight, they tend to stop immediately.
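For reference, a minimal sketch of what this center-pixel depth extraction might look like, assuming the MatterSim Python bindings with depth rendering enabled (the scan/viewpoint IDs, pixel coordinates, and depth units below are illustrative assumptions and should be checked against the simulator documentation):

```python
import math
import MatterSim
import numpy as np

sim = MatterSim.Simulator()
sim.setCameraResolution(640, 480)
sim.setCameraVFOV(math.radians(60))
sim.setDepthEnabled(True)   # needs the undistorted depth images from the Matterport3D dataset
sim.setDiscretizedViewingAngles(True)
sim.initialize()

# Placeholder scan / viewpoint / heading / elevation -- substitute real IDs.
sim.newEpisode(['<scan_id>'], ['<viewpoint_id>'], [0.0], [0.0])

state = sim.getState()[0]
depth = np.array(state.depth, copy=False)   # H x W uint16 depth image

# (cx, cy): centre pixel of a detected object's bounding box (illustrative values).
cx, cy = 320, 240
raw = int(depth[cy, cx])
# The Matterport3D depth convention is commonly 0.25 mm per unit
# (i.e. value / 4000 = metres); verify this against the simulator docs.
distance_m = raw / 4000.0
print(f"Object centre pixel is roughly {distance_m:.2f} m from the agent")
```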

The key point of my question is: isn't it a trick to directly extract depth information from the Matterport3D simulator (similar to ground-truth leaking)? I do not think that is reasonable.

Thank you for your further explanation.

GengzeZhou commented 5 months ago

We introduce NavGPT as an initiative aimed at exploring the extent to which Large Language Models (LLMs) can comprehend the perception of the world and the consequences of interactions within an environment, all through textual information. Understanding depth information is crucial for developing effective navigation agents. Therefore, our objective is to evaluate whether GPT-4 possesses the reasoning capabilities necessary to utilize depth information effectively.
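For illustration, the kind of textual observation this evaluation relies on could look roughly like the following hypothetical snippet (the helper name and phrasing are assumptions, not the repository's actual prompt format):

```python
# Hypothetical helper: verbalize detected objects and their centre-pixel
# depths so an LLM agent can reason about distances in plain text.
def verbalize_objects(objects):
    """objects: list of (object_name, distance_in_metres) tuples."""
    parts = [f"a {name} about {dist:.1f} metres away" for name, dist in objects]
    return "In the current view there is " + ", ".join(parts) + "."

print(verbalize_objects([("sofa", 2.3), ("doorway", 5.1), ("staircase", 7.8)]))
```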

Integrating a depth sensor into real robots is a cost-effective solution, and it has become a standard practice in the field of visual navigation and Sim2Real transfer. This includes directly incorporating depth as input or utilizing it to construct bird’s-eye view (BEV) maps. By examining GPT-4's ability to process and reason with depth information, we aim to advance the development of sophisticated navigation systems.