The official implementation of MapGPT. [Paper] [Project]
MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation.
Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong.
Annual Meeting of the Association for Computational Linguistics (ACL 2024).
If you have any questions, please contact me by email: jqchen(at)cs.hku.hk
Install Matterport3D simulators: follow the instructions here. We use the latest version instead of v0.1.

Install requirements:

```bash
conda create -n MapGPT python=3.10
conda activate MapGPT
pip install -r requirements.txt
```
Prepare data: place the annotation files in the `datasets/R2R/annotations` directory.

GPT key: please set your API key here.
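As a minimal sketch of these two steps, the snippet below loads a JSON annotation file and reads the API key from an environment variable. The file layout and the `OPENAI_API_KEY` variable name are assumptions for illustration, not the repository's actual configuration.

```python
import json
import os

def load_annotations(path):
    """Load a JSON file of navigation episodes (hypothetical format)."""
    with open(path) as f:
        return json.load(f)

# Assumed environment variable; the repo expects the key to be set in its
# own config file instead.
api_key = os.environ.get("OPENAI_API_KEY", "")
```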
In addition to the GPT-4v results reported in the paper, we also include an implementation using the latest GPT-4o, which is faster and cheaper. You can run the following script, where `--llm` is set to `gpt-4o-2024-05-13` and `--response_format` is set to `json`:

```bash
bash scripts/gpt4o.sh
```
The performance comparison between the two implementations on a sampled subset is as follows. GPT-4o achieves a better NE but a slightly lower SR.

| LLMs | NE | OSR | SR | SPL |
|---|---|---|---|---|
| GPT-4v | 5.62 | 57.9 | 47.7 | 38.1 |
| GPT-4o | 5.11 | 56.9 | 46.3 | 37.8 |
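For reference, the metrics in this table follow the conventional VLN definitions: NE is the final distance to the goal, SR counts episodes ending within the usual 3 m success radius, and SPL weights success by the ratio of shortest-path length to the path actually taken. The sketch below computes them from a hypothetical per-episode record format (not the repository's actual evaluation code).

```python
SUCCESS_RADIUS = 3.0  # meters, the usual R2R success threshold

def evaluate(episodes):
    """Compute mean NE, SR, and SPL from per-episode results.

    Each episode is a dict (hypothetical format) with:
      'nav_error'     final distance from agent to goal (m)
      'shortest_path' geodesic length of the ground-truth path (m)
      'agent_path'    length of the path the agent actually took (m)
    """
    n = len(episodes)
    ne = sum(e["nav_error"] for e in episodes) / n
    successes = [e["nav_error"] < SUCCESS_RADIUS for e in episodes]
    sr = sum(successes) / n
    # SPL: success weighted by shortest-path / max(taken, shortest-path)
    spl = sum(
        s * e["shortest_path"] / max(e["agent_path"], e["shortest_path"])
        for s, e in zip(successes, episodes)
    ) / n
    return {"NE": ne, "SR": sr, "SPL": spl}
```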
Note that you should modify the following part of `gpt4o.sh` to set the path to your observation images, the split you want to test, etc.

```bash
--root_dir ${DATA_ROOT}
--img_root /path/to/images
--split MapGPT_72_scenes_processed
--end 10  # the number of cases to be tested
--output_dir ${outdir}
--max_action_len 15
--save_pred
--stop_after 3
--llm gpt-4o-2024-05-13
--response_format json
--max_tokens 1000
```
```bibtex
@inproceedings{chen2024mapgpt,
  title     = {MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation},
  author    = {Chen, Jiaqi and Lin, Bingqian and Xu, Ran and Chai, Zhenhua and Liang, Xiaodan and Wong, Kwan-Yee~K.},
  booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics},
  year      = {2024}
}
```