🔥 Winner of the RxR-Habitat Challenge in CVPR 2022. [Challenge Report] [Challenge Certificate]
This work tackles a practical yet challenging VLN setting: vision-language navigation in continuous environments (VLN-CE). To develop a robust VLN-CE agent, we propose a new navigation framework, ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability to perform obstacle-avoiding control in continuous environments. ETPNav performs online topological mapping of environments by self-organizing predicted waypoints along the traversed path, without prior environmental experience. This allows the agent to decompose the navigation procedure into high-level planning and low-level control. Concurrently, ETPNav uses a transformer-based cross-modal planner to generate navigation plans based on topological maps and instructions. The plan is then executed by an obstacle-avoiding controller that leverages a trial-and-error heuristic to prevent navigation from getting stuck on obstacles. Experimental results demonstrate the effectiveness of the proposed method: ETPNav yields more than 10% and 20% improvements over the prior state of the art on the R2R-CE and RxR-CE datasets, respectively.
Leaderboard:
Follow the Habitat Installation Guide to install habitat-lab and habitat-sim. We use version v0.1.7 in our experiments, the same as VLN-CE; please refer to the VLN-CE page for more details. In brief:
Create a virtual environment. We developed this project with Python 3.6.
conda env create -f environment.yaml
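Then activate the environment. Note that the environment name is set by the name: field in environment.yaml; vlnce below is only an assumed placeholder, so check the file for the actual name.
conda activate vlnce  # assumed name -- see the `name:` field in environment.yaml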
Install habitat-sim for a machine with multiple GPUs or without an attached display (i.e. a cluster):
conda install -c aihabitat -c conda-forge habitat-sim=0.1.7 headless
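As an optional sanity check, the simulator should now import without errors:
python -c "import habitat_sim"  # should exit silently if the install succeeded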
Clone this repository and install all requirements for habitat-lab, VLN-CE and our experiments. Note that we pin gym==0.21.0 because its latest version is not compatible with habitat-lab v0.1.7.
git clone git@github.com:MarSaKi/ETPNav.git
cd ETPNav
python -m pip install -r requirements.txt
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
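Optionally, verify that the CUDA build of PyTorch is active and can see your GPUs:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"  # expect: 1.9.1+cu111 True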
Clone a stable habitat-lab version from the GitHub repository and install it. The command below installs the core of Habitat Lab as well as habitat_baselines.
git clone --branch v0.1.7 git@github.com:facebookresearch/habitat-lab.git
cd habitat-lab
python setup.py develop --all # install habitat and habitat_baselines
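Optionally, confirm both packages are importable before moving on:
python -c "import habitat, habitat_baselines"  # should exit silently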
Instructions copied from VLN-CE:
Matterport3D (MP3D) scene reconstructions are used. The official Matterport3D download script (download_mp.py) can be accessed by following the instructions on their project webpage. The scene data can then be downloaded:
# requires running with python 2.7
python download_mp.py --task habitat -o data/scene_datasets/mp3d/
Extract such that it has the form scene_datasets/mp3d/{scene}/{scene}.glb. There should be 90 scenes. Place the scene_datasets folder in data/.
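A quick way to confirm the extraction (there should be 90 scene folders):
ls data/scene_datasets/mp3d | wc -l  # expect: 90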
Waypoint Predictor: data/wp_pred/check_cwp_bestdist*
Processed data, pre-trained weights, and fine-tuned weights [link].
unzip etp_ckpt.zip  # file/folder structure has already been organized
Overall, files and folders are organized as follows:
ETPNav
├── data
│   ├── datasets
│   ├── logs
│   ├── scene_datasets
│   └── wp_pred
└── pretrained
    └── ETP
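To double-check the layout after unzipping, list the two top-level data folders:
ls data pretrained  # expect: datasets logs scene_datasets wp_pred under data/, and ETP under pretrained/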
Pre-training
Download the pretraining datasets [link] (the same ones used in DUET) and the precomputed features [link], then unzip them in the pretrain_src folder.
CUDA_VISIBLE_DEVICES=0,1 bash pretrain_src/run_pt/run_r2r.bash 2333
Finetuning and Evaluation
Use main.bash for training, evaluation, and inference with a single GPU or with multiple GPUs on a single node. Simply adjust the arguments of the bash scripts:
# for R2R-CE
CUDA_VISIBLE_DEVICES=0,1 bash run_r2r/main.bash train 2333 # training
CUDA_VISIBLE_DEVICES=0,1 bash run_r2r/main.bash eval 2333 # evaluation
CUDA_VISIBLE_DEVICES=0,1 bash run_r2r/main.bash inter 2333 # inference
# for RxR-CE
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_rxr/main.bash train 2333 # training
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_rxr/main.bash eval 2333 # evaluation
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_rxr/main.bash inter 2333 # inference
Our implementation is partially inspired by CWP, Sim2Sim and DUET. Thanks for their great work!
If you find this repository useful, please consider citing our paper:
@article{an2024etpnav,
  title={ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments},
  author={An, Dong and Wang, Hanqing and Wang, Wenguan and Wang, Zun and Huang, Yan and He, Keji and Wang, Liang},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024}
}