Code for the CVPR 2021 Oral paper:
A Recurrent Vision-and-Language BERT for Navigation
Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould
"Neo : Are you saying I have to choose whether Trinity lives or dies? The Oracle : No, you've already made the choice. Now you have to understand it." --- The Matrix Reloaded (2003).
Install the Matterport3D Simulator. Note that this code uses the old version (v0.1) of the simulator; you can easily switch to the latest version, which supports batches of agents and is much more efficient.
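If it helps, the v0.1 simulator typically builds with a plain CMake workflow. The commands below are only a hedged sketch (the repository URL and build steps are assumptions based on the simulator's usual setup, not this project's instructions; follow the simulator's own README for its dependencies):

# Hedged sketch: cloning and building the old (v0.1) simulator with CMake.
git clone --recursive https://github.com/peteranderson80/Matterport3DSimulator.git
cd Matterport3DSimulator
mkdir build && cd build
cmake ..
make
cd ../..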
Please find the versions of packages in our environment here.
Install Pytorch-Transformers. In particular, we use this version (the same as OSCAR) in our experiments.
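A minimal install sketch, assuming a pip-based environment (pin the package to the exact version referenced by the link above; no specific version number is asserted here):

# Install pytorch-transformers; pin it to the version linked above (the one OSCAR uses).
pip install pytorch-transformers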
Please follow the instructions below to prepare the data in the following directories (a minimal sketch of the layout follows the list):
connectivity
data
data/prevalent
img_features
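As noted above, the layout can be created from the repository root; the directory names below are taken directly from the list:

# Create the data directories listed above (run from the repository root).
mkdir -p connectivity data/prevalent img_features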
Please refer to vlnbert_init.py to set up the directories.
base-no-labels: download the pre-trained OSCAR weights following this guide.
pytorch_model.bin: download the pre-trained PREVALENT weights from here.
snap: directory for the trained network weights (see below).
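The exact target paths for these weights are defined in vlnbert_init.py; the commands below are only a hypothetical illustration (the Oscar/ and Prevalent/ directory names are assumptions, not paths confirmed by this README):

# Hypothetical placement of the downloaded weights; check vlnbert_init.py for the real paths.
mkdir -p Oscar/base-no-labels Prevalent/pretrained_model
# mv /path/to/base-no-labels/* Oscar/base-no-labels/            # OSCAR weights (assumed path)
# mv /path/to/pytorch_model.bin Prevalent/pretrained_model/     # PREVALENT weights (assumed path)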
Please read Peter Anderson's VLN paper for the R2R Navigation task.
To replicate the performance reported in our paper, load the trained network weights and run validation:
bash run/test_agent.bash
You can simply switch between the OSCAR-based and the PREVALENT-based VLN models by changing the arguments vlnbert (oscar or prevalent) and load (trained model paths).
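For illustration, the two configurations differ only in those two arguments. The snippet below is a sketch with placeholder values (edit run/test_agent.bash itself; these lines are not the script's exact contents):

# Placeholder values illustrating the two model configurations.
VLNBERT=prevalent                 # or: oscar
LOAD="snap/your_trained_model"    # placeholder path to the corresponding trained weights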
To train the network from scratch, simply run:
bash run/train_agent.bash
The trained Navigator will be saved under snap/.
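Once a run finishes you can, for example, list the saved checkpoints (the exact subdirectory name depends on the run name configured in run/train_agent.bash):

# Inspect the saved checkpoints after training.
ls snap/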
If you use or discuss our Recurrent VLN-BERT, please cite our paper:
@InProceedings{Hong_2021_CVPR,
author = {Hong, Yicong and Wu, Qi and Qi, Yuankai and Rodriguez-Opazo, Cristian and Gould, Stephen},
title = {A Recurrent Vision-and-Language BERT for Navigation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021},
pages = {1643-1653}
}