ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk,
Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox
CVPR 2020
ALFRED (Action Learning From Realistic Environments and Directives) is a new benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks. Long composition rollouts with non-reversible state changes are among the phenomena we include to shrink the gap between research benchmarks and real-world applications.
For the latest updates, see: askforalfred.com
What's more? Check out ALFWorld – interactive TextWorld environments for ALFRED scenes!
Clone repo:
$ git clone https://github.com/askforalfred/alfred.git alfred
$ export ALFRED_ROOT=$(pwd)/alfred
Install requirements:
$ virtualenv -p $(which python3) --system-site-packages alfred_env # or whichever package manager you prefer
$ source alfred_env/bin/activate
$ cd $ALFRED_ROOT
$ pip install --upgrade pip
$ pip install -r requirements.txt
Download the trajectory JSONs and ResNet features (~17GB):
$ cd $ALFRED_ROOT/data
$ sh download_data.sh json_feat
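Once the download and extraction finish, a quick sanity check is to load one trajectory file and list its top-level keys. The snippet below is only a sketch: the `traj_data.json` filename and the `data/json_feat_2.1.0` layout are assumptions about the extracted tree, so adjust the path to whatever the download script actually produced.

```python
import glob
import json
import os

def inspect_first_trajectory(data_root):
    """Load the first traj_data.json found under data_root and return its top-level keys."""
    # NOTE: the "**/traj_data.json" layout is an assumption; adjust to your extracted tree.
    matches = glob.glob(os.path.join(data_root, "**", "traj_data.json"), recursive=True)
    if not matches:
        raise FileNotFoundError(f"no trajectory JSONs under {data_root}")
    with open(matches[0]) as f:
        traj = json.load(f)
    return sorted(traj.keys())

if __name__ == "__main__":
    print(inspect_first_trajectory("data/json_feat_2.1.0"))
```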
Train models:
$ cd $ALFRED_ROOT
$ python models/train/train_seq2seq.py --data data/json_feat_2.1.0 --model seq2seq_im_mask --dout exp/model:{model},name:pm_and_subgoals_01 --splits data/splits/oct21.json --gpu --batch 8 --pm_aux_loss_wt 0.1 --subgoal_aux_loss_wt 0.1
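The `--pm_aux_loss_wt` and `--subgoal_aux_loss_wt` flags weight two auxiliary losses (progress monitoring and subgoal completion) against the main action-prediction loss. A minimal sketch of that combination, with made-up loss values purely for illustration:

```python
def combine_losses(action_loss, pm_loss, subgoal_loss,
                   pm_aux_loss_wt=0.1, subgoal_aux_loss_wt=0.1):
    """Total training loss: main action loss plus weighted auxiliary terms."""
    return action_loss + pm_aux_loss_wt * pm_loss + subgoal_aux_loss_wt * subgoal_loss

# With the weights from the command above (illustrative values):
total = combine_losses(action_loss=2.0, pm_loss=0.5, subgoal_loss=1.0)
```

Setting either weight to 0 disables that auxiliary objective without touching the main loss.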
Open-source models that outperform the Seq2Seq baselines from ALFRED:
Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents
Byeonghwi Kim, Jinyeon Kim, Yuyeong Kim, Cheolhong Min, Jonghyun Choi
Paper, Code
Multi-Level Compositional Reasoning for Interactive Instruction Following
Suvaansh Bhambri, Byeonghwi Kim, Jonghyun Choi
Paper, Code
Agent with the Big Picture: Perceiving Surroundings for Interactive Instruction Following
Byeonghwi Kim, Suvaansh Bhambri, Kunal Pratap Singh, Roozbeh Mottaghi, Jonghyun Choi
Paper, Code
FILM: Following Instructions in Language with Modular Methods
So Yeon Min, Devendra Singh Chaplot, Pradeep Ravikumar, Yonatan Bisk, Ruslan Salakhutdinov
Paper, Code
A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution
Valts Blukis, Chris Paxton, Dieter Fox, Animesh Garg, Yoav Artzi
Paper, Code
Hierarchical Task Learning from Language Instructions with Unified Transformers and Self-Monitoring
Yichi Zhang, Joyce Chai
Paper, Code
Episodic Transformer for Vision-and-Language Navigation
Alexander Pashevich, Cordelia Schmid, Chen Sun
Paper, Code
MOCA: A Modular Object-Centric Approach for Interactive Instruction Following
Kunal Pratap Singh, Suvaansh Bhambri, Byeonghwi Kim*, Roozbeh Mottaghi, Jonghyun Choi
Paper, Code
Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion
Alessandro Suglia, Qiaozi Gao, Jesse Thomason, Govind Thattai, Gaurav Sukhatme
Paper, Code
Contact Mohit to add your model here.
See requirements.txt for all prerequisites
Tested on:
Run your model on test seen and unseen sets, and create an action-sequence dump of your agent:
$ cd $ALFRED_ROOT
$ python models/eval/leaderboard.py --model_path <model_path>/model.pth --model models.model.seq2seq_im_mask --data data/json_feat_2.1.0 --gpu --num_threads 5
This will create a JSON file, e.g. task_results_20191218_081448_662435.json, inside the <model_path> folder. Submit this JSON here: AI2 ALFRED Leaderboard. For rules and restrictions, see the getting started page.
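Before submitting, it can save a round-trip to confirm the dump parses as JSON and covers both test splits. The split names checked below ("tests_seen", "tests_unseen") are an assumption about the dump format; inspect your own file first and adjust accordingly.

```python
import json

def validate_results(path, expected_splits=("tests_seen", "tests_unseen")):
    """Return the parsed results dict if the dump is valid JSON with the expected splits."""
    with open(path) as f:
        results = json.load(f)  # raises if the file is not well-formed JSON
    missing = [s for s in expected_splits if s not in results]
    if missing:
        raise ValueError(f"missing splits in {path}: {missing}")
    return results
```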
Rules: max_steps=1000 and max_fails=10. Do not change these settings in the leaderboard script; such modifications will not be reflected on the evaluation server.

Install Docker and NVIDIA Docker.
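The leaderboard enforces these limits server-side: an episode ends when the agent issues a Stop action, exhausts its 1000-step budget, or accumulates 10 failed actions. A stub sketch of that loop (the agent/env interfaces here are hypothetical stand-ins, not the ALFRED API):

```python
def run_episode(agent_actions, action_succeeds, max_steps=1000, max_fails=10):
    """Step through actions until Stop, the step budget, or the failure budget.

    agent_actions: iterable of action names; action_succeeds: predicate on an action.
    Both are hypothetical stand-ins for the real agent/environment interface.
    """
    steps = fails = 0
    for action in agent_actions:
        if steps >= max_steps or fails >= max_fails:
            break  # budget exhausted: episode is over regardless of remaining actions
        steps += 1
        if action == "Stop":
            break  # agent declared the task complete
        if not action_succeeds(action):
            fails += 1  # failed interactions count toward max_fails
    return steps, fails
```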
Modify docker_build.py and docker_run.py to your needs.
Build the image:
$ python scripts/docker_build.py
For local machines:
$ python scripts/docker_run.py
source ~/alfred_env/bin/activate
cd $ALFRED_ROOT
For headless VMs and Cloud-Instances:
$ python scripts/docker_run.py --headless
# inside docker
tmux new -s startx # start a new tmux session
# start nvidia-xconfig
sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024
# start X server on DISPLAY 0
# single X server should be sufficient for multiple instances of THOR
sudo python ~/alfred/scripts/startx.py 0 # if this throws errors, e.g. "(EE) Server terminated with error (1)" or "(EE) already running ...", try a display > 0
# detach from tmux shell
# Ctrl+b then d
# source env
source ~/alfred_env/bin/activate
# set DISPLAY variable to match X server
export DISPLAY=:0
# check THOR
cd $ALFRED_ROOT
python scripts/check_thor.py
###############
## (300, 300, 3)
## Everything works!!!
You might have to modify X_DISPLAY in gen/constants.py depending on which display you use.
ALFRED can be set up on headless machines like AWS or Google Cloud instances. The main requirement is access to a GPU machine that supports OpenGL rendering. Run startx.py in a tmux shell:
# start tmux session
$ tmux new -s startx
# start X server on DISPLAY 0
# single X server should be sufficient for multiple instances of THOR
$ sudo python $ALFRED_ROOT/scripts/startx.py 0 # if this throws errors, e.g. "(EE) Server terminated with error (1)" or "(EE) already running ...", try a display > 0
# detach from tmux shell
# Ctrl+b then d
# set DISPLAY variable to match X server
$ export DISPLAY=:0
# check THOR
$ cd $ALFRED_ROOT
$ python scripts/check_thor.py
###############
## (300, 300, 3)
## Everything works!!!
You might have to modify X_DISPLAY in gen/constants.py depending on which display you use.
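Since THOR depends on the DISPLAY variable matching a running X server, a small pre-flight check can catch a missing or mismatched value before launch. A sketch (the ":0" default mirrors the export above; adjust it if you started X on another display):

```python
import os

def resolve_display(default=":0"):
    """Return the DISPLAY to use, setting a fallback when the variable is unset."""
    display = os.environ.get("DISPLAY", "").strip()
    if not display:
        # No DISPLAY exported: fall back to the default and export it for child processes.
        os.environ["DISPLAY"] = default
        return default
    return display
```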
Also, check out this guide: Setting up THOR on Google Cloud
If you find the dataset or code useful, please cite:
@inproceedings{ALFRED20,
title ={{ALFRED: A Benchmark for Interpreting Grounded
Instructions for Everyday Tasks}},
author={Mohit Shridhar and Jesse Thomason and Daniel Gordon and Yonatan Bisk and
Winson Han and Roozbeh Mottaghi and Luke Zettlemoyer and Dieter Fox},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2020},
url = {https://arxiv.org/abs/1912.01734}
}
MIT License
Updates:
28/10/2020: --use_templated_goals option to train with templated goals instead of human-annotated goal descriptions.
26/10/2020: json_feat_2.1.0.zip.
14/10/2020: Goto subgoal evaluation.
07/04/2020:
28/03/2020:
Questions or issues? Contact askforalfred@googlegroups.com