alexa / teach

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.
135 stars 27 forks source link

TEACh

Task-driven Embodied Agents that Chat

Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment. The code and model weights are licensed under the MIT License (see SOFTWARELICENSE), images are licensed under Apache 2.0 (see IMAGESLICENSE) and other data files are licensed under CDLA-Sharing 1.0 (see DATALICENSE). Please include appropriate licensing and attribution when using our data and code, and please cite our paper.

Citation:

@inproceedings{teach,
  title={{TEACh: Task-driven Embodied Agents that Chat}},
  author={Padmakumar, Aishwarya and Thomason, Jesse and Shrivastava, Ayush and Lange, Patrick and Narayan-Chen, Anjali and Gella, Spandana and Piramuthu, Robinson and Tur, Gokhan and Hakkani-Tur, Dilek},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={2},
  pages={2017--2025},
  year={2022}
}

As of 09/07/2022, the dataset has been updated to include dialog acts annotated in the paper

Dialog Acts for Task-Driven Embodied Agents

Spandana Gella, Aishwarya Padmakumar, Patrick Lange, Dilek Hakkani-Tur

If using the dialog acts in your work, please cite the following paper:

@inproceedings{teachda,
  title={{Dialog Acts for Task-Driven Embodied Agents}},
  author={Gella, Spandana and Padmakumar, Aishwarya and Lange, Patrick and Hakkani-Tur, Dilek},
  booktitle={Proceedings of the 23nd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDial)},
  year={2022},
  pages={111-123}
}

Interactions in the games, EDH instances and TfD instances that are utterances now have an additional field da_metadata containing the dialog act annotations. See the data exploration notebook for sample code to view dialog acts.

Prerequisites

Installation

pip install -r requirements.txt
pip install -e .

Downloading the dataset

Run the following script:

teach_download 

This will download and extract the archive files (experiment_games.tar.gz, all_games.tar.gz, images_and_states.tar.gz, edh_instances.tar.gz & tfd_instances.tar.gz) in the default directory (/tmp/teach-dataset).
Optional arguments:

File changes (12/28/2022): We have modified EDH instances so that the state changes checked for to evaluate success are only those that contribute towards task success in the main task of the gameplay session the EDH instance is created from. We have removed EDH instances that had no state changes meeting these requirements. Additionally, two game files, and their corresponding EDH and TfD instances were deleted from the valid_unseen split due to issues in the game files. Version 3 of our paper on Arxiv, which will be public on Dec 30, 2022 contains the updated dataset size and experimental results.

Remote Server Setup

If running on a remote server without a display, the following setup will be needed to run episode replay, model inference of any model training that invokes the simulator (student forcing / RL).

Start an X-server

tmux
sudo python ./bin/startx.py

Exit the tmux session (CTRL+B, D). Any other commands should be run in the main terminal / different sessions.

Replaying episodes

Most users should not need to do this since we provide this output in images_and_states.tar.gz.

The following steps can be used to read a .json file of a gameplay session, play it in the AI2-THOR simulator, and at each time step save egocentric observations of the Commander and Driver (Follower in the paper). It also saves the target object panel and mask seen by the Commander, and the difference between current and initial state.

Replaying a single episode locally, or in a new tmux session / main terminal of remote headless server:

teach_replay \
--game_fn /path/to/game/file \
--write_frames_dir /path/to/desired/output/images/dir \
--write_frames \
--write_states \
--status-out-fn /path/to/desired/output/status/file.json

Note that --status-out-fn must end in .json Also note that the script will by default not replay sessions for which an output subdirectory already exists under --write-frames-dir Additionally, if the file passed to --status-out-fn already exists, the script will try to resume files not marked as replayed in that file. It will error out if there is a mismatch between the status file and output directories on which sessions have been previously played. It is recommended to use a new --write-frames-dir and new --status-out-fn for additional runs that are not intended to resume from a previous one.

Replay all episodes in a folder locally, or in a new tmux session / main terminal of remote headless server:

teach_replay \
--game_dir /path/to/dir/containing/.game.json/files \
--write_frames_dir /path/to/desired/output/images/dir \
--write_frames \
--write_states \
--num_processes 50 \
--status-out-fn /path/to/desired/output/status/file.json

To generate a video, additionally specify --create_video. Note that for images to be saved, --write_images must be specified and --write-frames-dir must be provided. For state changes to be saved, --write_states must be specified and --write_frames_dir must be provided.

Evaluation

We include sample scripts for inference and calculation of metrics. teach_inference and teach_eval. teach_inference is a wrapper that implements loading EDH instance, interacting with the simulator as well as writing the game file and predicted action sequence as JSON files after each inference run. It dynamically loads the model based on the --model_module and --model_class arguments. Your model has to implement teach.inference.teach_model.TeachModel. See teach.inference.sample_model.SampleModel for an example implementation which takes random actions at every time step.

After running teach_inference, you use teach_eval to compute the metrics based output data produced by teach_inference.

Sample run:

export DATA_DIR=/path/to/data/with/games/and/edh_instances/as/subdirs (Default in Downloading is /tmp/teach-dataset)
export OUTPUT_DIR=/path/to/output/folder/for/split
export METRICS_FILE=/path/to/output/metrics/file_without_extension

teach_inference \
    --data_dir $DATA_DIR \
    --output_dir $OUTPUT_DIR \
    --split valid_seen \
    --metrics_file $METRICS_FILE \
    --model_module teach.inference.sample_model \
    --model_class SampleModel

teach_eval \
    --data_dir $DATA_DIR \
    --inference_output_dir $OUTPUT_DIR \
    --split valid_seen \
    --metrics_file $METRICS_FILE

To run TfD inference instead of EDH inference add --benchmark tfd to the inference command.

TEACh Benchmark Challenge

For participation in the challenge, you will need to submit a docker image container your code and model. Docker containers using your image will serve your model as HTTP API following the [TEACh API Specification](#TEACh API Specification). For your convenience, we included the teach_api command which implements this API and is compatible with models implementing teach.inference.teach_model.TeachModel also used by teach_inference.

We have also included two sample Docker images using teach.inference.sample_model.SampleModel and teach.inference.et_model.ETModel respectively in docker/.

When evaluating a submissions, the submitted container will be started with access to a single GPU and no internet access. For details see Step 3 - Start your container.

The main evaluation code invoking your submission will also be run as Docker container. It reuses the teach_inference CLI command together with teach.inference.remote_model.RemoteModel to call the HTTP API running in your container. For details on how to start it locally see Step 4 - Start the evaluation.

Please note that TfD inference is not currently supported via Docker image.

Testing Locally

Assuming you have downloaded the data to /home/ubuntu/teach-dataset and followed Prerequisites and Remote Server Setup.

Step 0 - Setup Environment

export HOST_DATA_DIR=/home/ubuntu/teach-dataset
export HOST_IMAGES_DIR=/home/ubuntu/images
export HOST_OUTPUT_DIR=/home/ubuntu/output
export API_PORT=5000
export SUBMISSION_PK=168888
export INFERENCE_GPUS='"device=0"'
export API_GPUS='"device=1"'
export SPLIT=valid_seen
export DOCKER_NETWORK=no-internet

mkdir -p $HOST_IMAGES_DIR $HOST_OUTPUT_DIR
docker network create --driver=bridge --internal $DOCKER_NETWORK

Note: If you run on a machine that only has a single GPU, set API_GPUS='"device=0"'.

Step 1 - Build the remote-inference-runner container

docker build -t remote-inference-runner -f docker/Dockerfile.RemoteInferenceRunner .

Step 2 - Build your container

Note: When customizing the images for your own usage, do not edit the following or your submission will fail:

For the SampleModel example, the corresponding command is:

docker build -t teach-model-api-samplemodel -f docker/Dockerfile.TEAChAPI-SampleModel .

For the baseline models, follow the corresponding command replacing MODEL_VARIANT=et with the desired variant e.g. et_plus_a.

mkdir -p ./models
mv $HOST_DATA_DIR/baseline_models ./models/
mv $HOST_DATA_DIR/et_pretrained_models ./models/
docker build --build-arg MODEL_VARIANT=et -t teach-model-api-etmodel -f docker/Dockerfile.TEAChAPI-ETModel .

Step 3 - Start your container

For the SampleModel example, the corresponding command is:

docker run -d --rm \
    --gpus $API_GPUS \
    --name TeachModelAPI \
    --network $DOCKER_NETWORK \
    -e SPLIT=$SPLIT \
    -v $HOST_DATA_DIR:/data:ro \
    -v $HOST_IMAGES_DIR/$SUBMISSION_PK:/images:ro \
    -t teach-model-api-samplemodel    

For the baseline models, just replace the image name e.g. if you followed the commands above

docker run -d --rm \
    --gpus $API_GPUS \
    --name TeachModelAPI \
    --network $DOCKER_NETWORK \
    -e SPLIT=$SPLIT \
    -v $HOST_DATA_DIR:/data:ro \
    -v $HOST_IMAGES_DIR/$SUBMISSION_PK:/images:ro \
    -t teach-model-api-etmodel    

Verify the API is running with

docker exec TeachModelAPI curl @TeachModelAPI:5000/ping

Output:
{"action":"Look Up","obj_relative_coord":[0.1,0.2]}

Step 4 - Start the evaluation

docker run --rm \
    --privileged \
    -e DISPLAY=:0 \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    --name RemoteInferenceRunner \
    --network $DOCKER_NETWORK \
    --gpus $INFERENCE_GPUS \
    -v /tmp/.X11-unix:/tmp/.X11-unix:ro \
    -v $HOST_DATA_DIR:/data:ro \
    -v $HOST_IMAGES_DIR/$SUBMISSION_PK:/images \
    -v $HOST_OUTPUT_DIR/$SUBMISSION_PK:/output \
    remote-inference-runner teach_inference \
        --data_dir /data \
        --output_dir /output \
        --images_dir /images \
        --split $SPLIT \
        --metrics_file /output/metrics_file \
        --model_module teach.inference.remote_model \
        --model_class RemoteModel \
        --model_api_host_and_port "@TeachModelAPI:$API_PORT"

Step 5 - Results

The evaluation metrics will be in $HOST_OUTPUT_DIR/$SUBMISSION_PK/metrics_file. Images for each episode will be in $HOST_IMAGES_DIR/$SUBMISSION_PK.

Running without docker

You may want to test your implementation without rebuilding Docker images. You can test your model by directly calling the teach_api CLI command e.g.

Using the teach.inference.sample_model.SampleModel:

export DATA_DIR=/home/ubuntu/teach-dataset
export IMAGE_DIR=/tmp/images

teach_api \
    --data_dir $DATA_DIR \
    --images_dir $IMAGE_DIR

Using the teach.inference.et_model.ETModel assuming you already moved the models from the teach-dataset location to ./models following instructions in Step 2 - Build your container.

export DATA_DIR=/home/ubuntu/teach-dataset
export IMAGE_DIR=/tmp/images

teach_api \
    --data_dir $DATA_DIR \
    --images_dir $IMAGE_DIR \
    --split valid_seen \
    --model_module teach.inference.et_model \
    --model_class ETModel \
    --model_dir ./models/baseline_models/et \
    --visual_checkpoint ./models/et_pretrained_models/fasterrcnn_model.pth
    --object_predictor ./models/et_pretrained_models/maskrcnn_model.pth \
    --seed 4 

The corresponding command for running teach_inference against such an API without container uses teach.inference.remote_model.RemoteModel.

export DATA_DIR=/home/ubuntu/teach-dataset
export OUTPUT_DIR=/home/ubuntu/output/valid_seen
export METRICS_FILE=/home/ubuntu/output/valid_seen/metrics
export IMAGE_DIR=/tmp/images

teach_inference \
    --data_dir $DATA_DIR  \
    --output_dir $OUTPUT_DIR \    
    --split valid_seen \
    --metrics_file $METRICS_FILE \    
    --model_module teach.inference.remote_model \
    --model_class RemoteModel \        
    --model_api_host_and_port 'localhost:5000' \
    --images_dir $IMAGE_DIR

Smaller split

It may be useful for faster turn around time to locally create a smaller split in $DATA_DIR/edh_instances/test_seen with a handful of files from $DATA_DIR/edh_instances/valid_seen for faster turn around times.

Runtime Checks

The TEACh Benchmark Challenge places a maximum time limit of 36 hours when using all GPUs of a p3.16xlarge instance. The best way to verify that your code is likely to satisfy this requirement would be to use a script to run two Docker evaluation processes in sequence on a p3.16xlarge EC2 instance, one for the valid_seen split and one for the valid_unseen split. Note that you will need to specify export API_GPUS='"device=1,2,3,4,5,6,7"' (we reserve GPU 0 for ai2thor in our runs) to use all GPUs and your model code will need to place different instances of the model on different GPUs for this test (see the use of process_index in ETModel.set_up_model() for an example). Also note that while the test splits are close in size to the validation splits, they are not identical so your runtime estimate will necessarily be an approximation.

TEACh API Specification

As mentioned above, teach_api already implements this API and it is usually not necessary to implement this yourself. During evaluations of submissions, edh_instances without ground truth and images corresponding to the edh_instances' histories will be available in /data. /images will contain images produced during inference at runtime. teach_api already handles loading and passes them to your implementation of teach.inference.teach_model.TeachModel.

Start EDH Instance

This endpoint will be called once at the start of processing a new EDH instance. Currently, we ensure that the API processes only a single EDH instance from start to finish i.e. once called it can be assumed that the previous EDH instance has completed.

URL : /start_new_edh_instance
Method : POST
Payload:

{
    "edh_name": "[name of the EDH instance file]"
}

Responses:

Status Code: 200
Response: success

Status Code: 500
Response: [error message]

Get next action

This endpoint will be called at each timestep during inference to get the next predicted action from the model.

URL : /get_next_action
Method : POST
Payload:

{
    "edh_name": "[name of the EDH instance file]",
    "img_name": "[name of the image taken in the simulator after the previous action]",
    "prev_action": "[JSON string representation of previous action]", // this is optional
}

Responses:

Status Code: 200

{
    "action": "[An action name from all_agent_actions]",
    "obj_relative_coord": [0.1, 0.5] // see teach.inference.teach_model.TeachModel.get_next_action
}

Status Code: 500
Response: [error message]

TEACh EDH Offline Evaluation

While the leaderboard for the TEACh EDH benchmark is not active, we recommend that researchers follow the following protocol for evaluation. A split of the existing TEACh validation splits has been provided in the src/teach/meta_data_files/divided_split directory. For your experiments, please use the divided_val_seen and divided_val_unseen splits for validation and divided_test_seen and divided_test_unseen for testing. Note that the TEACh code has not been modified at the moment to directly support use of these splits, so you will need to locally reorganize your data directory so that games, EDH instances and image folders are reorganized according to the divided split. Some additional notes:

  1. If you have previously tuned hyperparameters using the full TEACh validation split, you will need to re-tune hyperparameters on just the divided_val_seen or divided_val_unseen splits for fair comparison to other papers.
  2. The divided test splits are likely to be easier than the original TEACh test split as the floorplans used in the divided_val_unseen and divided_test_unseen splits are identical.
  3. Please do not incorporate the divided_val_seen or divided_val_unseen splits into your training set and retrain after hyperparameter tuning if using this protocol, as the divided_test_unseen split will then no longer be unseen.
  4. We have observed that the ET model can show some variance when being retrained on ALFRED or TEACh even when changing only the random seeds, and as such we expect some performance differences between the full TEACh validation splits, TEACh test splits and divided splits.
  5. Alexa Prize SimBot Challenge Participants please refer to challenge rules regarding publications.

Security

See CONTRIBUTING for more information.

License

The code is licensed under the MIT License (see SOFTWARELICENSE), images are licensed under Apache 2.0 (see IMAGESLICENSE) and other data files are licensed under CDLA-Sharing 1.0 (see DATALICENSE).