This is the implementation of the approaches described in the paper:
Emanuele Bugliarello and Naoaki Okazaki. Enhancing Machine Translation with Dependency-Aware Self-Attention. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 2020.
We provide the code for reproducing our results, as well as translation outputs of each model.
You can clone this repository with its submodules included by issuing: git clone --recurse-submodules git@github.com:e-bug/pascal
The requirements can be installed by setting up a conda environment with conda env create -f environment.yml, followed by source activate pascal.
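Putting these steps together, a typical first-time setup looks like this (the environment name pascal comes from the activation command above):

```bash
# Clone the repository together with its submodules (Fairseq fork, tools)
git clone --recurse-submodules git@github.com:e-bug/pascal
cd pascal

# Create and activate the conda environment
conda env create -f environment.yml
source activate pascal
```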
The pre-processing steps for each model in each data set can be found in the corresponding experiments/ folder, and rely on our code (scripts/) as well as on third-party software (tools/).
Scripts for training each model are provided in the corresponding data set folder in experiments/ (e.g., experiments/wmt16en2de/transformer/train.sh).
Note that we trained our models on an SGE cluster. To run our training experiments, submit (qsub) the train.sge file of the corresponding experiment; it calls the train.sh file in its directory. Similarly, you can use the corresponding eval.sh and eval.sge files to evaluate a model.
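For instance, on an SGE cluster, the baseline Transformer on WMT'16 En-De from the example above could be trained and then evaluated by submitting its job files:

```bash
# Submit the training job; train.sge calls the train.sh file in the same directory
qsub experiments/wmt16en2de/transformer/train.sge

# After training has finished, submit the evaluation job
qsub experiments/wmt16en2de/transformer/eval.sge
```

Other data sets and models follow the same layout under experiments/.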
experiments/
Contains code to reproduce our results. For each data set, the following files are used to prepare the data (see the example after this list):
prepare_data.sh: Google's pre-processing steps for the Transformer model
prepare_filt_data.sh: use langdetect to remove sentences in languages that match neither the source nor the target one
prepare_lin_parses.sh: extract linearized parses for the multi-task approach of Currey and Heafield (WMT'19)
prepare_tags_label.sh: extract dependency labels following the approach of Sennrich and Haddow (WMT'16)
prepare_tags_mean.sh: extract dependency heads and map them to the mean/middle position of the parent's sub-word units
prepare_tags_root.sh: extract dependency heads and map them to the first position (root) of the parent's sub-word units
binarize_*.sh files: convert text data into binary files used by Fairseq
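As an illustration, preparing one of the dependency-aware models on a data set might chain these scripts roughly as follows. The working directory, the choice of prepare_tags_root.sh over the other variants, and the binarize_tags_root.sh name are assumptions for illustration; the actual scripts follow the binarize_*.sh pattern in each data set folder:

```bash
cd experiments/wmt16en2de

# Standard pre-processing (tokenization, BPE, etc.)
bash prepare_data.sh

# Extract dependency heads, mapped to the first (root) sub-word of each parent
bash prepare_tags_root.sh

# Convert the prepared text and tags into Fairseq binary files
# (hypothetical name; use the matching binarize_*.sh script)
bash binarize_tags_root.sh
```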
fairseq/
Our code is based on a fork of Fairseq (the commit ID can be found in VERSION.md).
Here, we introduce a new tags-translation task that accepts two source files (words and syntactic tags). This is implemented through the following files:
data/tags_language_pair_dataset.py
models/fairseq_tags_encoder.py
models/fairseq_model.py
tasks/tags_translation.py
We also implement the following dependency-aware Transformer models (see the usage sketch after this list):
models/pascal_transformer.py
modules/multihead_pascal.py
models/lisa_transformer.py
modules/multihead_lisa.py
criterions/lisa_cross_entropy.py
models/tagemb_transformer.py
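Putting the task and the models together, training a dependency-aware model amounts to pointing Fairseq's training script at the binarized data with the tags-translation task and one of the architectures above. The sketch below is indicative only: the data path and the pascal_transformer architecture name are hypothetical, and the actual flags live in the experiments' train.sh scripts:

```bash
# Indicative sketch; see experiments/*/*/train.sh for the actual invocations
cd fairseq
python train.py ../data-bin/wmt16en2de_tags \
    --task tags_translation \
    --arch pascal_transformer \
    --source-lang en --target-lang de
```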
scripts/: data preparation scripts for extracting syntactic tags
tools/: third-party and own software used in pre-processing (e.g., Moses and BPE) as well as in evaluation (e.g., RIBES)
This work is licensed under the MIT license. See LICENSE for details.
Third-party software and data sets are subject to their respective licenses.
If you find our code/models or ideas useful in your research, please consider citing the paper:
@inproceedings{bugliarello-okazaki-2020-enhancing,
title = "Enhancing Machine Translation with Dependency-Aware Self-Attention",
author = "Bugliarello, Emanuele and
Okazaki, Naoaki",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.147",
pages = "1618--1627",
}