ghchen18 / leca

Code for Lexical-Constraint-Aware Neural Machine Translation via Data Augmentation

Code for the paper "Lexical-Constraint-Aware Neural Machine Translation via Data Augmentation"

Installation and data preprocessing

The code is built on fairseq v0.6.1. Follow the same steps as fairseq to install it and to prepare the processed fairseq dataset. The WMT processing script is here.

Step 1: Install fairseq.

# You may want to create a conda environment first.
git clone https://github.com/ghchen18/leca.git
cd leca
pip install --editable .

Step 2: Process the dataset.

Follow the steps in the fairseq repo. More datasets can be found on the WMT Translation Task page. Because this repo and the official fairseq repo use different dictionaries, preprocess the data with the preprocess.py in this repo rather than the one in the official fairseq repo.
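As a rough sketch of the preprocessing step, the command below follows the standard fairseq translation recipe and is run from the root of this repo so that its own preprocess.py is used. The language pair, the input directory `data/wmt_en_de`, and the output directory `data-bin/wmt_en_de` are assumptions for illustration. The flags are the usual fairseq preprocessing options; this repo's preprocess.py may need extra or different options, so check `python preprocess.py --help` before relying on it.

```shell
# Hypothetical language pair and paths; adjust them to your data.
SRC=en
TGT=de
TEXT=data/wmt_en_de        # assumed: holds train/valid/test.{en,de} after tokenization and BPE
DEST=data-bin/wmt_en_de    # assumed: where the binarized dataset is written

# Build the command as a string so it can be inspected before running.
# Paths must not contain spaces, since the string is word-split when executed.
PREPROCESS_CMD="python preprocess.py --source-lang $SRC --target-lang $TGT --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test --destdir $DEST"

# Print the command; set RUN=1 to actually execute it from the repo root.
echo "$PREPROCESS_CMD"
if [ "${RUN:-0}" = "1" ]; then
  $PREPROCESS_CMD
fi
```

By default the snippet only prints the command; running it with `RUN=1` executes it, which makes it easy to check the paths first.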

Run experiments

See scripts/run.sh. You may need to edit the variables in the shell script to match your setup before running it.

Citation

@inproceedings{chen2020leca,
  title     = {Lexical-Constraint-Aware Neural Machine Translation via Data Augmentation},
  author    = {Chen, Guanhua and Chen, Yun and Wang, Yong and Li, Victor O.K.},
  booktitle = {Proceedings of {IJCAI} 2020: Main track},
  pages     = {3587--3593},
  year      = {2020},
  month     = {7},
}