
Domain Adaptation of Neural Machine Translation by Lexicon Induction

Implemented by Junjie Hu

Contact: junjieh@cs.cmu.edu

If you use the code in this repository, please cite our ACL 2019 paper:

@inproceedings{hu-etal-2019-domain,
    title = "Domain Adaptation of Neural Machine Translation by Lexicon Induction",
    author = "Hu, Junjie and Xia, Mengzhou and Neubig, Graham and Carbonell, Jaime",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P19-1286",
    doi = "10.18653/v1/P19-1286",
    pages = "2989--3001",
}

Installation

Downloads

The preprocessed data and pre-trained models can be found here. Extract dataset.tar.gz under the dali directory, and extract data-bin.tar.gz, it-de-en-epoch40.tar.gz, and it2emea-de-en.tar.gz under the dali/outputs directory.
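The extraction steps above can be sketched as follows. This is a minimal sketch that assumes the repository is cloned as ./dali and the downloaded archives sit next to it; adjust paths to match where you actually saved the files.

```shell
# Extract the dataset into the repo root (assumes ./dali is the cloned repo).
tar -xzf dataset.tar.gz -C dali/

# Extract the preprocessed binaries and pre-trained models under dali/outputs.
mkdir -p dali/outputs
tar -xzf data-bin.tar.gz -C dali/outputs/
tar -xzf it-de-en-epoch40.tar.gz -C dali/outputs/
tar -xzf it2emea-de-en.tar.gz -C dali/outputs/
```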

The pre-trained model for the it domain obtains the BLEU scores below on the five domains' test sets. After adaptation, BLEU on the emea test set rises from 8.23 to 18.25. The scores differ slightly from those in the paper because we used a different NMT toolkit (fairseq vs. OpenNMT), but we observed similar improvements to those reported in the paper.

The row gives the training domain; it is the in-domain test set and the remaining four are out-of-domain.

| Train \ Test | it (in-domain) | emea | koran | subtitles | acquis |
| --- | --- | --- | --- | --- | --- |
| it | 58.94 | 8.23 | 2.50 | 6.26 | 4.34 |

Demo