BaohaoLiao / multiback

Code for the WMT21 paper "Back-translation for Large-Scale Multilingual Machine Translation"

Multilingual Backtranslation

This repository provides the preprocessing and training scripts for our WMT21 paper, Back-translation for Large-Scale Multilingual Machine Translation.

Key Points of Our Paper

What's New

Quick Links

Installation

The installation instructions are borrowed from fairseq. In case of version problems, we also provide the fairseq version we trained with.
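
As a minimal sketch (assuming the bundled fairseq copy lives in a fairseq directory at the repository root, which is an assumption on our part), an editable install usually looks like this:

# Minimal sketch: install the bundled fairseq in editable mode.
# The "fairseq" directory name is an assumption; point this at wherever
# the provided fairseq copy lives in this repository.
git clone https://github.com/BaohaoLiao/multiback.git
cd multiback/fairseq
pip install --editable ./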

Preparation

All data and pretrained models are available on the challenge page. We mainly show how to process the data for small task #2. For the data of small task #1, simply modify the lines marked with "# TODO" in the small task #2 scripts.
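
Purely as an illustration of the kind of binarization step fairseq expects (all language codes, paths, and file names below are placeholders, not the exact settings used by our preprocessing scripts):

# Hypothetical example: binarize one tokenized, BPE-encoded language pair.
# SRC, TGT, and all paths are placeholders; see the preprocessing scripts
# in this repository for the actual pipeline used for small task #2.
SRC=src TGT=tgt
fairseq-preprocess \
    --source-lang $SRC --target-lang $TGT \
    --trainpref data/train.bpe --validpref data/valid.bpe \
    --destdir data-bin/small_task2 \
    --joined-dictionary \
    --workers 8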

Train on Parallel Data

All training scripts are in train_scripts:

cd train_scripts
bash transBaseForSmallTask2ParallelData.sh

Here we list the number of GPUs used for each script. If you don't have enough GPUs, increase the --update-freq flag so that your effective batch size matches ours (see the sketch after the table). We did not tune the hyper-parameters much; they are mainly borrowed from the fairseq examples.

Task           Model        Script                                   #GPUs  #Epochs
Small Task #2  Trans_small  transSmallForSmallTask2ParallelData.sh   32     1
Small Task #2  Trans_base   transBaseForSmallTask2ParallelData.sh    32     2
Small Task #2  Trans_big    transBigForSmallTask2ParallelData.sh     128    2
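
The numbers below are illustrative only and are not taken from the scripts; they just show how to keep the effective batch size fixed when scaling down the GPU count:

# Illustrative arithmetic only (values assumed, not from the scripts):
# effective batch size in tokens = num_gpus * max_tokens * update_freq
num_gpus=8 script_gpus=32 script_update_freq=1
update_freq=$(( script_gpus * script_update_freq / num_gpus ))   # = 4
echo "use --update-freq ${update_freq}"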

Generation & Evaluation

Back-translation

Citation

Please cite as:

@inproceedings{liao-etal-2021-back,
    title = "Back-translation for Large-Scale Multilingual Machine Translation",
    author = "Liao, Baohao  and
      Khadivi, Shahram  and
      Hewavitharana, Sanjika",
    booktitle = "Proceedings of the Sixth Conference on Machine Translation",
    month = nov,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.wmt-1.50",
    pages = "418--424",
}