facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Bidirectional translation with different datasets #3968

Open raphaelmerx opened 2 years ago

raphaelmerx commented 2 years ago

❓ Bidirectional translation with different datasets

What is your question?

I'm using the multilingual translation code to train a bidirectional tdt-en, en-tdt model (see https://github.com/pytorch/fairseq/issues/2078). I'd like to augment this model with backtranslated (tdt>en) data. This data is meant to be used for training the en>tdt direction only. Is it possible to train a multilingual, bidirectional model with a different dataset for each direction?

Code

Not relevant

What have you tried?

I trained a bidirectional model using parallel data only, used it for backtranslation, then trained two separate models, one for each direction, using parallel + backtranslated data. But I would prefer to keep a single bidirectional model.
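
To make the intended data split concrete, here is a minimal sketch in plain Python. It is not fairseq code and does not use any fairseq API: `build_direction_datasets` and the placeholder sentences are hypothetical, made up for illustration only. It assumes the parallel data is a list of (tdt, en) pairs and the backtranslated data is a list of (synthetic en, original tdt) pairs.

```python
from typing import Dict, List, Tuple

# A training pair is (source sentence, target sentence).
Pair = Tuple[str, str]


def build_direction_datasets(
    parallel: List[Pair],        # (tdt, en) pairs from the parallel corpus
    backtranslated: List[Pair],  # (synthetic en, original tdt) pairs
) -> Dict[str, List[Pair]]:
    """Return one training set per direction (hypothetical helper).

    Parallel data is used in both directions; backtranslated data is
    added to the en>tdt direction only.
    """
    tdt_en = [(tdt, en) for tdt, en in parallel]
    en_tdt = [(en, tdt) for tdt, en in parallel] + list(backtranslated)
    return {"tdt-en": tdt_en, "en-tdt": en_tdt}


# Placeholder data, not real sentences.
parallel = [("tdt sentence 1", "en sentence 1")]
backtranslated = [("synthetic en sentence 2", "tdt sentence 2")]

datasets = build_direction_datasets(parallel, backtranslated)
for direction, pairs in datasets.items():
    print(direction, len(pairs))
```

The question is whether a single bidirectional model can be trained on two direction-specific sets like these, rather than on the same data for both directions.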

What's your environment?

Not relevant

stale[bot] commented 2 years ago

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!

raphaelmerx commented 2 years ago

bump