farinamhz closed 1 year ago
Based on my research into the available translation models, we have the following options:
The translate library on PyPI (translate 3.6.1): it is free, but its translations are not accurate.
Pretrained models that are popular, open-source, and available on Hugging Face:
No Language Left Behind (NLLB) from Meta:
For the early experiment, I chose NLLB, specifically the nllb-200-distilled-600M version, and it works properly on the text-to-text back-translation task. However, more work is needed before the alignments can be added to the pipeline.
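For reference, the back-translation step could look roughly like the sketch below. It is not the code from the repository: the function names `load_nllb_translator` and `back_translate` are hypothetical, the checkpoint id `facebook/nllb-200-distilled-600M` is the Hugging Face name for the version mentioned above, and the `max_new_tokens` value is an arbitrary choice. Language codes are assumed to follow the FLORES-200 convention (for example `eng_Latn` for English and `fra_Latn` for French).

```python
from typing import Callable, List, Tuple

# Hugging Face checkpoint id for the NLLB version named in this thread.
MODEL_NAME = "facebook/nllb-200-distilled-600M"

Translator = Callable[[List[str]], List[str]]


def load_nllb_translator(src_lang: str, tgt_lang: str) -> Translator:
    """Return a translate(texts) function backed by NLLB.

    Hypothetical helper: downloads the model on first use.
    src_lang / tgt_lang are FLORES-200 codes such as "eng_Latn".
    """
    # Imported lazily so back_translate() can be used without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, src_lang=src_lang)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

    def translate(texts: List[str]) -> List[str]:
        inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(
            **inputs,
            # Force the decoder to start with the target-language token.
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
            max_new_tokens=128,  # assumed limit, tune for longer reviews
        )
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)

    return translate


def back_translate(
    texts: List[str], forward: Translator, backward: Translator
) -> Tuple[List[str], List[str]]:
    """Translate texts with `forward`, then translate the result with `backward`."""
    translated = forward(texts)
    back_translated = backward(translated)
    return translated, back_translated


# Example usage (assumes French as language L; loads the model twice):
#   to_l = load_nllb_translator("eng_Latn", "fra_Latn")
#   to_en = load_nllb_translator("fra_Latn", "eng_Latn")
#   translated, back = back_translate(["The food was great."], to_l, to_en)
```

Keeping `back_translate` independent of the model means a different translator (for example the translate library) could be swapped in without changing the rest of the pipeline.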
@hosseinfani
Hi @hosseinfani, we now have the translation module, which works on a dataset of texts. Its results are the translation and the back-translation of those texts, each stored as a CSV file containing a single column of sentences. You can see the early results in this commit: link
Next, I will work on the word alignment module.
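As an illustration of the one-column CSV output described above, here is a minimal sketch using only the standard library. The helper names `write_sentences` and `read_sentences` and the `sentence` header are assumptions for this example, not the names used in the commit.

```python
import csv
from typing import List


def write_sentences(path: str, sentences: List[str], header: str = "sentence") -> None:
    """Write sentences to a CSV file with a single column (header name is assumed)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([header])
        writer.writerows([s] for s in sentences)


def read_sentences(path: str) -> List[str]:
    """Read the single column back, skipping the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip header
        return [row[0] for row in reader if row]
```

With these helpers, the translation and the back-translation would each go to their own file, and the row order keeps the two files aligned sentence by sentence.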
The languages available in NLLB:
https://github.com/facebookresearch/flores/tree/main/flores200#languages-in-flores-200
@farinamhz I think we can safely close this issue. Let me know otherwise.
In this issue, we are going to provide a suitable model or API for translating the reviews from English to a language L and then back-translating the reviews from L to English.