fani-lab / LADy

LADy 💃: A Benchmark Toolkit for Latent Aspect Detection Enriched with Backtranslation Augmentation

Adding translation model in the pipeline for the back-translation task #24

Closed farinamhz closed 1 year ago

farinamhz commented 1 year ago

In this issue, we will select a suitable model or API for translating the reviews from English to a target language L and then back-translating them from L to English.

farinamhz commented 1 year ago

Based on my research on available choices for the translation model, we have the following options:

For the early experiments, I chose NLLB (the nllb-200-distilled-600M version), and it works properly for text-to-text back-translation. However, the word alignments still need additional work before they can be added to the pipeline.
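For reference, the round trip described above could be sketched roughly as follows. This is only an illustration, not the code in the repository: `back_translate`, `make_nllb_translator`, and the `Translator` type are hypothetical names, the pivot language `fra_Latn` is an arbitrary choice, and the NLLB wrapper assumes the Hugging Face `transformers` package and the `facebook/nllb-200-distilled-600M` checkpoint are available.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a translator maps (texts, source code, target code)
# to a list of translated texts, in the same order.
Translator = Callable[[List[str], str, str], List[str]]


def make_nllb_translator(
    model_name: str = "facebook/nllb-200-distilled-600M",
) -> Translator:
    """Build a translator backed by NLLB (assumes `transformers` is installed)."""
    # Imported lazily so the rest of the sketch runs without the dependency.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    def translate(texts: List[str], src: str, tgt: str) -> List[str]:
        # Language codes follow the FLORES-200 convention, e.g. "eng_Latn".
        pipe = pipeline(
            "translation",
            model=model,
            tokenizer=tokenizer,
            src_lang=src,
            tgt_lang=tgt,
        )
        return [out["translation_text"] for out in pipe(texts)]

    return translate


def back_translate(
    texts: List[str],
    translate: Translator,
    src: str = "eng_Latn",
    pivot: str = "fra_Latn",
) -> Tuple[List[str], List[str]]:
    """Translate src -> pivot, then pivot -> src; return both stages."""
    forward = translate(texts, src, pivot)
    backward = translate(forward, pivot, src)
    return forward, backward
```

Passing the translator in as a callable keeps the round-trip logic independent of the model, so a different model or API could be swapped in later without touching `back_translate`.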

@hosseinfani

farinamhz commented 1 year ago

Hi @hosseinfani, the translation module is now working. It takes a dataset of texts and produces their translations and back-translations, each written to a CSV file with a single column of sentences. You can see early results in this commit: link

Next, I will work on the word alignment module.
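As a rough illustration of the output format mentioned above, single-column CSV files could be written and read back like this. The function names and the `sentence` header are assumptions made for the sketch, not necessarily what the commit uses.

```python
import csv
from typing import Iterable, List


def write_sentences(path: str, sentences: Iterable[str], header: str = "sentence") -> None:
    """Write one sentence per row into a single-column CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([header])  # assumed header row
        writer.writerows([s] for s in sentences)


def read_sentences(path: str) -> List[str]:
    """Read the sentences back, skipping the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    return [row[0] for row in rows[1:]]
```

Using the `csv` module rather than joining lines by hand keeps reviews that contain commas, quotes, or line breaks intact in their single column.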

hosseinfani commented 1 year ago

The languages available in NLLB:

https://github.com/facebookresearch/flores/tree/main/flores200#languages-in-flores-200

hosseinfani commented 1 year ago

@farinamhz I think we can safely close this issue. Let me know otherwise.