farinamhz opened this issue 11 months ago
[To be updated]
Based on my research into the available options for the translation model, and on our discussion in the previous translation-model issue (https://github.com/fani-lab/LADy/issues/24), several libraries provide access to the Google Translate API, which I expect to perform better than the pretrained models.
I previously tested several libraries that claimed to offer access to the Google Translate API, but they were not successful. Here's a brief overview of my findings:
- `googletrans`: access from a remote host is eventually blocked after roughly 1,000 reviews.
- `translatepy`: the library lacks a batch translation function, and sending individual requests for a large number of reviews seems impractical.

As a result, I continued my search and found the `deep-translator` library, which has worked well on our `toy` dataset. I plan to test it further on the full versions of the dataset.
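A minimal sketch of how reviews could be sent in batches, assuming a translator that exposes a method taking a list of strings and returning a list of translations of the same length (deep-translator's `GoogleTranslator` has a `translate_batch` method of this shape). The helper name `translate_in_chunks`, the chunk size, and the pause between chunks are hypothetical; they are not documented service limits, only knobs for staying clear of the kind of blocking we saw with `googletrans`.

```python
import time
from typing import Callable, List, Sequence


def translate_in_chunks(
    texts: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    chunk_size: int = 50,        # hypothetical default; tune to the service
    pause_seconds: float = 1.0,  # hypothetical delay between chunks
) -> List[str]:
    """Translate `texts` chunk by chunk, preserving their order.

    `translate_batch` is any callable mapping a list of strings to a list of
    translations of the same length (e.g. a bound translate_batch method).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    results: List[str] = []
    for start in range(0, len(texts), chunk_size):
        chunk = list(texts[start:start + chunk_size])
        translated = translate_batch(chunk)
        # Guard against a backend silently dropping or merging items.
        if len(translated) != len(chunk):
            raise RuntimeError(
                f"translator returned {len(translated)} items for {len(chunk)} inputs"
            )
        results.extend(translated)
        # Throttle between chunks, but do not sleep after the last one.
        if start + chunk_size < len(texts):
            time.sleep(pause_seconds)
    return results


if __name__ == "__main__":
    # Offline stand-in for a real translator: upper-cases each review.
    fake = lambda batch: [t.upper() for t in batch]
    print(translate_in_chunks(["good food", "slow service", "nice view"], fake,
                              chunk_size=2, pause_seconds=0))
    # prints ['GOOD FOOD', 'SLOW SERVICE', 'NICE VIEW']
```

With `deep-translator` installed and network access, the real call could look like `translate_in_chunks(reviews, GoogleTranslator(source="en", target="de").translate_batch)`. This wrapper has not been run against the full dataset.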
In the meantime, I investigated whether FLAN is a good option. Unfortunately, it does not currently suit our needs, mainly because of the back-translation step, which involves tokenization and translation in both directions: from English into other languages and back again. Although FLAN performs well when translating into English, the authors note in the paper that its translation from English into other languages falls short, a limitation they attribute to the use of English-specific tokenizers.
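To make the requirement concrete, here is a minimal sketch of the back-translation round trip, assuming a generic `translate(texts, source_lang, target_lang)` callable. That interface and the name `back_translate` are hypothetical; any backend (deep-translator, a pretrained model) would need a small wrapper to fit it. The first leg goes English to the target language, which is exactly the direction where FLAN is reported to be weak.

```python
from typing import Callable, Dict, List

# Hypothetical interface: translate(texts, source_lang, target_lang) -> texts
Translate = Callable[[List[str], str, str], List[str]]


def back_translate(reviews: List[str], translate: Translate,
                   languages: List[str], source: str = "en") -> Dict[str, List[str]]:
    """For each language, return the reviews translated source -> lang -> source."""
    out: Dict[str, List[str]] = {}
    for lang in languages:
        forward = translate(reviews, source, lang)    # English -> target language
        backward = translate(forward, lang, source)   # target language -> English
        out[lang] = backward
    return out


if __name__ == "__main__":
    # Toy lookup-table "translator" so the example runs offline.
    table = {("en", "fr"): {"tasty pizza": "pizza savoureuse"},
             ("fr", "en"): {"pizza savoureuse": "delicious pizza"}}

    def toy(texts, src, tgt):
        return [table[(src, tgt)].get(t, t) for t in texts]

    print(back_translate(["tasty pizza"], toy, ["fr"]))
    # prints {'fr': ['delicious pizza']}
```

The paraphrase ("tasty" becoming "delicious") is the point of back-translation; a model that handles only the second leg well cannot produce the intermediate text in the first place.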
In this phase, we plan to integrate an additional translation model to determine whether the observed improvements are consistent across translators or whether there is room for further improvement.
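One way to frame this check, as a hedged sketch: run the same pipeline once per translator, score each run with whatever evaluation the pipeline already uses, and ask whether every translator beats the baseline. The names `compare_translators`, `consistent_improvement`, `evaluate`, and `baseline` are hypothetical placeholders, and the scoring function is deliberately left abstract.

```python
from typing import Callable, Dict, List


def compare_translators(
    translators: Dict[str, Callable[[List[str]], List[str]]],
    reviews: List[str],
    evaluate: Callable[[List[str]], float],
    baseline: float,
) -> Dict[str, float]:
    """Return each translator's score minus the baseline score."""
    return {name: evaluate(translate(reviews)) - baseline
            for name, translate in translators.items()}


def consistent_improvement(deltas: Dict[str, float]) -> bool:
    """True only if there is at least one translator and all of them improve."""
    return bool(deltas) and all(d > 0 for d in deltas.values())
```

Reporting the per-translator deltas, rather than only a yes/no answer, also shows whether one translator leaves room for further improvement over the others.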