averkij / lingtrain-aligner-editor

Extracts parallel corpora from two raw texts in different languages.

Extract/train bilingual model for zh-ru #1

Open vengodelsur opened 3 years ago

vengodelsur commented 3 years ago

(Issue described by averkij) The aligner currently uses multilingual pretrained models. A model for a single language pair can be lighter (and so easier to run locally) and may also perform better.

For reference: a paper on BERT for English and Arabic https://arxiv.org/abs/2004.14519v2

averkij commented 3 years ago

@vengodelsur I asked @nreimers (the sentence-transformers maintainer) for advice, and he was very helpful.

Please take a look at this issue: https://github.com/UKPLab/sentence-transformers/issues/634 (I'll look into it too).

The plan is as follows: