facebookresearch / LASER

Language-Agnostic SEntence Representations

support for Tigrinya #17

Closed babraham123 closed 5 years ago

babraham123 commented 5 years ago

If you plan on supporting the Tigrinya language (similar to Amharic) in the future, I would be happy to help you find sources of training data. Thanks!

http://bible.geezexperience.com/tigrigna/
http://www.eritrea-chat.com/newspaper/

hoschwenk commented 5 years ago

We would be happy to add more languages to our system, in particular minority languages or variants. If a language is close to one that is already included (and well trained), there is some chance that the system will generalize well and handle it correctly. To add a new language, we would need parallel data, i.e. sentences in that language together with their translations (into English). However, to avoid confusion: our approach is not a machine translation system, but a multilingual sentence representation. It can be used to transfer an NLP application, e.g. sentence classification, sentiment analysis or natural language inference, to target languages without retraining (zero-shot transfer).
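
To make the zero-shot transfer idea concrete, here is a minimal sketch in plain Python. It does not use the LASER code or API: the three-dimensional vectors are hand-made stand-ins for sentence embeddings, and the helper names (`train_centroids`, `predict`) and the nearest-centroid classifier are assumptions chosen purely for illustration. The point is only that a classifier fitted on English embeddings can be applied unchanged to embeddings of another language, provided both live in the same vector space.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def train_centroids(embeddings, labels):
    """Fit a nearest-centroid classifier: one mean vector per label."""
    by_label = {}
    for vec, lab in zip(embeddings, labels):
        by_label.setdefault(lab, []).append(vec)
    return {lab: centroid(vecs) for lab, vecs in by_label.items()}

def predict(centroids, vec):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda lab: cosine(centroids[lab], vec))

# Hypothetical embeddings of labelled English sentences (toy values).
english_train = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 0.8, 0.2],
]
english_labels = ["positive", "positive", "negative", "negative"]

# Hypothetical embeddings of sentences in a target language (toy values).
# A shared multilingual space is assumed, so no retraining is done here.
target_test = [
    [0.85, 0.15, 0.05],
    [0.05, 0.95, 0.10],
]

centroids = train_centroids(english_train, english_labels)
predictions = [predict(centroids, v) for v in target_test]
print(predictions)
```

In a real setting the toy vectors would be replaced by the output of a multilingual sentence encoder, and how well the transfer works for a given language depends on how well that language is represented in the shared space.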