Closed rohitsaluja22 closed 5 years ago
All sentencepiece models were trained separately on monolingual corpora, so applying them to multilingual text probably won't give you good results unless the languages are very similar.
If you have access to GPUs, you could give multilingual BERT (or its pytorch version) a try.
This is now possible with the new multilingual models: https://nlp.h-its.org/bpemb/multi/
hey, thanks for sharing the code. I am working on the multilingual text. Can I give more than one language to segment words/sentences?