bheinzerling / bpemb

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
https://nlp.h-its.org/bpemb
MIT License
1.18k stars, 101 forks

Training customized bpemb #22

Closed. gccome closed this issue 5 years ago

gccome commented 5 years ago

Hello,

Thanks for making such a great library!

I can train a SentencePiece model on my own data, so I am wondering whether there is a way to train BPEmb embeddings on that data using the customized SentencePiece model. Please advise.

Thanks!

gccome commented 5 years ago

I just found a way of doing it. Thanks!