bheinzerling / bpemb

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
https://nlp.h-its.org/bpemb
MIT License

Difference between "en.wiki.bpe.vs50000" and "en.wiki.bpe.op50000" #32

Closed: caozhen-alex closed this issue 5 years ago

caozhen-alex commented 5 years ago

Hi,

What is the difference between "en.wiki.bpe.vs50000" and "en.wiki.bpe.op50000", and between the 'txt' and 'bin' files?

Alternatively, where can I find an explanation?

Thank you.

bheinzerling commented 5 years ago

Files with "op" in the name, such as "en.wiki.bpe.op50000", are old versions. The files linked on the BPEmb website and downloaded by the Python library are the newest ("vs") version.
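
To illustrate the naming rule above, here is a minimal sketch that flags a file name as an old ("op") or current ("vs") version. The helper `describe_bpemb_name` and its regular expression are hypothetical: the pattern is inferred only from the two file names mentioned in this thread and is not part of the BPEmb library, so other names may not match it.

```python
import re

# Hypothetical pattern, inferred from "en.wiki.bpe.vs50000" and
# "en.wiki.bpe.op50000"; it is an assumption, not a documented naming scheme.
_NAME_PATTERN = re.compile(r"^(?P<lang>[^.]+)\.wiki\.bpe\.(?P<kind>vs|op)(?P<number>\d+)")


def describe_bpemb_name(name):
    """Split a BPEmb-style file name and report whether it is an old version."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    return {
        "lang": match.group("lang"),
        "number": int(match.group("number")),
        # Per the reply above, "op" marks old files and "vs" the newest ones.
        "is_old_version": match.group("kind") == "op",
    }


print(describe_bpemb_name("en.wiki.bpe.vs50000"))
print(describe_bpemb_name("en.wiki.bpe.op50000"))
```

A check like this could be used to filter out stale "op" files from a local download directory before loading anything.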

txt and bin are different file formats: txt is plain text, and bin is the gensim format.
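
As a rough sketch of what reading the plain-text variant might look like, the code below parses a tiny sample file using only the standard library. It assumes the txt files follow the common word2vec text layout (a header line with the entry count and dimension, then one token and its vector per line); that layout, the function name `read_text_embeddings`, and the sample tokens and values are assumptions made for illustration, not details confirmed in this thread.

```python
import os
import tempfile


def read_text_embeddings(path):
    """Parse a word2vec-style text file into a dict of token -> list of floats.

    Assumed layout: first line "count dim", then "token v1 ... v_dim" per line.
    """
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        count, dim = (int(field) for field in handle.readline().split())
        for line in handle:
            parts = line.rstrip().split(" ")
            if parts == [""]:
                continue  # skip blank lines
            token, values = parts[0], parts[1:]
            if len(values) != dim:
                raise ValueError(f"expected {dim} values for {token!r}")
            vectors[token] = [float(value) for value in values]
    if len(vectors) != count:
        raise ValueError(f"header declares {count} entries, found {len(vectors)}")
    return vectors


# Made-up sample data standing in for a real embedding file.
sample = "2 3\n▁the 0.1 0.2 0.3\ning -0.4 0.5 0.6\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

embeddings = read_text_embeddings(sample_path)
os.remove(sample_path)
print(sorted(embeddings))

# For the bin variant, and assuming it is in word2vec binary format, gensim's
# KeyedVectors.load_word2vec_format(path, binary=True) would be the usual loader.
```

The text format is convenient for inspection with ordinary tools, while the binary format is typically faster to load for large vocabularies.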

caozhen-alex commented 5 years ago

Many thanks, @bheinzerling!