bheinzerling / bpemb

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
https://nlp.h-its.org/bpemb
MIT License

How to use BPEmb as a pre-training model #46

Closed aimanmutasem closed 4 years ago

aimanmutasem commented 4 years ago

Dear @bheinzerling

Is there any way to use the BPEmb vectors I have loaded as a pre-trained model? For example, instead of:

```python
url = 'https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.en.vec'
SRC.build_vocab(train_data, vectors=Vectors('wiki.en.vec', url=url),
                unk_init=torch.Tensor.normal_, min_freq=2)
```

I have tried `bpemb_en.emb` and `bpemb_en.vectors`:

```python
SRC.build_vocab(train_data, vectors=bpemb_en.vectors,
                unk_init=torch.Tensor.normal_, min_freq=2)
```

I got the error below:

```
ValueError: Got input vectors of type <class 'gensim.models.keyedvectors.Word2VecKeyedVectors'>, expected str or Vectors object
```
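
For reference, a minimal sketch of one possible workaround (not from the original thread): since `build_vocab` only accepts a string or a torchtext `Vectors` object, the gensim `KeyedVectors` behind `bpemb_en.emb` can be exported to word2vec text format and then reloaded through `torchtext.vocab.Vectors`. The file name and embedding dimension below are assumptions.

```python
import torch
from bpemb import BPEmb
from torchtext.vocab import Vectors

# bpemb_en.emb is a gensim KeyedVectors object, which torchtext does not
# accept directly; export it to word2vec text format first.
bpemb_en = BPEmb(lang="en", dim=100)  # dim=100 is an assumption
bpemb_en.emb.save_word2vec_format("bpemb.en.100d.txt", binary=False)

# torchtext can then load the exported file as a Vectors object.
vectors = Vectors(name="bpemb.en.100d.txt", cache=".")

# SRC and train_data are assumed to come from the torchtext setup
# shown in the snippets above.
SRC.build_vocab(train_data, vectors=vectors,
                unk_init=torch.Tensor.normal_, min_freq=2)
```

Note that the vocabulary built this way only gets pre-trained vectors for tokens that match BPEmb's subword units, so the training data would need to be segmented with `bpemb_en.encode` for the lookup to line up.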