Kyubyong / wordvectors

Pre-trained word vectors of 30+ languages
MIT License

Error while loading the bin file #9

Closed gitlost-murali closed 6 years ago

gitlost-murali commented 7 years ago

I have downloaded the pre-trained Hindi word2vec model. I loaded the binary file using "model = gensim.models.KeyedVectors.load_word2vec_format('hi.bin', binary=True)"

But I get the following error:

    File "C:\Users***\AppData\Local\Programs\Python\Python35\lib\site-packages\gensim\utils.py", line 240, in any2unicode
        return unicode(text, encoding, errors=errors)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

I have tried this on Python 3.5 and Python 2.7 but could not get past the error.
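
A possible explanation for the error, offered as a guess rather than a confirmed diagnosis: byte 0x80 at position 0 is how a pickle stream (protocol 2 or later) begins, whereas a word2vec C-format binary starts with a plain-text header line giving the vocabulary size and vector size. If that is what is happening here, the .bin file is a saved gensim model rather than a C-format file. The sketch below only inspects the first byte of a file. The helper name looks_like_pickle and the two throwaway files are made up for illustration and are not part of gensim.

```python
import os
import pickle
import tempfile


def looks_like_pickle(path):
    """Heuristic: True if the file starts with 0x80, the pickle protocol 2+ marker."""
    with open(path, "rb") as f:
        return f.read(1) == b"\x80"


def demo():
    """Build two tiny stand-in files and run the check on each of them."""
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a model written with a pickle-based save().
        pickled = os.path.join(tmp, "pickled.bin")
        with open(pickled, "wb") as f:
            pickle.dump({"stand-in": "model"}, f, protocol=2)

        # Stand-in for a word2vec C-format file: a text header line, then raw floats.
        c_format = os.path.join(tmp, "c_format.bin")
        with open(c_format, "wb") as f:
            f.write(b"1 2\n" + b"kot " + b"\x00" * 8 + b"\n")

        return looks_like_pickle(pickled), looks_like_pickle(c_format)


print(demo())
```

If the check returns True for a downloaded file, trying load() instead of load_word2vec_format() is a reasonable next step. A first byte of 0x80 is only a hint, not proof of the file format.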

gitlost-murali commented 7 years ago

model = gensim.models.KeyedVectors.load('hi.bin')

This solved my problem, but I can't get any further: I am unable to figure out how to get the vector for a word. I tried model.wv['wordinLanguage'] and got KeyError: "word '?????????' not in vocabulary". How do I deal with this?
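
One way to avoid the KeyError is to test membership before indexing. The following is a minimal sketch, assuming the loaded object is either a full model that exposes its vectors as .wv or the vectors object itself, and that the vectors object supports "in" and "[]" the way gensim's KeyedVectors does. The helper names get_keyed_vectors and lookup are hypothetical, and plain dicts stand in for real vectors so the example runs without gensim. The '?????????' in the message may also mean the word was mangled by an encoding step before the lookup, which is a guess.

```python
from types import SimpleNamespace


def get_keyed_vectors(model):
    """Return model.wv if it exists, otherwise assume model already holds the vectors."""
    return getattr(model, "wv", model)


def lookup(model, word):
    """Return the vector for word, or None when the word is not in the vocabulary."""
    vectors = get_keyed_vectors(model)
    if word in vectors:
        return vectors[word]
    return None


# Stand-ins for illustration only: a dict plays the role of the vectors object.
vectors_only = {"kot": [0.1, 0.2]}
full_model = SimpleNamespace(wv={"kot": [0.1, 0.2]})

print(lookup(vectors_only, "kot"))
print(lookup(full_model, "pies"))
```

With a real model, checking repr(word) before the lookup shows whether the string still contains the intended characters.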

BrazilForever11 commented 7 years ago

@Murali81 How did you figure it out? It would be awesome if the author posted a tutorial on how to read the embeddings. By the way, polyglot does not seem to be working, so this project has great value!

ksopyla commented 6 years ago

If you want to load one of the language word2vec files with gensim 3.0 and Python 3.5, you can use this snippet:

    from gensim.models.keyedvectors import KeyedVectors
    embeddings_file_bin = 'data/pl.bin'
    model_bin = KeyedVectors.load(embeddings_file_bin)
    print(model_bin['kot'])

gitlost-murali commented 6 years ago

Thanks

kusumlata123 commented 6 years ago

In embeddings_file_bin = 'data/pl.bin', what is 'data/pl.bin'?

gayatrivenugopal commented 5 years ago

Tried using load() but getting an error: AttributeError: 'Word2Vec' object has no attribute 'vocabulary'

caitaozhan commented 5 years ago

> Tried using load() but getting an error: AttributeError: 'Word2Vec' object has no attribute 'vocabulary'

I had a similar issue and solved it by downgrading gensim from version 3.6 to 3.0.