3Top / word2vec-api

Simple web service providing a word embedding model
http://www.3top.com
1.43k stars, 355 forks

Error Loading Model #18

Open: johnludwigm opened this issue 7 years ago

johnludwigm commented 7 years ago

I have already read [this previous issue](https://github.com/3Top/word2vec-api/issues/6), but it hasn't solved my problem.

I downloaded the "Wikipedia+Gigaword 5" model listed at https://github.com/3Top/word2vec-api and am trying to load it. I run `model = gensim.models.Word2Vec.load_word2vec_format('glove.6B.300d.txt', binary = False)` and get the following:

    Traceback (most recent call last):
      File "<pyshell#5>", line 1, in <module>
        model = gensim.models.Word2Vec.load_word2vec_format('glove.6B.300d.txt', binary = False)
      File "C:\Python35\lib\site-packages\gensim\models\word2vec.py", line 1308, in load_word2vec_format
        raise DeprecationWarning("Deprecated. Use gensim.models.KeyedVectors.load_word2vec_format instead.")
    DeprecationWarning: Deprecated. Use gensim.models.KeyedVectors.load_word2vec_format instead.

OK, so I run `model = gensim.models.KeyedVectors.load_word2vec_format('glove.6B.300d.txt', binary = False)` and get:

    Traceback (most recent call last):
      File "<pyshell#9>", line 1, in <module>
        model = gensim.models.KeyedVectors.load_word2vec_format('deps.txt', binary = False)
      File "C:\Python35\lib\site-packages\gensim\models\keyedvectors.py", line 193, in load_word2vec_format
        vocab_size, vector_size = map(int, header.split()) # throws for invalid file format
    ValueError: invalid literal for int() with base 10: 'the'
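The last frame shows what gensim does with the first line of the file: it splits it and converts the tokens with `int()`, expecting a word2vec header of the form `<vocab_size> <vector_size>`. A GloVe file such as `glove.6B.300d.txt` has no header and starts directly with a vector line (`the ...`), so `int('the')` fails. As a minimal stdlib-only sketch, the same check can be run on a file before handing it to gensim; `read_word2vec_header` is a hypothetical helper name, not part of gensim.

```python
def read_word2vec_header(path):
    """Return (vocab_size, vector_size) if the file's first line is a
    word2vec-style header such as "400000 300", otherwise None.

    Hypothetical helper that mirrors gensim's `map(int, header.split())`.
    """
    with open(path, encoding="utf-8") as f:
        first_line = f.readline()
    tokens = first_line.split()
    if len(tokens) != 2:
        # A GloVe vector line like "the 0.1 0.2 ..." has 1 + dimensions tokens.
        return None
    try:
        return int(tokens[0]), int(tokens[1])
    except ValueError:
        # Same failure gensim reports, e.g. int('the') or int('\ufeff400000').
        return None
```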

I saw lechatpito's comment in that thread, so I opened the large text file, typed "400000 300" at the top followed by Enter, then used File -> Save As to save it as "new_deps.txt".
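The manual header edit can also be scripted. The sketch below is stdlib only, and `glove_to_word2vec` is a hypothetical helper name. It counts the vectors and dimensions instead of hard-coding `400000 300`, then writes the header plus the original lines as plain UTF-8, so no byte order mark ends up in front of the header. It streams the file twice rather than loading it into memory.

```python
def glove_to_word2vec(glove_path, out_path):
    """Copy a GloVe text file to out_path with a word2vec header
    "<vocab_size> <vector_size>" prepended. Returns the header values.

    Hypothetical helper; assumes one "word v1 v2 ... vN" entry per line.
    """
    vocab_size = 0
    vector_size = None
    # First pass: count non-blank lines and read the dimensionality.
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue
            vocab_size += 1
            if vector_size is None:
                vector_size = len(line.split()) - 1  # minus the word itself
    # Second pass: header first, then the original lines.
    # Plain "utf-8" (not "utf-8-sig") writes no byte order mark.
    with open(glove_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8", newline="\n") as dst:
        dst.write("{} {}\n".format(vocab_size, vector_size))
        for line in src:
            if line.strip():
                dst.write(line)
    return vocab_size, vector_size
```

gensim has also shipped its own tooling for this: a `glove2word2vec` function in `gensim.scripts.glove2word2vec`, and, from gensim 4.0 on, a `no_header=True` argument to `KeyedVectors.load_word2vec_format` for header-less text files. Which of these is available depends on the installed version.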

So I run `model = gensim.models.KeyedVectors.load_word2vec_format('new_deps.txt', binary = False)` and get:

    Traceback (most recent call last):
      File "<pyshell#10>", line 1, in <module>
        model = gensim.models.KeyedVectors.load_word2vec_format('new_deps.txt', binary = False)
      File "C:\Python35\lib\site-packages\gensim\models\keyedvectors.py", line 193, in load_word2vec_format
        vocab_size, vector_size = map(int, header.split()) # throws for invalid file format
    ValueError: invalid literal for int() with base 10: '\ufeff400000'

and I don't know what to do next :(
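The `'\ufeff400000'` in that error is the clue: U+FEFF is the UTF-8 byte order mark (BOM), most likely written by the editor on save (Windows Notepad historically added one when saving as UTF-8). gensim decodes the header as plain UTF-8 by default, so the BOM survives as an invisible character glued to `400000` and `int()` rejects it. A minimal sketch, assuming the file is otherwise a valid word2vec text file, copies it byte for byte while dropping a leading BOM; `strip_utf8_bom` is a hypothetical helper, not a gensim function.

```python
import codecs
import shutil


def strip_utf8_bom(path, out_path):
    """Copy path to out_path, dropping a leading UTF-8 BOM if present.
    Returns True if a BOM was found and removed."""
    with open(path, "rb") as src, open(out_path, "wb") as dst:
        head = src.read(len(codecs.BOM_UTF8))  # BOM_UTF8 is b"\xef\xbb\xbf"
        had_bom = head == codecs.BOM_UTF8
        if not had_bom:
            dst.write(head)  # not a BOM: keep these bytes
        shutil.copyfileobj(src, dst)  # stream the rest unchanged
    return had_bom
```

The output file can then be passed to `load_word2vec_format` in place of `new_deps.txt`.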

ZilvinasKucinskas commented 7 years ago

Same here with the Common Crawl model.