I have already viewed [this previous issue](https://github.com/3Top/word2vec-api/issues/6), but I haven't yet solved my problem.
I downloaded "Wikipedia+Gigaword 5" from https://github.com/3Top/word2vec-api and am trying to open the model. I run

    model = gensim.models.Word2Vec.load_word2vec_format('glove.6B.300d.txt', binary = False)

and I get the following:

    Traceback (most recent call last):
      File "<pyshell#5>", line 1, in <module>
        model = gensim.models.Word2Vec.load_word2vec_format('glove.6B.300d.txt', binary = False)
      File "C:\Python35\lib\site-packages\gensim\models\word2vec.py", line 1308, in load_word2vec_format
        raise DeprecationWarning("Deprecated. Use gensim.models.KeyedVectors.load_word2vec_format instead.")
    DeprecationWarning: Deprecated. Use gensim.models.KeyedVectors.load_word2vec_format instead.

Ok, so I run

    model = gensim.models.KeyedVectors.load_word2vec_format('glove.6B.300d.txt', binary = False)

and get

    Traceback (most recent call last):
      File "<pyshell#9>", line 1, in <module>
        model = gensim.models.KeyedVectors.load_word2vec_format('deps.txt', binary = False)
      File "C:\Python35\lib\site-packages\gensim\models\keyedvectors.py", line 193, in load_word2vec_format
        vocab_size, vector_size = map(int, header.split())  # throws for invalid file format
    ValueError: invalid literal for int() with base 10: 'the'

I saw lechatpito's comment in the aforementioned thread, so I opened the large text file, added "400000 300" as the first line and hit Enter, then used File -> Save As and saved it as "new_deps.txt".
So I run

    model = gensim.models.KeyedVectors.load_word2vec_format('new_deps.txt', binary = False)

and get

    Traceback (most recent call last):
      File "<pyshell#10>", line 1, in <module>
        model = gensim.models.KeyedVectors.load_word2vec_format('new_deps.txt', binary = False)
      File "C:\Python35\lib\site-packages\gensim\models\keyedvectors.py", line 193, in load_word2vec_format
        vocab_size, vector_size = map(int, header.split())  # throws for invalid file format
    ValueError: invalid literal for int() with base 10: '\ufeff400000'

and I don't know what to do :(.
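
For what it's worth, the `'\ufeff'` in front of `400000` in the last error is the Unicode byte order mark (BOM), which some Windows editors write at the start of a file when saving as UTF-8. If that is what happened here, one way around it might be to add the header line from Python instead of a text editor. The following is only a minimal sketch under that assumption: the function name `add_word2vec_header` and the file names are made up for illustration, and it assumes every line of the file is a word followed by space-separated numbers.

```python
def add_word2vec_header(src_path, dst_path):
    """Copy a GloVe-style text file, prepending a 'vocab_size vector_size' header.

    Hypothetical helper, not part of gensim. Assumes each non-empty line is
    'word v1 v2 ... vN' separated by single spaces.
    """
    vocab_size = 0
    vector_size = None

    # First pass: count the rows and infer the dimension from the first row.
    # The 'utf-8-sig' codec silently drops a leading BOM if one is present.
    with open(src_path, encoding="utf-8-sig") as src:
        for line in src:
            if not line.strip():
                continue  # ignore blank lines
            if vector_size is None:
                vector_size = len(line.rstrip("\n").split(" ")) - 1
            vocab_size += 1

    if vocab_size == 0:
        raise ValueError("no vectors found in {}".format(src_path))

    # Second pass: write the header, then the vectors, as plain UTF-8
    # (the 'utf-8' codec does not emit a BOM).
    with open(src_path, encoding="utf-8-sig") as src, \
            open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        dst.write("{} {}\n".format(vocab_size, vector_size))
        for line in src:
            if line.strip():
                dst.write(line)

    return vocab_size, vector_size
```

Calling something like `add_word2vec_header('glove.6B.300d.txt', 'new_deps.txt')` should produce a file whose first line is the two counts with no BOM in front, which could then be passed to `gensim.models.KeyedVectors.load_word2vec_format('new_deps.txt', binary = False)` as above. Reading with `utf-8-sig` also means the same sketch can be pointed at a file that an editor has already saved with a BOM.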