Hi,
You should have all the files from the archive in one directory, not only model.model and model.model.vectors.npy.
If this is done, gensim.models.KeyedVectors.load() works just fine with this model:
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import gensim
In [2]: model = gensim.models.KeyedVectors.load("model.model")
In [3]: word = "кракозябра"
In [4]: model.most_similar(word)
Out[4]:
[('краков', 0.6272931694984436),
('припорашивать', 0.5428630709648132),
('крак', 0.5345099568367004),
('краковский', 0.529658317565918),
('распуститься', 0.528093159198761),
('припорошать', 0.515566885471344),
('вроцлав', 0.5138404965400696),
('капустный', 0.5137609839439392),
('павлиний', 0.512362539768219),
('ягель', 0.5122756958007812)]
In [5]: model.most_similar("волк")
Out[5]:
[('медведь', 0.7839906215667725),
('зверь', 0.7489554286003113),
('лисица', 0.7402448654174805),
('волчица', 0.7251183390617371),
('заяц', 0.7193619012832642),
('лис', 0.7154371738433838),
('волчонок', 0.7136003971099854),
('олень', 0.7099077105522156),
('шакал', 0.7061660885810852),
('лось', 0.7053733468055725)]
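The setup behind the session above can be sketched roughly as follows: unpack every file from the archive into one directory, then point gensim.models.KeyedVectors.load() at model.model inside it. This is only a minimal sketch. The helper names extract_model_archive and load_vectors are hypothetical, as is any directory name you pass in, and it assumes the archive has already been downloaded.

```python
import zipfile
from pathlib import Path


def extract_model_archive(archive_path, target_dir):
    """Unpack every member of the archive into target_dir.

    Returns the sorted list of member names so the caller can check
    that the companion files landed next to model.model.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


def load_vectors(target_dir, model_name="model.model"):
    """Load the model from target_dir with KeyedVectors.load().

    gensim is imported here, not at module level, so the extraction
    helper can be used on its own. The load relies on the companion
    .npy files sitting in the same directory as model_name.
    """
    from gensim.models import KeyedVectors

    return KeyedVectors.load(str(Path(target_dir) / model_name))
```

For example, with the archive saved as 213.zip (the file name taken from the download URL in the question), calling extract_model_archive("213.zip", "geowac_model") and then load_vectors("geowac_model") should give an object on which most_similar() can be called as in the session above.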
I unzipped the archive and realized that geowac_lemmas_none_fasttextskipgram_300_5_2020 isn't a binary model: the archive contains only 'model.model' and no 'model.bin', unlike ruscorpora_upos_skipgram_600_10_2017, for instance. So I tried this:
import zipfile
model_url = 'http://vectors.nlpl.eu/repository/20/213.zip'
m = wget.download(model_url)
model_file = model_url.split('/')[-1]
with zipfile.ZipFile(model_file, 'r') as archive:
    stream = archive.open('model.model')
    model = FastText.load_fasttext_format(datapath(stream))
TypeError Traceback (most recent call last)