Closed robinp closed 2 years ago
It seems the unpickled object doesn't have the vectors field, which is why adjust_vectors is called, which then tries to touch the obviously missing vectors_vocab (the code at https://github.com/avidale/compress-fasttext/blob/master/compress_fasttext/compress.py#L27 didn't set it).
Why could that field be missing when unpickling? It is there on the model before it is saved.
Hello! Could you please provide a complete code snippet with loading the full model, compressing it, saving the small model and loading it? If I could reproduce the problem, it would be much easier to solve it.
Hm, https://github.com/RaRe-Technologies/gensim/blob/4.0.0/gensim/models/fasttext.py#L1072 seems to ignore "vectors" on saving. But then how could this work? Or maybe no one has tried to load it back yet.
Re example, yeah, missed it, sorry:
from gensim.models import fasttext
from gensim.test.utils import datapath
import compress_fasttext
""" original to gensim - can skip
print("Loading")
big_model = fasttext.load_facebook_model(datapath("/root/py/train/eng.bin"))
print("Saving back")
big_model.wv.save("/root/py/train/orig.gensim")
"""
print("Load gensim vecs")
loaded = fasttext.FastTextKeyedVectors.load("/root/py/train/orig.gensim")
print("Compressing")
small_model = compress_fasttext.prune_ft_freq(loaded)
print("Saving")
small_model.save('/root/py/train/eng-small2')
print("Load back saved")
sm = fasttext.FastTextKeyedVectors.load('/root/py/train/eng-small2')
Thanks, I think I got it!
The old Gensim models had two equivalent attributes, vectors and vectors_vocab (vectors are calculated from vectors_vocab and vectors_ngrams). This is obviously redundant, so I kept only vectors in the model. In the update of Gensim, its developers resolved the redundancy in an alternative way: they decided to save only vectors_vocab, and recompute vectors each time the model is loaded.
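For readers unfamiliar with how vectors relates to the other two arrays, here is a minimal NumPy sketch of the recomputation. It assumes the full vector of a word is the average of its vectors_vocab row and the vectors_ngrams rows of its n-gram buckets, which is my reading of adjust_vectors in Gensim 4. The function name, the toy arrays and ngram_ids_per_word are made up for illustration and are not Gensim's API.

```python
import numpy as np


def recompute_vectors(vectors_vocab, vectors_ngrams, ngram_ids_per_word):
    """Rebuild full-word vectors from per-word and per-n-gram vectors.

    Assumption: each full vector is the mean of the word's own row in
    vectors_vocab and the rows of vectors_ngrams for its n-gram buckets.
    """
    vectors = vectors_vocab.copy()
    for i, ngram_ids in enumerate(ngram_ids_per_word):
        for j in ngram_ids:
            vectors[i] += vectors_ngrams[j]
        # The divisor counts the n-gram rows plus the word's own row.
        vectors[i] /= len(ngram_ids) + 1
    return vectors


# Toy data: 2 words, 3 n-gram buckets, 2 dimensions (all values hypothetical).
vectors_vocab = np.array([[1.0, 0.0], [0.0, 2.0]])
vectors_ngrams = np.array([[1.0, 1.0], [3.0, 1.0], [0.0, 4.0]])
ngram_ids_per_word = [[0, 1], [2]]

vectors = recompute_vectors(vectors_vocab, vectors_ngrams, ngram_ids_per_word)
print(vectors)
```

This loop runs over the whole vocabulary, which is the CPU cost of recomputing on every load.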
I don't want to store both vectors and vectors_vocab, as in the old Gensim (because it takes disk space). But I also don't want to recompute vectors each time the model loads (because it takes CPU and makes the model load slower).
I will think carefully about how to resolve this. Maybe I will just override _save_specials. Suggestions are welcome.
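To make the failure mode concrete, here is a toy sketch in plain Python. It is not Gensim's or compress-fasttext's code: ToyVectors, save_state and load_state are hypothetical names, and the "derivation" is a stand-in. The saver always drops the derived field and the loader always rebuilds it from the source field, so an object that kept only the derived field cannot be loaded back.

```python
import json


class ToyVectors:
    """Toy stand-in for a keyed-vectors object (hypothetical, not Gensim)."""

    def __init__(self, vectors_vocab):
        self.vectors_vocab = vectors_vocab
        self.vectors = self._derive()

    def _derive(self):
        # Stand-in for the real recomputation of vectors from vectors_vocab.
        return [2 * x for x in self.vectors_vocab]

    def save_state(self):
        # The saver skips the derived field, assuming it can be rebuilt.
        state = dict(self.__dict__)
        state.pop("vectors", None)
        return json.dumps(state)

    @classmethod
    def load_state(cls, payload):
        obj = cls.__new__(cls)
        obj.__dict__.update(json.loads(payload))
        # The loader rebuilds the derived field from the source field.
        obj.vectors = obj._derive()
        return obj


# A full model round-trips fine.
full = ToyVectors([1.0, 2.0])
restored = ToyVectors.load_state(full.save_state())

# A "compressed" model that kept only the derived field loses everything.
compressed = ToyVectors([1.0, 2.0])
del compressed.vectors_vocab
try:
    ToyVectors.load_state(compressed.save_state())
    load_error = None
except AttributeError as exc:
    load_error = str(exc)

print(restored.vectors)
print(load_error)
```

In terms of this toy, one way out is to override the save step so that vectors is not dropped when vectors_vocab is absent, and to skip the recomputation on load.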
@robinp, I have updated the package so that the models are saved and loaded correctly.
Please update it to compress-fasttext>=0.1.2 and check that the problem is gone. You need to replace the line
sm = fasttext.FastTextKeyedVectors.load('/root/py/train/eng-small2')
with
sm = compress_fasttext.CompressedFastTextKeyedVectors.load('/root/py/train/eng-small2')
because compressed models use optimizations that are not present in FastTextKeyedVectors (or in gensim in general).
Works like a charm, thank you!
Hello! I tried to compress a fasttext model, and then load back the saved gensim model. On trying to load, got this exception:
Note: saw this warning while compressing:
but then rerunning and checking a case where the warning is not printed, the issue still stands.
Pipfile:
But also with gensim==4.0.0
Thank you!