Closed mglowacki100 closed 2 years ago
@mglowacki100 could you please share a link to the original model so that I could reproduce this problem?
In general, compress_fasttext 0.0.7 is expected to work with gensim 3.7.2.
@avidale Thanks for the fast reply. Here are the steps to reproduce the issue in Google Colab: https://colab.research.google.com/gist/mglowacki100/1b018bab65199fdd6060204802d60de7/compress_ft_gensim.ipynb The script is based on https://github.com/RaRe-Technologies/gensim/releases/3.6.0 and compress_fasttext.
The error that you got is due to the difference between the FastText and FastTextKeyedVectors classes in Gensim. The former includes the latter along with some additional information used only for training the model. The compress_fasttext package works only with the latter.
After running
ft_model = FastText(corpus_file=corpus_fname, workers=-1)
ft_model.save('ft.model')
big_model = gensim.models.fasttext.FastTextKeyedVectors.load('ft.model')
you create a FastText object instead of a FastTextKeyedVectors object (which I find very confusing). Instead, you should access its .wv property:
print(type(big_model)) # gensim.models.fasttext.FastText
print(type(big_model.wv)) # gensim.models.keyedvectors.FastTextKeyedVectors
Thus, in order to make compress_fasttext work, please just replace
small_model = compress_fasttext.prune_ft_freq(big_model, pq=True)
with
small_model = compress_fasttext.prune_ft_freq(big_model.wv, pq=True)
and it should be OK.
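If the same code has to accept either kind of object, the check can be wrapped in a small helper. The following is only a minimal sketch: as_keyed_vectors is a hypothetical name, not part of gensim or compress_fasttext, and the two Fake classes are stand-ins so the snippet runs without gensim installed. It assumes that the presence of vectors_ngrams (the attribute named in the error) is a good enough marker of a keyed-vectors object.

```python
def as_keyed_vectors(model):
    """Return the object that carries the n-gram vectors.

    Hypothetical helper, not part of gensim or compress_fasttext.
    """
    # The AttributeError in this thread came from a missing `vectors_ngrams`,
    # so that attribute is used here as the marker of a keyed-vectors object.
    if hasattr(model, "vectors_ngrams"):
        return model
    # A full training model keeps its keyed vectors under `.wv`.
    wv = getattr(model, "wv", None)
    if wv is not None and hasattr(wv, "vectors_ngrams"):
        return wv
    raise TypeError(f"{type(model).__name__} has no FastText keyed vectors")


# Stand-in classes (assumptions) that mimic only the attribute layout.
class FakeKeyedVectors:
    vectors_ngrams = [[0.0, 0.0]]


class FakeFastText:
    def __init__(self):
        self.wv = FakeKeyedVectors()


full_model = FakeFastText()
print(type(as_keyed_vectors(full_model)).__name__)
print(type(as_keyed_vectors(full_model.wv)).__name__)
```

With the real libraries, the call from this thread would then read compress_fasttext.prune_ft_freq(as_keyed_vectors(big_model), pq=True), whether big_model is the full model or already its .wv.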
Thank you @avidale, it solved my issue.
I tried to compress a gensim 3.7.2 fasttext model with compress_fasttext 0.0.7.
Calling prune_ft_freq gives this error:
AttributeError: 'FastText' object has no attribute 'vectors_ngrams'
Alternatively, calling the prune_ft function gives:
AttributeError: 'FastText' object has no attribute 'vocab'
Is gensim 3.7.2 too old, or am I missing something? Maybe there was a version of compress_fasttext that supported it?