avidale / compress-fasttext

Tools for shrinking fastText models (in gensim format)
MIT License

Error while compressing #6

Closed · sridhardev07 closed this issue 2 years ago

sridhardev07 commented 2 years ago

I am trying to compress the fastText wiki model: https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip

First I tried loading it with load_facebook_model() and got the error: NotImplementedError: Supervised fastText models are not supported

And when I tried the second approach, loading it with gensim, I got: return _pickle.load(f, encoding='latin1') _pickle.UnpicklingError: invalid load key, '9'.

avidale commented 2 years ago

The documentation says:

This Python 3 package allows to compress fastText word embedding models (from the gensim package)

Therefore, the Facebook format is not supported. Only the gensim format is supported.
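Both errors point at a format mismatch: the Facebook `.bin` format begins with a binary magic number, a `.vec` file is plain text whose first line is `<vocab_size> <dim>` (pickle reads the leading digit as a load key, hence `invalid load key, '9'`), and a saved gensim model is a pickle. A minimal sketch of telling them apart, assuming the standard fastText magic constant; the helper itself is hypothetical and not part of compress-fasttext:

```python
import struct

# FASTTEXT_FILEFORMAT_MAGIC_INT32 at the start of Facebook .bin files
FASTTEXT_MAGIC = 793712314

def sniff_format(path):
    """Hypothetical helper: guess a model file's format from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if len(head) == 4 and struct.unpack("<i", head)[0] == FASTTEXT_MAGIC:
        # Facebook .bin: load with gensim.models.fasttext.load_facebook_model
        return "facebook-bin"
    if head[:1].isdigit():
        # .vec text header "<vocab_size> <dim>": plain word vectors with no
        # subword information, so there is nothing for compress-fasttext to reuse
        return "vec-text"
    # otherwise it may be a pickled gensim model (FastTextKeyedVectors.load)
    return "unknown"
```

This makes the two tracebacks above unsurprising: the `.vec` file in the zip is `vec-text`, which neither loader accepts.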

avidale commented 2 years ago

@sridhardev07 I have looked at the vectors you linked, and they are JUST WORD VECTORS. The whole idea of fastText compression is that we reuse subword vectors more efficiently, but in the archive you linked all subword vectors have been discarded.

I suggest that you use https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz instead: this model has valid subword vectors and therefore can be compressed.

For an example, please see this notebook. Or just use a tiny model for English that I have compressed: https://github.com/avidale/compress-fasttext/releases/download/v0.0.4/cc.en.300.compressed.bin.