DavidRamosSal opened this issue 2 weeks ago
Hi @DavidRamosSal!
No, gensim doesn't support sparse models, and sparsity is the main compressive force in compress-fasttext. Thus, compress-fasttext models aren't convertible back to pure gensim format.

If you want a pure gensim model that is also small, the recommended approach is to train a small model from scratch, using either gensim or the original fastText package.
In terms of the original fastText options (https://fasttext.cc/docs/en/options.html), those that affect model size most are:

- `bucket`: the number of trainable vectors for character n-grams. The default value is 2 million, but something as small as a few thousand is already workable.
- `minCount`: the minimal frequency for a word to be included in the vocabulary. The default value is 1, which means every single word in your dataset is included in the vocabulary; increase this value to a large integer to keep only the most frequent words instead. A good threshold depends on your training dataset.
- `dim`: dimensionality of the embeddings. The default value (100) is generally fine; you can experiment with reducing it further, but this may severely degrade downstream quality.
Hi, is there a way to save a compressed model in regular gensim format? I can't install compress-fasttext where my application will run, so being able to run `model.most_similar("word")` with gensim alone would be great. Thanks in advance!