facebookresearch / fastText

Library for fast text representation and classification.
https://fasttext.cc/
MIT License

how to keep subwords only in final model? #817

Open ZizhenWang opened 5 years ago

ZizhenWang commented 5 years ago

Hi there

I find that the output model contains not only the n-gram subwords but also the original whole words, which makes the vocabulary much larger. I want to keep only the n-gram subwords. How can I do that?

Thanks

Celebio commented 5 years ago

Hi @ZizhenWang, what is your use case? Do you want a smaller model file, or do you need this feature for something else?

Best regards, Onur

ZizhenWang commented 5 years ago

@Celebio Thanks for your reply. I want a smaller model file. As I understand it, the subwords are necessary in fastText, but the whole-word entries are not, because a word's vector can be inferred from its subwords. Am I right? If so, I would like to keep only the subwords in the model file and remove all whole words.
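
To make the idea concrete, here is a minimal sketch in plain Python. It is not fastText's own code. It assumes that a word is wrapped in `<` and `>` before its character n-grams are taken, and that a subword-only vector would be the plain average of the n-gram vectors. The helper names (`char_ngrams`, `subword_only_vector`, `rows_bytes`) and the toy lookup table are hypothetical. The last helper gives a rough size estimate, assuming each embedding row stores `dim` 32-bit floats.

```python
from typing import Dict, List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Return the character n-grams of `word` wrapped in boundary markers."""
    token = f"<{word}>"
    grams = []
    for start in range(len(token)):
        for n in range(minn, maxn + 1):
            end = start + n
            if end > len(token):
                break  # longer n-grams would run past the end of the token
            grams.append(token[start:end])
    return grams


def subword_only_vector(
    word: str,
    ngram_vectors: Dict[str, List[float]],  # hypothetical n-gram -> vector table
    dim: int,
    minn: int = 3,
    maxn: int = 6,
) -> List[float]:
    """Average the vectors of the word's known n-grams, ignoring any whole-word row."""
    known = [ngram_vectors[g] for g in char_ngrams(word, minn, maxn) if g in ngram_vectors]
    if not known:
        return [0.0] * dim  # no known n-gram: fall back to a zero vector
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def rows_bytes(rows: int, dim: int, bytes_per_value: int = 4) -> int:
    """Rough size of `rows` embedding rows, assuming 32-bit floats by default."""
    return rows * dim * bytes_per_value
```

With these assumptions, `char_ngrams("where", 3, 3)` returns `["<wh", "whe", "her", "ere", "re>"]`. For a sense of scale, with an illustrative vocabulary of 100,000 words and `dim` set to 100, `rows_bytes(100_000, 100)` comes to 40,000,000 bytes for the whole-word rows, while 2,000,000 n-gram buckets at the same dimension come to 800,000,000 bytes. Under those illustrative numbers, dropping the whole-word rows alone would shrink the file only modestly.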