facebookresearch / fastText

Library for fast text representation and classification.
https://fasttext.cc/
MIT License

The number of words in a one-million-token corpus is only 15173? #1355

Open MengfeiShen opened 7 months ago

MengfeiShen commented 7 months ago

When I use the fasttext.train_unsupervised function to learn word vectors, the output reports that the number of words is 15173. However, my training text contains more than one million tokens. Why is the reported count so much lower?
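One way to investigate the gap is to compare the total token count with the number of distinct words. The sketch below assumes that the reported "number of words" is the vocabulary size (distinct words that occur at least minCount times, where minCount defaults to 5 for unsupervised training) rather than the number of tokens read. The helper name corpus_stats and the sample sentences are hypothetical, and plain whitespace splitting only approximates fastText's own tokenization, which also adds an end-of-line token.

```python
from collections import Counter


def corpus_stats(lines, min_count=5):
    """Return (total tokens, distinct words, distinct words with count >= min_count).

    Hypothetical helper for illustration; it uses whitespace tokenization,
    which only approximates what fastText does internally.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    total_tokens = sum(counts.values())
    distinct_words = len(counts)
    # Words rarer than min_count would be dropped from the vocabulary.
    kept_words = sum(1 for c in counts.values() if c >= min_count)
    return total_tokens, distinct_words, kept_words


# Tiny made-up corpus, used only to show the three numbers diverging.
sample = ["the cat sat on the mat", "the dog sat on the log", "the the the"]
print(corpus_stats(sample, min_count=2))  # (15, 7, 3)
```

Running the same function over the lines of the real training file shows whether roughly one million tokens collapse to about 15173 distinct words once rare words are filtered out. If that is the case, passing minCount=1 to fasttext.train_unsupervised and checking len(model.words) should report a larger vocabulary.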