bmschmidt / wordVectors

An R package for creating and exploring word2vec and other word embedding models

Is there a way to get the token frequencies? #53

Open archenemies opened 5 years ago

archenemies commented 5 years ago

Great tool!

I couldn't figure out how the words end up sorted by frequency when the frequencies themselves are not stored in the .bin file or in the VectorSpaceModel. My guess is that the training code tracks the frequencies but leaves them out of the trained vector file. For now I may approximate each word's frequency as 1/rank (Zipf's law), but it would be good to have this documented somewhere. Thanks!
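For anyone wanting to try the 1/rank idea, here is a minimal sketch of the arithmetic in Python. It assumes the vocabulary has already been exported as a list in the model's row order, and that this order really is descending frequency; neither is confirmed by the package documentation. The function name `zipf_frequencies` and its `exponent` parameter are hypothetical, made up for illustration, and are not part of wordVectors.

```python
def zipf_frequencies(vocab, exponent=1.0):
    """Approximate relative frequencies from rank order alone.

    vocab: words in assumed descending-frequency order (rank 1 first).
    exponent: Zipf exponent; 1.0 gives the plain 1/rank approximation.
    Returns a dict mapping each word to an estimated relative frequency,
    normalized so the estimates sum to 1 over the given vocabulary.
    """
    # Unnormalized Zipf weight for each rank: 1 / rank^exponent.
    weights = [1.0 / (rank ** exponent) for rank in range(1, len(vocab) + 1)]
    total = sum(weights)
    return {word: weight / total for word, weight in zip(vocab, weights)}


# Toy vocabulary standing in for the model's row names (illustrative only).
vocab = ["the", "of", "and", "to"]
approx = zipf_frequencies(vocab)
```

These are only rank-based estimates: they recover the ordering and a plausible shape, not the actual counts seen during training, so real counts would still have to come from the training corpus or a saved vocabulary file.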