Closed by ValeriiBaidin 4 years ago
There isn't currently any functionality for keeping only the top N words. However, you can remove all words that appear fewer than N times in the corpus by doing,
abridge_corp!(corp, N)
fixcorp!(corp, trim=true)
or more succinctly,
fixcorp!(corp, abridge=N, trim=true)
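If you still want something close to "keep the top N words", one possible workaround is to compute the count of the N-th most frequent word yourself and use that as the abridge threshold. The sketch below is only an illustration in plain Julia: `topn_threshold` is a hypothetical helper, not part of the library, and it assumes the corpus has already been flattened into vectors of integer word IDs (how you get those out of `corp` is not shown here).

```julia
# Hypothetical helper, not part of the library.
# Returns the occurrence count of the n-th most frequent word, i.e. the
# smallest threshold that still keeps the top `n` words.
# Assumes `docs` contains at least one word and n >= 1.
function topn_threshold(docs::Vector{Vector{Int}}, n::Int)
    counts = Dict{Int,Int}()
    # Tally how many times each word ID appears across the whole corpus.
    for doc in docs, word in doc
        counts[word] = get(counts, word, 0) + 1
    end
    # Sort the counts from most to least frequent.
    sorted = sort(collect(values(counts)), rev=true)
    # If the vocabulary has fewer than n words, fall back to the rarest word's count.
    return sorted[min(n, length(sorted))]
end

# Toy corpus: each document is a vector of word IDs.
docs = [[1, 2, 2, 3], [2, 3, 3, 4], [2, 5]]
threshold = topn_threshold(docs, 2)
```

The resulting value could then be used as the `abridge` argument in the call above. Note that words tied with the N-th word are all kept, so the trimmed vocabulary may end up slightly larger than N.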
Does it have functionality to reduce the vocabulary size? For instance, keep only the top 10000 words, or remove words that occur in fewer than K documents.
Thank you in advance.