Closed Abe410 closed 1 year ago
Hi Abe. There's a "max_words" parameter, which is 200 by default, so the cloud uses at most 200 words. If you set it higher, it might not actually show all of them if the font gets too small, which depends on the font settings.
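To make the cutoff concrete, here is a minimal pure-Python sketch of what a "max_words" limit amounts to: sort by count and keep the top N. The helper name top_words and the whitespace tokenization are made up for illustration; this is not wordcloud's actual code, and the real library can additionally drop words that no longer fit at the smallest font size.

```python
from collections import Counter


def top_words(text, max_words=200):
    """Hypothetical helper: count whitespace-separated words and keep
    only the max_words most frequent ones."""
    counts = Counter(text.lower().split())
    return dict(counts.most_common(max_words))


text = "the cat sat on the mat the cat slept"

# With a limit of 2, only the two most frequent words survive.
print(top_words(text, max_words=2))

# With the default limit of 200, all six distinct words are kept.
print(len(top_words(text)))
```

The same idea applies whether the counts come from raw text or from a frequency dict you built yourself: anything ranked below the limit is simply not drawn.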
Thank you.
One more question. If we use generate to make a wordcloud, and then use generate_from_frequencies to create it using a count vectorizer with bigrams, is it the same thing?
Not entirely. wordcloud uses collocation statistics to decide which bigrams to keep, so it doesn't just use the most frequent ones. The regex and normalization in wordcloud are also slightly different from those in CountVectorizer, but the main difference is the collocation statistics. Basically, generate_from_frequencies bypasses all tokenization logic: it assumes you have done that yourself and just plots whatever tokens you give it.
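As a sketch of the "do it yourself" route, the snippet below builds raw bigram counts in plain Python, roughly what a bigram count vectorizer would give you. The resulting dict is the kind of mapping generate_from_frequencies plots as-is. The tokenizer regex and the helper name bigram_counts are assumptions for illustration; no collocation scoring is applied, which is where the result can differ from generate.

```python
import re
from collections import Counter


def bigram_counts(text):
    """Hypothetical helper: lowercase the text, split it into simple
    word tokens, and count every adjacent pair of tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(" ".join(pair) for pair in zip(tokens, tokens[1:]))


text = "New York is big. New York is busy. I like New York."
freqs = bigram_counts(text)

# Raw counts only: every adjacent pair is kept, including filler pairs.
print(freqs["new york"], freqs["york is"])

# To plot these counts unchanged (requires the wordcloud package):
# from wordcloud import WordCloud
# WordCloud().generate_from_frequencies(freqs)
```

Note that a filler bigram such as "york is" ranks high on raw counts here. Whether generate would keep a pair like that depends on its own stopword and collocation handling, so the two clouds are not guaranteed to match.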
Thank you!
Just curious, if we are generating words from text, then how many top words does the cloud use?
And if we generate the cloud using generate_from_frequencies, then how many top frequencies does it use?