amueller / word_cloud

A little word cloud generator in Python
https://amueller.github.io/word_cloud
MIT License

Incorporation of word embeddings? #621

Open jdmoore7 opened 3 years ago

jdmoore7 commented 3 years ago

I see word embeddings as potentially low-hanging fruit for a more robust product. Word embeddings such as GloVe (A) are additive, (B) can quantify the similarity between words or phrases, and (C) can be combined with a distance metric to measure how representative a given word or phrase is of all the words in the text. This might help place or cluster words by (latent) similarity, and scale each word's size by its overall representativeness in the "bag of words", so you aren't relying on raw occurrence counts alone.
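To make the sizing half of this concrete, here is a minimal sketch of one way to score representativeness. It is only an illustration under stated assumptions: the three-dimensional vectors are made-up stand-ins for real pretrained embeddings such as GloVe, and `representativeness` is a hypothetical helper, not something word_cloud provides. It uses the additive property to build a count-weighted centroid, then scores each word by its cosine similarity to that centroid.

```python
import math

# Toy 3-d vectors standing in for real pretrained embeddings (made-up values).
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def representativeness(counts, embeddings):
    """Hypothetical helper: score each word by similarity to the text's centroid.

    The centroid is the count-weighted mean of the word vectors, which relies
    on embeddings being additive. Words without a vector are skipped.
    """
    words = [w for w in counts if w in embeddings]
    if not words:
        return {}
    dim = len(embeddings[words[0]])
    total = sum(counts[w] for w in words)
    centroid = [
        sum(counts[w] * embeddings[w][i] for w in words) / total
        for i in range(dim)
    ]
    # Clamp negatives to zero so the scores can be used as non-negative weights.
    return {w: max(cosine(embeddings[w], centroid), 0.0) for w in words}


counts = {"cat": 3, "dog": 2, "kitten": 1, "car": 1}
weights = representativeness(counts, EMBEDDINGS)
```

In this toy data, "car" scores far lower than the animal words even though it occurs as often as "kitten". A dictionary of weights like this could then be handed to `WordCloud.generate_from_frequencies`, which takes a word-to-weight mapping, so sizing by representativeness would not need a new placement algorithm. Clustering words by similarity is a separate problem.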

Any efforts in the works to develop this sort of functionality?

amueller commented 3 years ago

Hey! I think that would require a completely different placement algorithm. Some JS libraries use dynamics to evolve the position. I could see that working with some initialization based on word embeddings. Then you lose the ability to do masks and shapes, though (or at least it's very limited). WordCloud is basically based on filling the available space, and I don't see how that would work with an embedding approach.

I'm not totally opposed to adding a different placement algorithm, but it would require rewriting a lot of the library.
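For illustration only, the dynamics-based placement mentioned above might look roughly like the sketch below. It is not how WordCloud or any particular JS library works. It assumes each word already has a 2-D starting position (for example from projecting its embedding down to two dimensions, which is not shown) and a bounding box, and it simply pushes overlapping boxes apart until none intersect. The `Box` class and `relax` function are hypothetical names, and the sketch ignores masks and shapes entirely, which is the limitation noted above.

```python
from dataclasses import dataclass


@dataclass
class Box:
    word: str
    x: float  # centre x
    y: float  # centre y
    w: float  # width
    h: float  # height


def overlap(a, b):
    """Return overlap extents (dx, dy); the boxes intersect if both are > 0."""
    dx = (a.w + b.w) / 2 - abs(a.x - b.x)
    dy = (a.h + b.h) / 2 - abs(a.y - b.y)
    return dx, dy


def relax(boxes, max_iters=500, eps=1e-6):
    """Push overlapping boxes apart in place.

    Returns True if a pass finished with no overlaps, or False if max_iters
    was reached first (this simple scheme is not guaranteed to converge).
    """
    for _ in range(max_iters):
        moved = False
        for i, a in enumerate(boxes):
            for b in boxes[i + 1:]:
                dx, dy = overlap(a, b)
                if dx <= 0 or dy <= 0:
                    continue
                moved = True
                # Separate along the axis that needs the smaller shift.
                if dx < dy:
                    sign = 1.0 if a.x >= b.x else -1.0
                    a.x += sign * (dx / 2 + eps)
                    b.x -= sign * (dx / 2 + eps)
                else:
                    sign = 1.0 if a.y >= b.y else -1.0
                    a.y += sign * (dy / 2 + eps)
                    b.y -= sign * (dy / 2 + eps)
        if not moved:
            return True
    return False


# Made-up starting positions: "cat" and "dog" start close together and overlap.
boxes = [
    Box("cat", 0.0, 0.0, 4.0, 2.0),
    Box("dog", 1.0, 0.5, 4.0, 2.0),
    Box("car", 10.0, 10.0, 4.0, 2.0),
]
converged = relax(boxes)
```

Similar words stay near each other because they start near each other, but nothing here steers words into a target shape, whereas the existing approach of filling available space works naturally with a mask.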