Open bellecarrell opened 5 years ago
@abenton questions:
Mallet is a good choice, and gensim also has an implementation that may work well. Non-negative matrix factorization is another option, since we just want to extract coherent sets of words: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html
Fit LDA to all tweets. Infer a topic distribution for each user. Compare the average number of topics (or the entropy of the topic distribution) for high- and low-follower-count users.
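The comparison step could look roughly like the sketch below. It assumes the per-tweet topic distributions already come out of whatever model we fit, and that a user's distribution is the mean of their tweets' distributions (concatenating each user's tweets into one document before inference would be an alternative). The function names, the toy numbers, and the follower threshold are all made up for illustration.

```python
import math
from collections import defaultdict
from statistics import mean

def entropy(dist):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def user_topic_distributions(tweet_topics):
    """Average per-tweet topic distributions into one distribution per user.

    tweet_topics: iterable of (user_id, topic_distribution) pairs.
    """
    by_user = defaultdict(list)
    for user, dist in tweet_topics:
        by_user[user].append(dist)
    # zip(*dists) walks the distributions topic by topic.
    return {user: [mean(col) for col in zip(*dists)]
            for user, dists in by_user.items()}

def mean_entropy_by_group(user_dists, followers, threshold):
    """Mean topic entropy for users at/above vs. below a follower threshold.

    Both groups must be non-empty, otherwise statistics.mean raises.
    """
    high = [entropy(d) for u, d in user_dists.items() if followers[u] >= threshold]
    low = [entropy(d) for u, d in user_dists.items() if followers[u] < threshold]
    return mean(high), mean(low)

# Hypothetical toy data: two users, two topics.
tweet_topics = [
    ("a", [0.9, 0.1]), ("a", [0.8, 0.2]),
    ("b", [0.5, 0.5]), ("b", [0.4, 0.6]),
]
followers = {"a": 10000, "b": 50}

user_dists = user_topic_distributions(tweet_topics)
high_entropy, low_entropy = mean_entropy_by_group(user_dists, followers, threshold=1000)
print(high_entropy, low_entropy)
```

Lower entropy means a user concentrates on fewer topics. A "number of topics" variant could count the topics whose probability exceeds some cutoff instead of computing entropy.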