topic distribution at user and follower count interval level

bellecarrell / twitter_brand

In developing a brand on Twitter (and social media in general), how does what you say and how you say it correspond to positive results (more followers, for example)?


topic distribution at user and follower count interval level #81

Open bellecarrell opened 5 years ago

bellecarrell commented 5 years ago

Fit LDA to all tweets, then infer a topic distribution for each user. Compare the average number of topics (or the entropy of the topic distribution) between users with high and low follower counts.
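As a rough sketch of the comparison step, assuming per-user topic distributions have already been inferred by some topic model: the user names, follower counts, distributions, the `FOLLOWER_CUTOFF` boundary, and the 0.1 probability threshold for counting a topic as "used" are all hypothetical placeholders, not values from this project.

```python
import math
from statistics import mean


def topic_entropy(dist):
    """Shannon entropy (in nats) of one user's topic distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def effective_topic_count(dist, threshold=0.1):
    """Number of topics with probability >= threshold (threshold is an assumption)."""
    return sum(1 for p in dist if p >= threshold)


# Hypothetical toy data: user -> (follower count, inferred topic distribution).
users = {
    "user_a": (120, [0.70, 0.20, 0.05, 0.05]),
    "user_b": (300, [0.25, 0.25, 0.25, 0.25]),
    "user_c": (50_000, [0.90, 0.05, 0.03, 0.02]),
    "user_d": (80_000, [0.50, 0.40, 0.05, 0.05]),
}

# Assumed boundary between "low" and "high" follower count users.
FOLLOWER_CUTOFF = 10_000


def compare_groups(users, cutoff):
    """Average entropy and topic count for low and high follower groups."""
    groups = {"low": [], "high": []}
    for followers, dist in users.values():
        key = "high" if followers >= cutoff else "low"
        groups[key].append(dist)
    return {
        name: {
            "mean_entropy": mean(topic_entropy(d) for d in dists),
            "mean_topic_count": mean(effective_topic_count(d) for d in dists),
        }
        for name, dists in groups.items()
        if dists
    }


summary = compare_groups(users, FOLLOWER_CUTOFF)
for name, stats in summary.items():
    print(name, stats)
```

Entropy avoids having to pick a threshold for what counts as a topic, which may make it the more stable of the two measures. The toy numbers above only exercise the code and say nothing about the real data.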

bellecarrell commented 5 years ago

@abenton questions:

The question of which library is best or easiest to use will come up more than once in the upcoming analysis and model implementation steps, but I know LDA in particular can be finicky. I think you linked me to a deep LDA implementation of your own a while back. Do you have a recommendation for an off-the-shelf LDA to start with, along with any guidance? Or would this be a better task for you, given your experience?

abenton commented 5 years ago

Mallet is good, and gensim also has an implementation that may work well. Non-negative matrix factorization is also an option, since we just want to extract coherent sets of words: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html