bellecarrell / twitter_brand

In developing a brand on Twitter (and social media in general), how does what you say and how you say it correspond to positive results (more followers, for example)?

Extract topic and sentiment features per tweet #107

Closed by abenton 5 years ago

abenton commented 5 years ago

Sentiment features can be found here on the COE grid:

/exp/abenton/twitter_brand_workspace_20190417/sentiment/promoting_user_tweets.with_lexiconbased_sentiment.noduplicates.tsv.gz

Columns: "tweet_sentiment_score" and "tweet_sentiment_class"
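A minimal sketch for pulling those two columns out of the file, using only the standard library. It assumes the file is UTF-8, tab-separated, unquoted, has a header row, and that the score parses as a float; none of that is confirmed in this thread, and `read_sentiment_rows` is a hypothetical helper name.

```python
import csv
import gzip


def read_sentiment_rows(path):
    """Yield (score, label) pairs from a gzipped TSV that has a header row."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: assume raw tweet text may contain stray quote characters.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Assumption: the score column holds a number, the class column a label.
            yield float(row["tweet_sentiment_score"]), row["tweet_sentiment_class"]
```

Because the function is a generator, the file is streamed row by row rather than loaded into memory at once.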

Sentiment was inferred using a lexicon that also accounts for words appearing in negated spans (a span begins at a negation word and runs until the next punctuation mark). For a description of the lexicons used (S140 AffLex and S140 NegLex), see:

@article{kiritchenko2014sentiment,
  title={Sentiment analysis of short informal texts},
  author={Kiritchenko, Svetlana and Zhu, Xiaodan and Mohammad, Saif M},
  journal={Journal of Artificial Intelligence Research},
  volume={50},
  pages={723--762},
  year={2014}
}
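To illustrate the negated-span idea, here is a rough sketch of lexicon-based scoring with two lexicons: one for words in affirmative context and one for words in negated context. The lexicon entries, the negation word list, the tokenizer, and the sign-based class rule are all made up for illustration; this is not the code in the repository, and the real S140 lexicons are far larger.

```python
import re

# Toy lexicons with invented scores (hypothetical stand-ins for S140 AffLex / NegLex).
AFF_LEX = {"good": 1.2, "great": 1.8, "bad": -1.5, "love": 2.0}
NEG_LEX = {"good": -0.9, "great": -0.6, "bad": 0.3, "love": -1.1}

NEGATION_WORDS = {"not", "no", "never", "cannot"}  # assumed list, not exhaustive
PUNCTUATION = {".", ",", ":", ";", "!", "?"}

# Words (including contractions such as "don't") or single punctuation marks.
TOKEN_RE = re.compile(r"[a-z']+|[.,:;!?]")


def score_tweet(text):
    """Sum lexicon scores, switching lexicons inside negated spans."""
    score = 0.0
    negated = False
    for token in TOKEN_RE.findall(text.lower()):
        if token in PUNCTUATION:
            negated = False  # punctuation ends the negated span
        elif token in NEGATION_WORDS or token.endswith("n't"):
            negated = True  # a negation word opens a negated span
        else:
            lexicon = NEG_LEX if negated else AFF_LEX
            score += lexicon.get(token, 0.0)  # unknown words contribute nothing
    return score


def classify(score):
    """Map a score to a class by its sign (an assumed rule)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, in "not good, but great" the word "good" is scored from the negated-context lexicon, while "great" follows the comma and is scored from the affirmative one.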

abenton commented 5 years ago

Topic weights per tweet can be found here:

/exp/abenton/twitter_brand_workspace_20190417/topic_modeling/promoting_user_tweets.with_topic_dist_inferred_by_nmf-k50_userlevel.noduplicates.tsv.gz

under column "topics_per_tweet".

Trained NMF model: /exp/abenton/twitter_brand_workspace_20190417/topic_modeling/nmf-k50-alpha0.0.model.pickle

Representative words per topic: /exp/abenton/twitter_brand_workspace_20190417/topic_modeling/nmf-k50-alpha0.0.topics.txt

Used a rank-50 non-negative matrix factorization (NMF) to infer "topics". The NMF was fit on a document-term matrix in which each document is the set of all tweets made by one user. The fitted model was then applied to each tweet individually to infer a weighting over topics for that tweet.
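The fit-on-users, apply-to-tweets procedure can be sketched as follows. This assumes scikit-learn, TF-IDF term weighting, and a toy rank of 2 on made-up tweets; the thread does not say which library, weighting, or preprocessing was actually used, so treat every parameter here as a placeholder.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy data (invented): each user maps to the tweets they wrote.
tweets_by_user = {
    "user_a": ["new recipe on the blog", "baking bread this weekend", "bread recipe tips"],
    "user_b": ["marathon training run today", "new running shoes", "training plan for race"],
    "user_c": ["weekend run then baking", "blog post about race day"],
}

# One document per user: all of that user's tweets joined together.
user_docs = [" ".join(tweets) for tweets in tweets_by_user.values()]

# Build the user-level document-term matrix (TF-IDF weighting is an assumption).
vectorizer = TfidfVectorizer()
user_term = vectorizer.fit_transform(user_docs)

# Rank 2 for this toy corpus; the model described above used rank 50.
model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
model.fit(user_term)

# Apply the user-level model to each tweet, reusing the same vocabulary.
all_tweets = [tweet for tweets in tweets_by_user.values() for tweet in tweets]
tweet_term = vectorizer.transform(all_tweets)
tweet_topics = model.transform(tweet_term)  # shape: (n_tweets, n_topics)

# Representative words per topic: the highest-weighted terms in each component.
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
top_words = [[vocab[i] for i in row.argsort()[::-1][:3]] for row in model.components_]
```

Each row of `tweet_topics` is a non-negative weight vector over topics for one tweet. A tweet is much shorter than a user-level document, so its weights tend to be sparse and noisy.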

abenton commented 5 years ago

Code to extract features is here:

https://github.com/bellecarrell/twitter_brand/tree/abenton10/topic-and-sentiment-extraction/analysis/topic_and_sentiment

I trained the user-level NMF model and applied it to the tweets by hand (outside of the script).