bellecarrell / twitter_brand

In developing a brand on Twitter (and social media in general), how does what you say and how you say it correspond to positive results (more followers, for example)?

Add IV types to feature table #113

Open bellecarrell opened 5 years ago

bellecarrell commented 5 years ago

@bellecarrell also save the # of tweets made overall. When computing entropy, make sure to smooth the distribution -- add-δ smoothing, where δ is something smallish (e.g. 0.1), is fine. In case we need to go back and recompute entropy, I would also write out the distributions you compute entropy over, so we can try different smoothing schemes.

I am worried about cases where the blogger may have just posted a single tweet, in which case entropy will be 0 if unsmoothed.
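A minimal sketch of the add-δ smoothed entropy described above (the four-bin count vector is illustrative only; the real features may bin tweets differently):

```python
import numpy as np

def smoothed_entropy(counts, delta=0.1):
    """Entropy (in nats) of a count distribution with add-delta smoothing.

    `counts` holds per-category counts (e.g. tweets per time-of-day bin);
    `delta` is added to every cell before normalizing, so a user with a
    single tweet no longer yields a degenerate, zero-entropy distribution.
    """
    counts = np.asarray(counts, dtype=float) + delta
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log(probs)))

# One tweet in one bin: unsmoothed entropy would be 0; smoothing keeps it positive.
print(smoothed_entropy([1, 0, 0, 0], delta=0.1))
```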

bellecarrell commented 5 years ago

Current status: checked items have code but still need testing; unchecked items have no code yet.

bellecarrell commented 5 years ago

The last 3 sections of features -- time-of-day posting, topic, and sentiment -- aren't in the current timeline table (sentiment, topic, and time-of-day all live in separate tables that still need to be joined in). Details for topic and sentiment:

Sentiment features can be found here on the COE grid:

/exp/abenton/twitter_brand_workspace_20190417/sentiment/promoting_user_tweets.with_lexiconbased_sentiment.noduplicates.tsv.gz

Columns: "tweet_sentiment_score" and "tweet_sentiment_class"
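For reference, a sketch of loading these columns with pandas; the join key (`tweet_id`) is a guess and should be checked against the actual table schema:

```python
import pandas as pd

SENT_PATH = ("/exp/abenton/twitter_brand_workspace_20190417/sentiment/"
             "promoting_user_tweets.with_lexiconbased_sentiment.noduplicates.tsv.gz")

# pandas infers gzip compression from the .gz suffix.
sent = pd.read_csv(SENT_PATH, sep="\t")
print(sent[["tweet_sentiment_score", "tweet_sentiment_class"]].head())

# Joining into the timeline table -- "tweet_id" is an assumed key column:
# timeline = timeline.merge(
#     sent[["tweet_id", "tweet_sentiment_score", "tweet_sentiment_class"]],
#     on="tweet_id", how="left")
```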

Sentiment was inferred using a lexicon that also accounts for words appearing in negated spans (i.e., preceded by a negation word with no intervening punctuation). See the following reference for a description (S140 AffLex and S140 NegLex); a toy sketch of the negation-marking convention follows the reference.

@Article{kiritchenko2014sentiment,
  title   = {Sentiment analysis of short informal texts},
  author  = {Kiritchenko, Svetlana and Zhu, Xiaodan and Mohammad, Saif M.},
  journal = {Journal of Artificial Intelligence Research},
  volume  = {50},
  pages   = {723--762},
  year    = {2014}
}
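A toy illustration of that negated-span convention -- not the actual S140 NegLex code; the negator list and the `_NEG` tagging scheme here are assumptions:

```python
import re

NEGATORS = {"not", "no", "never", "n't", "cannot"}  # illustrative list, not the S140 one

def mark_negated_spans(tokens):
    """Append a _NEG tag to tokens that follow a negation word,
    up to (not including) the next punctuation token."""
    out, negating = [], False
    for tok in tokens:
        if re.fullmatch(r"[.,!?;:]+", tok):
            negating = False
            out.append(tok)
        elif tok.lower() in NEGATORS:
            negating = True
            out.append(tok)
        else:
            out.append(tok + "_NEG" if negating else tok)
    return out

print(mark_negated_spans("i do not like this movie .".split()))
# ['i', 'do', 'not', 'like_NEG', 'this_NEG', 'movie_NEG', '.']
```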

Topic weights per tweet can be found here:

/exp/abenton/twitter_brand_workspace_20190417/topic_modeling/promoting_user_tweets.with_topic_dist_inferred_by_nmf-k50_userlevel.noduplicates.tsv.gz

under column "topics_per_tweet".
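A sketch of reading the per-tweet topic weights. How the 50-dim vector is serialized inside `topics_per_tweet` is an assumption (space-separated floats here), so inspect a row before relying on this:

```python
import numpy as np
import pandas as pd

TOPIC_PATH = ("/exp/abenton/twitter_brand_workspace_20190417/topic_modeling/"
              "promoting_user_tweets.with_topic_dist_inferred_by_nmf-k50_userlevel"
              ".noduplicates.tsv.gz")

topics = pd.read_csv(TOPIC_PATH, sep="\t")

# Assumed serialization: space-separated floats. Verify against an actual row.
first = str(topics["topics_per_tweet"].iloc[0])
weights = np.array(first.split(), dtype=float)
print(weights.shape)  # expect (50,) for the k=50 model
```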

Trained NMF model: /exp/abenton/twitter_brand_workspace_20190417/topic_modeling/nmf-k50-alpha0.0.model.pickle

Representative words per topic: /exp/abenton/twitter_brand_workspace_20190417/topic_modeling/nmf-k50-alpha0.0.topics.txt

Used a rank-50 non-negative matrix factorization to infer "topics". The NMF was fit on a document-term matrix where I treat all of a user's tweets as a single document; I then applied this model to each individual tweet to infer its weighting over topics. A toy sketch of this fit/transform flow is below.
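A sketch of the fit-at-user-level / transform-per-tweet flow with scikit-learn. The vectorizer settings, toy documents, and tiny rank are placeholders; the real model is the pickled rank-50 NMF linked above:

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-ins: in the real pipeline each "document" is the concatenation
# of all tweets by one user (vectorizer settings here are assumptions).
user_docs = [
    "coffee shop opening new menu latte",
    "gym workout protein fitness routine",
    "coffee beans roast espresso brew",
    "fitness gym training cardio protein",
]
tweets = ["new latte on the menu today", "leg day at the gym"]

vec = TfidfVectorizer()
X_users = vec.fit_transform(user_docs)

# The real model is rank 50 (nmf-k50); a tiny rank is used so the toy data fits.
nmf = NMF(n_components=2, random_state=0)
nmf.fit(X_users)

# Applying the user-level model to individual tweets yields per-tweet topic
# weights, matching the "topics_per_tweet" column.
tweet_topics = nmf.transform(vec.transform(tweets))
print(tweet_topics)  # shape: (n_tweets, 2) here; (n_tweets, 50) in the real setup
```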