bellecarrell / twitter_brand

In developing a brand on Twitter (and social media in general), how does what you say and how you say it correspond to positive results (more followers, for example)?

Notes about distribution of features in strategy evaluation table #121

Open abenton opened 5 years ago

abenton commented 5 years ago

Issues:

- past-PCT_MESSAGES_LAST_FRIDAY : make percent, not proportion (done)
- past-PCT_MSGS_9TO12_LOCAL : this is not extracted yet, ignore
- past-PCT_MSGS_WITH_PERSONAL_URL : this is not extracted yet, ignore
- past-PCT_DAYS_WITH_SOME_MSG : values >100%
- past-MSG_PER_DAY_ENTROPY_ADD1 : sqrt/log to transform to a more normal distribution
- past-PCT_MSGS_WITH_POSITIVE_SENTIMENT : skewed to 1 / median/mean sentiment is nicely normal
- past-TOPIC_DIST_ENTROPY_ADD1 : skewed right (sqrt/log transform)
- past-TOPIC_DIST_ENTROPY_ADD01 : less heavily right skewed (still consider transforming)
- current-follower_count : heavily left skewed
- current-log_follower_count : looks very normal
- current-friend_count : heavily left skewed
- current-log_friend_count : looks very normal
- current-list_count : heavily left skewed
- current-log_list_count : looks very normal
- current-user_impact_score : very normal
- future-horizon**-pct_change_follower_count : remove examples with too high or low % changes. For horizon 1, drop examples where % change in follower count is < -14.275517487508926 or > 21.21083570692685
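A rough sketch of how the sqrt/log suggestions and the >100% check above could be tried out. It assumes the skewed features are non-negative with a long tail of large values; the sample counts, the function names, and the use of `log1p` (log of 1 + x) are illustrative choices, not taken from the repo's extraction code.

```python
import math
import statistics


def skewness(values):
    """Population skewness: mean of cubed z-scores (0 for symmetric data)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return 0.0
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


def log_transform(values):
    """log(1 + x), so that zero counts stay defined. Assumes x >= 0."""
    return [math.log1p(v) for v in values]


def sqrt_transform(values):
    """Square root, a milder transform than the log. Assumes x >= 0."""
    return [math.sqrt(v) for v in values]


def percent_out_of_range(values):
    """Return percent-valued entries that fall outside [0, 100]."""
    return [v for v in values if v < 0.0 or v > 100.0]


# Hypothetical counts with a long tail of large values (not real data).
follower_counts = [3, 10, 12, 25, 40, 80, 150, 900, 12000, 250000]

raw_skew = skewness(follower_counts)
sqrt_skew = skewness(sqrt_transform(follower_counts))
log_skew = skewness(log_transform(follower_counts))
print(f"raw={raw_skew:.2f} sqrt={sqrt_skew:.2f} log={log_skew:.2f}")

# Hypothetical percent feature values; anything listed here needs a second look.
print(percent_out_of_range([0.0, 55.5, 100.0, 104.2]))
```

Comparing the skewness before and after each transform gives a quick numeric check to go alongside the distribution plots when deciding which transform to apply to a feature.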

abenton commented 5 years ago

Thresholds for dropping outliers w.r.t. % follower count change, for each horizon (the bottom/top 0.05% of values are treated as outliers):

| horizon | bottom_threshold | top_threshold |
|---|---|---|
| 1 | -14.275517487508926 | 21.21083570692685 |
| 2 | -23.98109451577979 | 36.72719853091206 |
| 3 | -45.36826998750058 | 64.88671417934259 |
| 4 | -47.036165996925966 | 90.939974390467 |
| 5 | -51.993431304137964 | 97.03028521023232 |
| 6 | -63.86093891156038 | 99.88901220865705 |
| 7 | -75.66034165197367 | 145.3884597910041 |
| 14 | -78.29387096452871 | 1993.2268988872765 |
| 21 | -78.44646047220982 | 2422.455780569085 |
| 28 | -78.44788277162283 | 2475.985100960596 |
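For reference, a minimal sketch of how thresholds like these could be derived and applied. It assumes the cutoffs are the 0.05% and 99.95% quantiles of the % change values for a given horizon, computed with linear interpolation; the actual script may differ. The function names and the `(horizon, pct_change)` example records are hypothetical, and only three of the reported horizons are copied in to keep it short.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of pre-sorted values, with 0 <= q <= 1."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def outlier_thresholds(pct_changes, tail=0.0005):
    """Return (bottom, top) cutoffs that trim `tail` of the data on each side."""
    ordered = sorted(pct_changes)
    return quantile(ordered, tail), quantile(ordered, 1 - tail)


# A few of the thresholds reported above: horizon -> (bottom, top), in percent.
THRESHOLDS = {
    1: (-14.275517487508926, 21.21083570692685),
    7: (-75.66034165197367, 145.3884597910041),
    28: (-78.44788277162283, 2475.985100960596),
}


def keep_example(horizon, pct_change, thresholds=THRESHOLDS):
    """Drop an example only if it is strictly below bottom or above top."""
    bottom, top = thresholds[horizon]
    return bottom <= pct_change <= top


# Hypothetical (horizon, % change in follower count) records.
examples = [(1, 2.5), (1, -30.0), (1, 21.3), (7, 120.0), (28, 3000.0)]
kept = [(h, c) for h, c in examples if keep_example(h, c)]
print(kept)

# Deriving cutoffs from synthetic data: 2001 evenly spaced values.
bottom, top = outlier_thresholds(list(range(-1000, 1001)))
print(bottom, top)
```

Note how quickly the top threshold grows at horizons 14 and beyond: because % change is unbounded above but cannot go below -100, a symmetric quantile trim still leaves a very wide upper range at long horizons.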

abenton commented 5 years ago

Plots of feature distributions can be found here:

/exp/abenton/twitter_brand_workspace_20190417/extracted_features_20190508/*.pdf

Summary statistics can be found here:

/exp/abenton/twitter_brand_workspace_20190417/extracted_features_20190508/feature_stats.tsv