bellecarrell / twitter_brand

In developing a brand on Twitter (and social media in general), how does what you say and how you say it correspond to positive results (more followers, for example)?

Build table for fitting regression models #106

Open abenton opened 5 years ago

abenton commented 5 years ago

We need to build tables of independent and dependent variables, where each column is either a feature the regression model is trained on or a dependent variable we are trying to predict. Each row corresponds to a single user measured on a specific day, with features computed over the last XX days (the aggregation window). Since we want to vary the number of days we aggregate features over, and possibly the frequency at which we sample users, we'll need to generate several of these tables. To start, the aggregation window can be {1, 7, 14, 21, 28} days and the sampling frequency can be daily.
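As a rough sketch of what "several of these tables" could look like, the snippet below enumerates one table spec per aggregation window, each carrying one dependent-variable column per horizon. The function name `table_specs`, the dict keys, and the `follower_change_{h}d` column naming are all hypothetical, and the horizon list assumes the "..." in the dependent-variable description means every day from 1 through 7.

```python
# Aggregation windows (days) from the issue text.
WINDOWS = [1, 7, 14, 21, 28]
# Horizons (days); 4, 5, 6 are assumed to be covered by the "..." in the issue.
HORIZONS = [1, 2, 3, 4, 5, 6, 7, 14, 21, 28]


def table_specs(windows=WINDOWS, horizons=HORIZONS, sampling="daily"):
    """Return one hypothetical table spec per aggregation window."""
    return [
        {
            "window_days": w,
            "sampling": sampling,
            # One dependent-variable column per horizon (hypothetical names).
            "target_columns": [f"follower_change_{h}d" for h in horizons],
        }
        for w in windows
    ]


specs = table_specs()
```

Keeping every horizon as a column in the same table means only the aggregation window (and, later, the sampling frequency) multiplies the number of tables.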

==Independent variables==

Controls and hypotheses are listed in issue #102.

==Dependent variables==

Follower count change over the next {1, 2, 3, ..., 7, 14, 21, 28} days (the horizon). We may have to adjust this by predicting % change in follower count rather than raw follower count change.

It should be easy to train models once the data is formatted this way.
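A minimal sketch of the dependent variable, covering both the raw change and the % change variant. The `counts` layout (a dict from date to follower count for one user), the function name `follower_change`, and the follower numbers are illustrative assumptions, not data from the project.

```python
from datetime import date, timedelta


def follower_change(counts, day, horizon, relative=False):
    """Change in follower count between `day` and `day + horizon` days.

    `counts` maps date -> follower count for one user (hypothetical layout).
    Returns None when either measurement is missing, or when a percent
    change is requested and the starting count is zero.
    """
    future_day = day + timedelta(days=horizon)
    if day not in counts or future_day not in counts:
        return None
    delta = counts[future_day] - counts[day]
    if not relative:
        return delta
    if counts[day] == 0:
        return None  # percent change is undefined for a zero baseline
    return 100.0 * delta / counts[day]


# Made-up counts for illustration only.
counts = {date(2019, 4, 15): 200, date(2019, 4, 17): 210}
raw = follower_change(counts, date(2019, 4, 15), horizon=2)
pct = follower_change(counts, date(2019, 4, 15), horizon=2, relative=True)
```

Here `raw` is 10 followers and `pct` is 5.0 percent. The % change form puts accounts of very different sizes on a comparable scale, which is presumably the motivation for the adjustment.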


We should be able to compute these tables from the big table you currently generate (where each row is a tweet with its relevant features). We just need to make sure this big table contains all the information needed to compute the features the models will be trained on.
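One way the tweet-level table could be rolled up into per-user rows for a given sample day is sketched below. The field names (`user_id`, `day`, `num_mentions`) and the two aggregated features are hypothetical stand-ins for whatever issue #102 specifies, and the window is assumed to cover the `window` days strictly before the sample day.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate_features(tweets, sample_day, window):
    """Sum tweet-level features per user over the aggregation window.

    Assumption: the window is [sample_day - window, sample_day), so tweets
    posted on the sample day itself are excluded.
    """
    start = sample_day - timedelta(days=window)
    totals = defaultdict(lambda: defaultdict(float))
    for tweet in tweets:
        if start <= tweet["day"] < sample_day:
            row = totals[tweet["user_id"]]
            row["num_tweets"] += 1
            row["num_mentions"] += tweet["num_mentions"]
    # Convert nested defaultdicts to plain dicts: one row per user.
    return {user: dict(feats) for user, feats in totals.items()}


# Made-up tweets for illustration only.
tweets = [
    {"user_id": "u1", "day": date(2019, 4, 9), "num_mentions": 2},
    {"user_id": "u1", "day": date(2019, 4, 14), "num_mentions": 1},
    {"user_id": "u1", "day": date(2019, 4, 15), "num_mentions": 5},  # on sample day
    {"user_id": "u2", "day": date(2019, 4, 1), "num_mentions": 3},   # before window
]
table = aggregate_features(tweets, date(2019, 4, 15), window=7)
```

Users with no tweets in the window do not appear in the result; whether they should instead get an all-zero row is a modeling decision this sketch leaves open.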

bellecarrell commented 5 years ago

/exp/acarrell/twitter_brand/promoting_users/timeline

abenton commented 5 years ago

Tweets filtered and joined with self-promoting users are written here:

/exp/abenton/twitter_brand_workspace_20190417/promoting_user_tweets.merged_with_user_info.noduplicates.tsv.gz

abenton commented 5 years ago

Important things to keep in mind when fitting models:

abenton commented 5 years ago

Example:

We are computing features with a 7-day long aggregation window for the date of April 15, with a horizon of 2 days:

We will build models to predict the dependent variable (change in future follower count) from past behavior (the independent variables).
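The dates involved in the April 15 example can be worked out as follows. This is only a sketch: the year 2019 is assumed from the workspace path earlier in the thread, the function name `example_dates` is hypothetical, and the feature window is assumed to end the day before the sample day.

```python
from datetime import date, timedelta


def example_dates(sample_day, window, horizon):
    """Return (window_start, window_end, target_day) for one table row.

    Assumption: features are aggregated over the `window` days before
    `sample_day`, and the dependent variable compares follower counts on
    `sample_day` and `sample_day + horizon`.
    """
    window_start = sample_day - timedelta(days=window)
    window_end = sample_day - timedelta(days=1)
    target_day = sample_day + timedelta(days=horizon)
    return window_start, window_end, target_day


start, end, target = example_dates(date(2019, 4, 15), window=7, horizon=2)
```

Under these assumptions, features come from April 8 through April 14, and the follower count change is measured between April 15 and April 17.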