bellecarrell / twitter_brand

In developing a brand on Twitter (and social media in general), how does what you say and how you say it correspond to positive results (more followers, for example)?

time series #74

Closed bellecarrell closed 5 years ago

bellecarrell commented 5 years ago

Collect the following parallel time series for each user: (1) the tweets ego made, ordered by time, and (2) a daily success metric (unnormalized follower count delta, % change in follower count, Lampos success metric) at each horizon in $\{1, 2, 7, 30\}$ days. Hopefully this is generic enough that we can bin tweets in different ways.
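A minimal sketch of the two follower-based metrics, assuming one follower count per user per day and assuming "horizon" means looking forward (the value at day t + h compared against day t). The function name `horizon_metrics` and the output layout are hypothetical, and the Lampos success metric is left out because its definition is not given in this thread.

```python
import numpy as np

HORIZONS = (1, 2, 7, 30)  # horizons in days, as listed in the issue


def horizon_metrics(followers, horizons=HORIZONS):
    """Return follower delta and % change per horizon.

    followers: 1-D sequence of daily follower counts for one user.
    Days without a full horizon of future data are left as NaN.
    """
    followers = np.asarray(followers, dtype=float)
    n = followers.shape[0]
    out = {}
    for h in horizons:
        delta = np.full(n, np.nan)
        pct = np.full(n, np.nan)
        if h < n:
            # Assumption: forward-looking change over h days.
            delta[:-h] = followers[h:] - followers[:-h]
            # A zero follower count yields inf/NaN rather than raising.
            with np.errstate(divide="ignore", invalid="ignore"):
                pct[:-h] = 100.0 * delta[:-h] / followers[:-h]
        out[h] = {"delta": delta, "pct_change": pct}
    return out


metrics = horizon_metrics([100, 110, 121, 121])
```

Keeping every series the same length as the input (padded with NaN) means the metric arrays stay aligned with the daily index, which should make it easier to bin tweets in different ways later.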

bellecarrell commented 5 years ago

@abenton questions:

Did you have an organizational structure in mind that would make analyses easiest? I was thinking we'd have either:
- a top-level dir per user ID (e.g. user_122323/) containing all tweets by that user, or
- a top-level dir per date (the way the cron job collects them) containing tweets by all users for that time point, with a separate file for each user

abenton commented 5 years ago

Storing and manipulating this data is a hard problem. I would write all tweets to a single table with user ID and date alongside the tweet text. You would keep a separate table of user information to join against when necessary. If the tweet table is too big to work with easily, I would split it into separate tables by user ID, all sitting in one directory.
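One way to sketch this layout, assuming SQLite as the storage backend (the thread does not name one; flat TSV files would fit the description equally well). The table names, column names, and sample rows here are hypothetical.

```python
import sqlite3

# In-memory database for illustration; a real run would use a file path.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    screen_name TEXT
);
CREATE TABLE tweets (
    tweet_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    created_at TEXT NOT NULL,   -- ISO 8601, so text order is time order
    text       TEXT NOT NULL
);
CREATE INDEX idx_tweets_user_time ON tweets (user_id, created_at);
""")

conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(122323, "example_user")])
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?)", [
    (1, 122323, "2018-10-02T09:00:00", "second tweet"),
    (2, 122323, "2018-10-01T12:30:00", "first tweet"),
])

# One user's tweets in time order, joined with the user table.
rows = conn.execute("""
    SELECT u.screen_name, t.created_at, t.text
    FROM tweets AS t
    JOIN users AS u ON u.user_id = t.user_id
    WHERE t.user_id = ?
    ORDER BY t.created_at
""", (122323,)).fetchall()
```

The index on `(user_id, created_at)` is what makes the per-user, time-ordered read cheap, which is the access pattern the time series extraction needs.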

Once we have extracted features, the time series can be written to a single compressed numpy file with `numpy.savez_compressed` (https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.savez_compressed.html). This will be the easiest to work with.
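A minimal sketch of that round trip. The array names, shapes, and file name are placeholders chosen for illustration, not the project's actual feature layout.

```python
import os
import tempfile

import numpy as np

# Hypothetical layout: 2 users, 30 days, 4 features per day.
user_ids = np.array([122323, 445566])
features = np.zeros((2, 30, 4), dtype=np.float32)
follower_delta = np.zeros((2, 30), dtype=np.float32)

path = os.path.join(tempfile.mkdtemp(), "time_series.npz")

# Each keyword argument becomes a named array inside the .npz archive.
np.savez_compressed(path,
                    user_ids=user_ids,
                    features=features,
                    follower_delta=follower_delta)

# Arrays are decompressed lazily, one at a time, when indexed by name.
with np.load(path) as data:
    loaded_ids = data["user_ids"]
    loaded_features = data["features"]
```

Storing `user_ids` alongside the feature arrays keeps the row order explicit, so the file can be read back without consulting the tweet table.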

bellecarrell commented 5 years ago

Worked on test data. Running a job on the grid over all data under /exp/abenton/twitter_brand_data/