SforAiDl / twitter-sanity

A Python tool to recommend relevant and important tweets from your Twitter feed.
MIT License

Module to convert raw scraped data into a standardised format #5

Open adiah80 opened 4 years ago

adiah80 commented 4 years ago

The raw data scraped in Issue #4 needs to be processed before it can be used to train the models. We need a module that aggregates the raw data into a single dataset (a .csv file) containing the training features and labels.

Each tweet posted by an account the user follows should be treated as one data point. Every tweet the user interacted with (liked, retweeted, or commented on) should be labelled as a positive instance.
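
A rough sketch of that labelling rule, in case it helps. It assumes the scraper gives us each tweet as a dict with an `id` field and the user's interactions as sets of tweet IDs; the field name and the `label_tweets` helper are hypothetical, not existing code. Treating every non-interacted tweet as a negative (label 0) is also an assumption.

```python
def label_tweets(tweets, liked_ids, retweeted_ids, commented_ids):
    """Return a copy of each tweet dict with a binary `label` field added.

    All argument and field names here are hypothetical placeholders.
    """
    # A tweet counts as positive if the user interacted with it in any way.
    interacted = set(liked_ids) | set(retweeted_ids) | set(commented_ids)

    labelled = []
    for tweet in tweets:
        row = dict(tweet)  # copy so the raw scraped data is left untouched
        row["label"] = 1 if tweet["id"] in interacted else 0
        labelled.append(row)
    return labelled


# Toy timeline: the user liked tweet "1" and commented on tweet "3".
timeline = [{"id": "1"}, {"id": "2"}, {"id": "3"}]
labelled = label_tweets(
    timeline,
    liked_ids={"1"},
    retweeted_ids=set(),
    commented_ids={"3"},
)
```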

Features should include the tweet text, the tweet's author, the global interaction metrics (counts of likes, retweets, and comments), and the time the tweet was posted.
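
One possible shape for the aggregated .csv, sketched with the standard-library `csv` module. The column names in `FIELDNAMES` and the `write_dataset` helper are assumptions for illustration only; the actual names will depend on what the scraping module outputs.

```python
import csv
import io

# Hypothetical column names covering the features listed above plus the label.
FIELDNAMES = [
    "text",
    "author",
    "like_count",
    "retweet_count",
    "comment_count",
    "created_at",
    "label",
]


def write_dataset(rows, out_file):
    """Write labelled tweet dicts to `out_file` as CSV with a header row."""
    # extrasaction="ignore" drops any raw fields that are not training columns.
    writer = csv.DictWriter(out_file, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Made-up example row, written to an in-memory buffer instead of a real file.
rows = [
    {
        "id": "1",  # not a training column, so it is ignored
        "text": "Hello, world",
        "author": "example_user",
        "like_count": 10,
        "retweet_count": 2,
        "comment_count": 1,
        "created_at": "2020-01-01T12:00:00",
        "label": 1,
    }
]
buffer = io.StringIO()
write_dataset(rows, buffer)
```

To write to disk instead, open the target file with `newline=""` (as the `csv` docs recommend) and pass the file object in place of `buffer`.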

More complex features can be designed and added later.

Akshat2430 commented 4 years ago

Dibs!

ajaysub110 commented 4 years ago

Can you please take up #9 first so that we have the scraping module ready?