The raw data scraped in Issue #4 needs to be processed before it can be used to train the models. We need a module that aggregates the raw data into a single dataset (a .csv file) containing the training features and labels.
Each tweet posted by an account the user follows should be treated as one data point. Every tweet the user interacted with (liked, retweeted, or commented on) should be labeled as a positive instance.
Features should include the tweet text, the tweet's author, the global interaction metrics (counts of likes, retweets, and comments), and the time the tweet was posted.
More complex features can also be devised and included.
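
As a starting point for discussion, here is a minimal sketch of what the aggregation step could look like. It is not tied to the actual output of Issue #4: the raw record keys (`id`, `text`, `author`, `like_count`, `retweet_count`, `comment_count`, `created_at`), the column names, and the function names `build_rows` and `write_dataset` are all hypothetical. It also assumes that tweets the user did not interact with are labeled 0, which the description above does not state explicitly.

```python
import csv
import io

# Hypothetical column names; the real schema depends on the Issue #4 output.
FIELDNAMES = [
    "tweet_id", "text", "author",
    "like_count", "retweet_count", "comment_count",
    "created_at", "label",
]


def build_rows(tweets, interacted_ids):
    """Turn raw tweet records into feature rows with a binary label.

    tweets: iterable of dicts, one per tweet from a followed account
            (assumed record shape, see FIELDNAMES).
    interacted_ids: set of tweet ids the user liked, retweeted,
                    or commented on.
    """
    rows = []
    for tweet in tweets:
        rows.append({
            "tweet_id": tweet["id"],
            "text": tweet["text"],
            "author": tweet["author"],
            "like_count": tweet["like_count"],
            "retweet_count": tweet["retweet_count"],
            "comment_count": tweet["comment_count"],
            "created_at": tweet["created_at"],
            # Assumption: 1 for any interaction, 0 otherwise.
            "label": 1 if tweet["id"] in interacted_ids else 0,
        })
    return rows


def write_dataset(rows, out_file):
    """Write the rows as CSV, with a header, to an open text file object."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    raw = [
        {"id": "1", "text": "hello world", "author": "alice",
         "like_count": 10, "retweet_count": 2, "comment_count": 1,
         "created_at": "2021-01-01T12:00:00"},
        {"id": "2", "text": "another tweet", "author": "bob",
         "like_count": 0, "retweet_count": 0, "comment_count": 0,
         "created_at": "2021-01-02T08:30:00"},
    ]
    buffer = io.StringIO()
    write_dataset(build_rows(raw, interacted_ids={"1"}), buffer)
    print(buffer.getvalue())
```

Keeping row construction separate from CSV writing means more complex features can be added inside `build_rows` without touching the output code. To write to disk, pass a file opened with `open(path, "w", newline="")` instead of the in-memory buffer.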