DisasterMasters / TweetAnalysis

Repository for storing the code used to analyse the tweets collected from the Twitter scraper

Classification of users #4

Open audrism opened 5 years ago

audrism commented 5 years ago

Integrating doc2vec (d2v) to classify users; learning how to add a random forest (RF) on top.

jball1997 commented 5 years ago

Good results with doc2vec. Haven't integrated it with RF yet.

jball1997 commented 5 years ago

Our doc2vec results are actually pretty confusing. The model is accurate at labeling a user correctly, but we also get a lot of false positives. It performs terribly when we feed it custom text. We think this is caused by the sparsity of our data. We will look into this.
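One way to make the "good accuracy but many false positives" observation concrete is to look at precision and recall per class rather than a single accuracy number. The following is a minimal sketch in plain Python; the function name `per_class_metrics` and the example labels are hypothetical and are not taken from this repository.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # false positive for the class that was predicted
            fn[truth] += 1  # false negative for the class that was missed
    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Made-up labels for illustration only (not data from this project).
y_true = ["news", "news", "not_news", "not_news", "not_news", "not_news"]
y_pred = ["news", "news", "news", "news", "not_news", "not_news"]
metrics = per_class_metrics(y_true, y_pred)
for label, (precision, recall, f1) in metrics.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case every real "news" user is found (recall 1.0), yet half of the "news" predictions are wrong (precision 0.5), which is the pattern described above.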

mickidymick commented 5 years ago

Added more non-relevant users to help even out the data. Classifying the users in Users_Labeled using only the user bio gives a total F-score of 93.8%. Added the last 5 tweets for each user. Classifying the users in Users_Labeled using only the tweets gives a total F-score of 96.5%. Classifying the users in Users_Labeled using both the tweets and bios gives a total F-score of 97.5%.
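The three input variants compared above (bio only, tweets only, both) could be assembled per user roughly as follows. This is only a sketch under assumptions: the helper `build_document`, its `mode` argument, and the sample user are hypothetical, and it assumes each user's tweets are ordered oldest to newest.

```python
def build_document(bio, tweets, mode="both", max_tweets=5):
    """Join a user's bio and/or their most recent tweets into one text.

    Assumes `tweets` is ordered oldest to newest, so the last
    `max_tweets` entries are the most recent ones.
    """
    if mode not in ("bio", "tweets", "both"):
        raise ValueError(f"unknown mode: {mode!r}")
    parts = []
    if mode in ("bio", "both") and bio:
        parts.append(bio)
    if mode in ("tweets", "both") and max_tweets > 0:
        parts.extend(tweets[-max_tweets:])
    return " ".join(parts)


# Hypothetical user, for illustration only.
bio = "Official account of the city emergency office"
tweets = ["t1", "t2", "t3", "t4", "t5", "t6"]

bio_only = build_document(bio, tweets, mode="bio")
tweets_only = build_document(bio, tweets, mode="tweets")
combined = build_document(bio, tweets, mode="both")
print(combined)
```

Each variant can then be passed to the same classifier, so that any difference in F-score comes from the input text and not from the pipeline.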

jball1997 commented 5 years ago

We've increased our dataset by adding Irma users. We are also working on increasing our data in general.

jball1997 commented 5 years ago

10-FOLD CROSS-VALIDATION ON CURRENT TRAINING SET

Labeling only tweets: government: 97%, news: 97%, not_news: 96%, nonprofits: 97%, utility: 95%

Labeling only bio: government: 94%, news: 94%, not_news: 93%, nonprofits: 93%, utility: 93%

Labeling both bios and tweets with RandomForest: government: not finished, news: not finished, not_news: not finished, nonprofits: not finished, utility: not finished

Labeling both bios and tweets with Doc2Vec: government: 97%, news: 97%, not_news: 97%, nonprofits: 96%, utility: 97%
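Assuming the 10-fold run here is standard k-fold cross-validation, the splits could be generated along these lines. This is a sketch, not the project's actual code: the stratification by class, the function names, and the class counts in the example are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, spreading every class
    as evenly as possible across the folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin into the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    folds = stratified_folds(labels, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# Made-up class counts for illustration only.
labels = (["government"] * 20 + ["news"] * 20 + ["not_news"] * 40
          + ["nonprofits"] * 10 + ["utility"] * 10)
splits = list(train_test_splits(labels, k=10))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

A classifier would be trained on `train_idx` and scored on `test_idx` in each round, and the per-class scores averaged over the ten rounds.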