Open audrism opened 5 years ago
Good results with doc2vec. We haven't integrated it with RF (random forest) yet.
Our doc2vec results are actually pretty confusing. The model labels most users correctly, but we also get a lot of false positives, and it performs terribly when we feed it custom text. We think this is caused by the sparsity of our data. We will look into this.
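One way to make the "good accuracy but many false positives" pattern visible is to look at precision and recall per class rather than a single accuracy number. The snippet below is a minimal sketch in plain Python; the function name `per_class_report` and the toy labels are made up for illustration and are not taken from our data or code.

```python
from collections import Counter

def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from two parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but the user belongs elsewhere
            fn[truth] += 1  # the true class was missed
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        report[label] = (precision, recall, f1)
    return report

# Hypothetical labels: every true "news" user is found (recall 1.0),
# but half of the "news" predictions are false positives (precision 0.5).
y_true = ["news", "news", "not_news", "not_news", "not_news", "not_news"]
y_pred = ["news", "news", "news", "news", "not_news", "not_news"]
report = per_class_report(y_true, y_pred)
```

A class with high recall and low precision is one where the model over-predicts, which would match what we are seeing.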
Added more users that were not relevant to help even out the data. Classifying the users in Users_Labeled using only the user bio gives a total F-score of 93.8%. Added the last 5 tweets for each user. Classifying the users in Users_Labeled using only the tweets gives a total F-score of 96.5%. Classifying the users in Users_Labeled using both the tweets and bios gives a total F-score of 97.5%.
We've increased our dataset by adding irma users. We are also working on increasing our data in general.
10-FOLD CROSS-VALIDATION ON CURRENT TRAINING SET
Labeling only tweets: government 97%, news 97%, not_news 96%, nonprofits 97%, utility 95%
Labeling only bio: government 94%, news 94%, not_news 93%, nonprofits 93%, utility 93%
Labeling both bios and tweets with RandomForest: not finished for any class (government, news, not_news, nonprofits, utility)
Labeling both bios and tweets with Doc2Vec: government 97%, news 97%, not_news 97%, nonprofits 96%, utility 97%
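For reference, a 10-fold evaluation with per-class scores like the ones above can be sketched in plain Python as follows. This is an illustration under assumptions, not our actual pipeline: the helper names (`stratified_folds`, `cross_validate`, `majority_baseline`) are hypothetical, the score is assumed to be per-class accuracy on the held-out fold, and the placeholder model simply predicts the most common training label where doc2vec or RF would go.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Yield (train_idx, test_idx); each class is spread evenly over the k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx

def cross_validate(texts, labels, fit_predict, k=10, seed=0):
    """Return {label: fraction of held-out users of that class labeled correctly}."""
    correct, total = defaultdict(int), defaultdict(int)
    for train_idx, test_idx in stratified_folds(labels, k, seed):
        preds = fit_predict([texts[i] for i in train_idx],
                            [labels[i] for i in train_idx],
                            [texts[i] for i in test_idx])
        for i, pred in zip(test_idx, preds):
            total[labels[i]] += 1
            correct[labels[i]] += int(pred == labels[i])
    return {label: correct[label] / total[label] for label in total}

def majority_baseline(train_texts, train_labels, test_texts):
    """Placeholder model: always predicts the most common training label."""
    most_common = max(set(train_labels), key=train_labels.count)
    return [most_common] * len(test_texts)

# Made-up data: 30 "news" users and 20 "not_news" users.
labels = ["news"] * 30 + ["not_news"] * 20
texts = ["user %d" % i for i in range(len(labels))]
scores = cross_validate(texts, labels, majority_baseline, k=10)
```

Because every user lands in exactly one test fold, the per-class numbers are computed only on users the model did not see during training for that fold. Swapping `majority_baseline` for a function that trains a real classifier gives numbers comparable across the tweet-only, bio-only, and combined setups.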
Integrating d2v to classify users; learning how to add RF on top.