LR so far:

| Vectorizer     | ROC                | AP                  |
|----------------|--------------------|---------------------|
| count          | 0.6824659601824518 | 0.1561411056103147  |
| tfidf          | 0.6927763586025801 | 0.16498424060330158 |
| hashing        | 0.5                | 0.101564675093268   |
| binary         | 0.6849821051408348 | 0.15780968628581854 |
| hashing_binary | 0.5                | 0.101564675093268   |
:/
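(For reference, the vectorizer setups being compared are roughly the following - default scikit-learn settings assumed, so take the exact arguments as a sketch rather than the real config:)

```python
from sklearn.feature_extraction.text import (
    CountVectorizer,
    HashingVectorizer,
    TfidfVectorizer,
)

# Which variant maps to which label in the results above (defaults assumed).
vectorizers = {
    "count": CountVectorizer(),
    "tfidf": TfidfVectorizer(),
    "hashing": HashingVectorizer(),
    "binary": CountVectorizer(binary=True),
    "hashing_binary": HashingVectorizer(binary=True),
}
```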
edit: with predict_proba, and ignoring the hashing vectorizer types:

| Vectorizer | ROC    | AP     | Precision | Recall |
|------------|--------|--------|-----------|--------|
| count      | 0.7492 | 0.2328 | 0.1665    | 0.8410 |
| tfidf      | 0.7556 | 0.2353 | 0.1831    | 0.7780 |
| binary     | 0.7466 | 0.2274 | 0.1692    | 0.8314 |
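The scoring loop with predict_proba looks roughly like this - the split names are placeholders for our actual data, and this isn't meant to exactly reproduce the numbers above:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    average_precision_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# X_train / X_test are raw text, y_train / y_test are the 0/1 labels
# (placeholders for our actual split).
vectorizers = {
    "count": CountVectorizer(),
    "tfidf": TfidfVectorizer(),
    "binary": CountVectorizer(binary=True),
}

for name, vectorizer in vectorizers.items():
    X_tr = vectorizer.fit_transform(X_train)
    X_te = vectorizer.transform(X_test)

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_tr, y_train)

    # Probability of the positive class -> ROC AUC / average precision;
    # hard 0/1 predictions -> precision / recall.
    scores = clf.predict_proba(X_te)[:, 1]
    preds = clf.predict(X_te)

    print(f"Vectorizer: {name}")
    print(f"ROC: {roc_auc_score(y_test, scores):.4f}")
    print(f"AP: {average_precision_score(y_test, scores):.4f}")
    print(f"Precision: {precision_score(y_test, preds):.4f}")
    print(f"Recall: {recall_score(y_test, preds):.4f}")
```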
As part of the modeling process, we want to look at Logistic Regression (since we saw this in class) and how it performs on our dataset. I'm also a little curious what gradient boosting would yield on the dataset, since I've personally always had good experiences with GBs.
LR won't have too many parameters to tune - there are only two classes, so no multiclass OvA/OvO stuff. I'm guessing it's mostly just controlling the amount of regularization. This'll probably run super quick, so I'll try a lot of values here.
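Probably something like a grid search over C, e.g. (the range and CV settings here are just a first guess):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("lr", LogisticRegression(max_iter=1000)),
])

# C is the inverse regularization strength, so sweep it on a log scale.
param_grid = {"lr__C": np.logspace(-4, 4, 20)}

search = GridSearchCV(pipeline, param_grid, scoring="average_precision", cv=5)
search.fit(X_train, y_train)  # raw text + labels, placeholders again
print(search.best_params_, search.best_score_)
```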
As for GB - we don't really need a GB model, but I'm going to explore it too (it also makes our workload a bit more fair haha). It's a sequential model, so it'll probably take a long time to run, and it considers a lot of features just like RFs do - so I imagine it'll be one of those things where I turn it on and check the results 3 days later. Hopefully they're at least somewhat good. There are a lot of parameters to tweak here and I have no priors as to what might be good, so here's hoping.
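Most likely a randomized search to keep the number of fits manageable - the ranges below are guesses, not tuned values:

```python
from scipy.stats import randint, uniform
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Random search over a handful of the usual GB knobs; ranges are guesses.
param_distributions = {
    "n_estimators": randint(100, 1000),
    "learning_rate": uniform(0.01, 0.3),   # roughly [0.01, 0.31]
    "max_depth": randint(2, 8),
    "subsample": uniform(0.5, 0.5),        # roughly [0.5, 1.0]
    "min_samples_leaf": randint(1, 50),
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(),
    param_distributions,
    n_iter=25,
    scoring="average_precision",
    cv=3,
    n_jobs=-1,
)
search.fit(X_tr, y_train)  # X_tr = an already-vectorized feature matrix
print(search.best_params_, search.best_score_)
```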