arendakessian / spring2020-ml-project

fake review detection system

Naïve Bayes + Random Forest modeling #8

Closed guidopetri closed 4 years ago

guidopetri commented 4 years ago

As part of the modeling process, we want to explore what possibilities we've got with Naïve Bayes models, as well as RF. These are both on your desk, Kelsey :)

Here's what I've noticed so far in the baseline. NB doesn't really have a lot of params, at least the MultinomialNB that I used. I'm not super confident that's the NB variant we need, but I think it's correct. The Laplace smoothing param there (alpha) just adds a pseudocount to every word count, so words that never show up in a class don't end up with zero probability. It mostly matters for rare words, so I'm not really sure what exploring it will yield.
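To make the smoothing point concrete, here's a minimal plain-Python sketch (not sklearn's actual code). It assumes the additive-smoothing estimate that MultinomialNB's alpha is documented to use, P(w | c) = (count(w, c) + alpha) / (total count in c + alpha * V), where V is the vocabulary size. The helper `smoothed_word_probs` and the toy reviews are hypothetical, just for illustration.

```python
from collections import Counter

def smoothed_word_probs(docs, vocab, alpha=1.0):
    """Hypothetical helper: estimate P(word | class) for ONE class from its
    tokenized docs, using additive smoothing (Laplace when alpha == 1)."""
    # Count only in-vocabulary tokens across all docs of this class.
    counts = Counter(tok for doc in docs for tok in doc if tok in vocab)
    total = sum(counts.values())
    # Every word gets alpha pseudocounts, so the denominator grows by alpha * V.
    denom = total + alpha * len(vocab)
    return {w: (counts[w] + alpha) / denom for w in vocab}

# Toy "fake review" class: "terrible" never appears in it.
fake_reviews = [["great", "great", "amazing"], ["amazing", "best"]]
vocab = {"great", "amazing", "best", "terrible"}

no_smoothing = smoothed_word_probs(fake_reviews, vocab, alpha=0.0)
laplace = smoothed_word_probs(fake_reviews, vocab, alpha=1.0)
heavy = smoothed_word_probs(fake_reviews, vocab, alpha=100.0)

print(no_smoothing["terrible"])  # unseen word gets probability 0 without smoothing
print(laplace["terrible"])       # 1 / 9 with Laplace smoothing
print(heavy["great"], heavy["terrible"])  # large alpha pushes everything toward uniform
```

So alpha mainly trades off how much we trust the raw counts: near 0 it lets a single unseen word zero out a class, and very large values flatten every class toward the same uniform distribution. A sweep over a few orders of magnitude is probably all that's worth trying.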

As for RF, it was taking up ~3GB of RAM on my desktop. The more features each split considers (max_features), the longer it'll take to run. max_depth also affects it a lot, of course. I didn't really look into any of the other params; maybe setting a minimum leaf size (min_samples_leaf) will make it go faster?
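On the leaf-size question, here's a back-of-the-envelope sketch, not something sklearn exposes. A binary tree with L leaves has 2L - 1 nodes, and leaves are capped both by depth (at most 2**max_depth) and by leaf size (each leaf holds at least min_samples_leaf samples, so at most n_samples // min_samples_leaf leaves). The parameter names follow sklearn's RandomForestClassifier, but `max_nodes_per_tree` is a hypothetical helper and gives an upper bound only; real trees usually stop earlier. It doesn't model max_features, which changes how much work each split costs rather than this bound.

```python
def max_nodes_per_tree(n_samples, max_depth=None, min_samples_leaf=1):
    """Hypothetical helper: upper bound on the node count of one binary tree."""
    # Each leaf holds >= min_samples_leaf samples and leaves are disjoint.
    max_leaves = max(1, n_samples // min_samples_leaf)
    if max_depth is not None:
        # A tree of depth d has at most 2**d leaves.
        max_leaves = min(max_leaves, 2 ** max_depth)
    # A full binary tree with L leaves has L - 1 internal nodes, 2L - 1 total.
    return 2 * max_leaves - 1

n = 100_000  # illustrative training-set size, not our actual data
for depth, leaf in [(None, 1), (None, 50), (10, 1), (10, 200)]:
    bound = max_nodes_per_tree(n, max_depth=depth, min_samples_leaf=leaf)
    print(f"max_depth={depth}, min_samples_leaf={leaf}: <= {bound} nodes per tree")
```

Multiply by the number of trees to get a rough ceiling for the whole forest. With no depth or leaf-size limit the bound scales with the dataset size, which would be consistent with the memory use above; either limit caps it.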

Neither of these algorithms performed great on the dev set for me. Looking forward to you proving me wrong haha.

guidopetri commented 4 years ago

@kelseymarkey all ready for you when you start. :)

guidopetri commented 4 years ago

This is done, btw. I put the results in #13.