arendakessian / spring2020-ml-project

fake review detection system

Perceptron + SVM modeling #9

Closed guidopetri closed 4 years ago

guidopetri commented 4 years ago

As part of the modeling process, we want to look into Perceptron and SVM models. Mostly this is because we looked at them specifically in class.

I'm guessing the perceptron won't have many params to tune. It might take super long to run though, given that every epoch is a full pass over the training set... It could be useful to do some sort of early stopping after a certain number of iterations, especially since I doubt the training set is linearly separable (and if it isn't, the perceptron will never converge on its own).
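To make the early-stopping idea concrete, here is a minimal pure-Python sketch, not the project's code. The function name `train_perceptron`, the `max_epochs` and `patience` parameters, and the toy data are all hypothetical, and the stopping rule (a cap on epochs plus giving up when the per-epoch mistake count stops improving) is just one assumption about what "early stopping" could mean here.

```python
import random


def train_perceptron(X, y, max_epochs=20, patience=3, seed=0):
    """Perceptron for labels in {-1, +1} with a hypothetical stopping rule:
    stop after max_epochs, or when the number of training mistakes per epoch
    has not improved for `patience` consecutive epochs."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    best_mistakes = len(X) + 1
    stale = 0
    epochs_run = 0
    for _ in range(max_epochs):
        rng.shuffle(order)
        mistakes = 0
        for i in order:  # one full pass over the training set
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            if y[i] * score <= 0:  # misclassified (or on the boundary): update
                w = [wj + y[i] * xj for wj, xj in zip(w, X[i])]
                b += y[i]
                mistakes += 1
        epochs_run += 1
        if mistakes == 0:  # data separated perfectly, nothing left to learn
            break
        if mistakes < best_mistakes:
            best_mistakes, stale = mistakes, 0
        else:
            stale += 1
            if stale >= patience:  # no recent improvement: stop early
                break
    return w, b, epochs_run


# Toy, noisy (non-separable) data standing in for the real features.
data_rng = random.Random(1)
X_toy = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
y_toy = [1 if x0 + x1 > 0 else -1 for x0, x1 in X_toy]
y_toy = [-label if data_rng.random() < 0.1 else label for label in y_toy]  # flip ~10%

w_toy, b_toy, epochs_toy = train_perceptron(X_toy, y_toy)
print(f"stopped after {epochs_toy} epochs")
```

On noisy data like this the mistake count never reaches zero, so the run ends either at the epoch cap or via the patience rule. sklearn's `Perceptron` also exposes `max_iter`, `tol`, and `early_stopping` options that would be the more practical route.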

As for SVM: in my experience, this has also taken forever to run. That makes sense now that I know how it works: training evaluates the kernel between pairs of examples, so fit time grows at least quadratically with the number of examples (good thing we're downsampling). I was hoping sklearn would parallelize this, but as far as I know `SVC` itself has no `n_jobs` option, so any parallelism would have to come from running the parameter search in parallel. I'm guessing we start with the RBF kernel? The only other one I'd expect to do well is the polynomial kernel. Finally, the C param is probably the most important one, since it essentially controls the regularization (higher C = less regularization). I'm curious to see how much regularization is needed for a good AUC/AP on the dev set.
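A rough sketch of what that C sweep could look like with sklearn, assuming an RBF kernel and a held-out dev split. The synthetic data from `make_classification`, the particular grid of C values, and the scaling step are illustrative assumptions, not the project's actual features or settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the downsampled review features (hypothetical).
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.8, 0.2], random_state=0
)
X_train, X_dev, y_train, y_dev = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

results = {}
for C in (0.01, 0.1, 1.0, 10.0, 100.0):  # higher C = less regularization
    # Scaling is an added assumption: RBF kernels are sensitive to feature scale.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma="scale"))
    model.fit(X_train, y_train)
    # decision_function gives a continuous score, which AUC and AP can rank on
    # without needing probability estimates.
    scores = model.decision_function(X_dev)
    results[C] = (
        roc_auc_score(y_dev, scores),
        average_precision_score(y_dev, scores),
    )

for C, (auc, ap) in results.items():
    print(f"C={C:<6} dev AUC={auc:.3f}  dev AP={ap:.3f}")
```

Each fit here runs serially. Wrapping the same pipeline in `GridSearchCV` with `n_jobs` set would be one way to fit the different C values in parallel, and swapping `kernel="rbf"` for `kernel="poly"` would cover the polynomial case.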

Thanks for taking care of this Aren :)

guidopetri commented 4 years ago

@arendakessian all ready for you bud. :) (go Flames)

arendakessian commented 4 years ago

closing as per most recent commit 56a8968