bwbaugh / infer

A machine learning toolkit for classification and assisted experimentation.

Feature selection for Naive Bayes #23

Open · bwbaugh opened this issue 11 years ago

bwbaugh commented 11 years ago

I hear that using the odds ratio for feature selection with Naive Bayes isn't that useful, but perhaps the chi-squared test might work better. I'm also trying to remember whether feature selection matters more for Bernoulli Naive Bayes than for Multinomial. Either way, being able to do online feature selection would be nice.

Some strategies:
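
For example, one option is a chi-squared score per feature computed from running counts, which would stay compatible with online training. This is only an illustrative sketch; the count structures and names below are assumptions, not part of infer:

```python
def chi_squared(n_feature_label, n_feature, n_label, n_total):
    """One-degree-of-freedom chi-squared statistic for a (feature, label) pair.

    All arguments are raw document counts: documents containing the feature
    with the label, documents containing the feature, documents with the
    label, and all documents seen so far.
    """
    # Cells of the observed 2x2 contingency table.
    a = n_feature_label              # feature present, label
    b = n_feature - n_feature_label  # feature present, other labels
    c = n_label - n_feature_label    # feature absent, label
    d = n_total - n_feature - c      # feature absent, other labels
    numerator = n_total * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator if denominator else 0.0


def select_features(counts, labels, n_total, k=1000):
    """Return the k features with the highest chi-squared score for any label.

    `counts` maps feature -> {label -> document count} and `labels` maps
    label -> document count; both can be maintained incrementally.
    """
    scores = {}
    for feature, per_label in counts.items():
        n_feature = sum(per_label.values())
        scores[feature] = max(
            chi_squared(per_label.get(label, 0), n_feature, n_label, n_total)
            for label, n_label in labels.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because the inputs are just counters, the selection could be re-run periodically as new documents arrive.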

bwbaugh commented 11 years ago

If we use the odds ratio in the multinomial raw-score calculation, it might be useful to have a back-off strategy: first use only the words with an odds ratio above a certain threshold, and if none qualify, lower the threshold and try again.
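
A minimal sketch of that back-off, assuming the per-feature odds ratios have already been computed (the function and parameter names here are hypothetical):

```python
def select_with_backoff(odds_ratios, threshold=5.0, decay=0.5, floor=1.0):
    """Keep features whose odds ratio clears the threshold; if none do,
    lower the threshold and try again until the floor is reached.

    `odds_ratios` maps feature -> odds ratio for the instance at hand.
    """
    while threshold >= floor:
        selected = {f for f, score in odds_ratios.items() if score >= threshold}
        if selected:
            return selected
        threshold *= decay  # Back off to a more permissive threshold.
    # Nothing cleared even the floor, so fall back to using every feature.
    return set(odds_ratios)
```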

Here is a good question on Stack Overflow: How to use Odds ratio feature selection with Naive bayes Classifier. The answers there also mention that a useful complement to the odds ratio is to incorporate document frequency, so that extremely rare words aren't always selected.
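
A sketch of how the two signals could be combined before the back-off step above; `document_frequency` is assumed to come from whatever counts the trainer already keeps:

```python
def filter_rare(odds_ratios, document_frequency, min_df=5):
    """Drop features seen in fewer than `min_df` documents before ranking
    by odds ratio, so extremely rare words can't dominate the selection."""
    return {
        feature: score
        for feature, score in odds_ratios.items()
        if document_frequency.get(feature, 0) >= min_df
    }
```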

Idea was originally recorded 3/14/13 at 17:13.

bwbaugh commented 11 years ago

Odds ratio could potentially be implemented by a script outside of the classifier that calls the conditional method for each feature and then removes the features that don't make the cut from the instance to be classified. Should that be preferred, or should the ability be built into the classifier without requiring custom work?
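
Roughly what the external-script route might look like for a two-label setup; the exact signature of the classifier's conditional method is an assumption, as are the helper names:

```python
def odds_ratio(classifier, feature, label, other_label, eps=1e-9):
    """Odds ratio of `feature` for `label` versus `other_label`, built from
    the classifier's conditional probabilities (signature assumed)."""
    p = classifier.conditional(feature, label)
    q = classifier.conditional(feature, other_label)
    # Clamp away from 0 and 1 so the ratio stays finite.
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return (p / (1 - p)) / (q / (1 - q))


def prune_instance(classifier, features, label, other_label, threshold=2.0):
    """Remove features whose odds ratio falls below the threshold before
    handing the instance to the classifier."""
    return [
        feature for feature in features
        if odds_ratio(classifier, feature, label, other_label) >= threshold
    ]
```

An external script like this keeps the classifier untouched, at the cost of recomputing the scores for every instance; building it in would mainly mean caching the same scores during training.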

Idea was originally recorded 3/14/13 at 17:24.