aspremon / NaiveFeatureSelection

Code for NaiveFeatureSelection, i.e. feature selection in Naive Bayes, see https://arxiv.org/abs/1905.09884
MIT License

Can categorical features be used as input to NaiveFeatureSelection? #6

Open Sandy4321 opened 4 years ago

Sandy4321 commented 4 years ago

1. Can categorical features be used as input to nfs.fit_transform?

nfs = NaiveFeatureSelection(k=kv)

# Use fit_transform to extract selected features
X_new = nfs.fit_transform(X_train, y_train)

If not, should the data be integer feature counts? Per the scikit-learn docs, "The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work." Then can it be like the example at https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html:

import numpy as np
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(X, y)

So can your code be used as nfs.fit_transform(X, y), where

X = rng.randint(5, size=(6, 100))
y = np.array([1, 0, 1, 1, 0, 0])

2. For the code as it currently stands, can nfs.fit_transform(X, y) be used where

X = rng.randint(2, size=(6, 100))
y = np.array([1, 0, 1, 1, 0, 0])

For example, the data would look like this:

rng.randint(2, size=(6, 10))
array([[0, 1, 0, 1, 1, 0, 1, 0, 0, 1],
       [0, 0, 0, 1, 0, 1, 1, 1, 0, 1],
       [0, 1, 0, 1, 0, 1, 1, 0, 1, 0],
       [0, 0, 0, 1, 0, 1, 0, 0, 1, 0],
       [1, 1, 0, 0, 1, 0, 0, 1, 0, 0],
       [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]])
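To make question 2 concrete, here is a runnable sketch; only NaiveFeatureSelection(k=...) and fit_transform are taken from the snippet above, the import path is my assumption, and whether this actually works with such data is exactly the question:

```python
import numpy as np
# Assumed import path -- adjust to wherever NaiveFeatureSelection lives in this repo.
from naive_feature_selection import NaiveFeatureSelection

rng = np.random.RandomState(1)
X = rng.randint(2, size=(6, 10))   # binary features, as in the example above
y = np.array([1, 0, 1, 1, 0, 0])

nfs = NaiveFeatureSelection(k=3)   # keep the 3 highest-scoring features
X_new = nfs.fit_transform(X, y)
print(X_new.shape)                 # expected: (6, 3)
```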

3. As we can see at https://scikit-learn.org/stable/modules/naive_bayes.html#multinomial-naive-bayes:

The decision rule for Bernoulli naive Bayes is based on

P(x_i | y) = P(i | y) x_i + (1 - P(i | y)) (1 - x_i)

which differs from multinomial NB's rule in that it explicitly penalizes the non-occurrence of a feature i that is an indicator for class y, whereas the multinomial variant would simply ignore a non-occurring feature.
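As a side note, this difference can be seen directly with scikit-learn's own classifiers; a minimal sketch (my illustration, not from the paper or this repo):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy binary data: the second feature occurs only in class 1 samples.
X = np.array([[1, 1],
              [1, 1],
              [1, 0],
              [1, 0]])
y = np.array([1, 1, 0, 0])

x_test = np.array([[1, 0]])  # the second feature does not occur here

bnb = BernoulliNB().fit(X, y)
mnb = MultinomialNB().fit(X, y)

# BernoulliNB uses P(x_i|y) = P(i|y)x_i + (1-P(i|y))(1-x_i), so the absence of
# the second feature actively counts as evidence against class 1.
print(bnb.predict_proba(x_test))

# MultinomialNB only accumulates terms for features with non-zero counts, so
# the zero contributes nothing and the preference for class 0 is weaker.
print(mnb.predict_proba(x_test))
```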

The important question is: does your current implementation explicitly penalize the non-occurrence of a feature, or not? And then, what if LabelEncoder (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html) is used to transform categorical features to integers,

for example, as described in https://www.datacamp.com/community/tutorials/naive-bayes-scikit-learn:

# Import LabelEncoder
from sklearn import preprocessing

# Creating LabelEncoder
le = preprocessing.LabelEncoder()

# Converting string labels into numbers
wheather_encoded = le.fit_transform(wheather)

# Converting string labels into numbers
temp_encoded = le.fit_transform(temp)
label = le.fit_transform(play)
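A self-contained version of that snippet, with small made-up lists (not the exact data from the tutorial), just to show what LabelEncoder produces:

```python
from sklearn import preprocessing

# Made-up example lists, only to illustrate the encoding.
wheather = ['Sunny', 'Overcast', 'Rainy', 'Sunny', 'Rainy']
temp = ['Hot', 'Mild', 'Cool', 'Hot', 'Mild']
play = ['No', 'Yes', 'Yes', 'No', 'Yes']

le = preprocessing.LabelEncoder()
wheather_encoded = le.fit_transform(wheather)  # [2 0 1 2 1]
temp_encoded = le.fit_transform(temp)          # [1 2 0 1 2]
label = le.fit_transform(play)                 # [0 1 1 0 1]
print(wheather_encoded, temp_encoded, label)
```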

Should 1 be added to the encoded values?

Or would you suggest another way to transform categorical features?

Thanks...

Sandy4321 commented 4 years ago

For details, please see https://github.com/scikit-learn/scikit-learn/issues/10856

(about CategoricalNB and GeneralNB) and https://github.com/remykarem/mixed-naive-bayes
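For the classification step itself (not the feature selection), scikit-learn >= 0.22 does ship sklearn.naive_bayes.CategoricalNB, which takes integer category codes directly; a minimal sketch:

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB  # available in scikit-learn >= 0.22

rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))   # integer category codes, e.g. from LabelEncoder/OrdinalEncoder
y = np.array([1, 0, 1, 1, 0, 0])

clf = CategoricalNB()
clf.fit(X, y)
print(clf.predict(X[:2]))
```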

Sandy4321 commented 4 years ago

From https://datascience.stackexchange.com/questions/58720/naive-bayes-for-categorical-features-non-binary : "Some people recommend using MultinomialNB, which according to me doesn't make sense because it considers feature values to be frequency counts."

Sandy4321 commented 4 years ago

Can you please comment on this, since 8 months have already passed since the question was created? Just yes or no?

arminaskari commented 4 years ago

The best way to use the results of the paper is to use Bernoulli naive Bayes for this problem. A categorical feature can be converted into a binary feature vector; for example, if the first feature has 3 categorical values {1, 2, 3}, this first feature can be converted into a new feature that is three-dimensional, where [1 0 0], [0 1 0], and [0 0 1] represent the categorical values 1, 2, and 3 respectively.
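To make that conversion concrete, a minimal sketch using scikit-learn's OneHotEncoder (the choice of encoder is mine; any one-hot encoding gives the same binary matrix):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One categorical column taking the 3 values {1, 2, 3} from the example above.
X_cat = np.array([[1], [2], [3], [2], [1]])

# In newer scikit-learn versions use sparse_output=False instead of sparse=False.
enc = OneHotEncoder(sparse=False, dtype=int)
X_bin = enc.fit_transform(X_cat)
print(X_bin)
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]
#  [0 1 0]
#  [1 0 0]]
# X_bin is a binary matrix that the Bernoulli naive Bayes variant can consume.
```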

Another approach is to work through the derivation in the paper but now use categorical distributions instead of the Bernoulli or multinomial conditional probability distributions.
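For reference, the class-conditional likelihood such a categorical derivation would presumably start from (standard categorical naive Bayes, not taken from the paper) is:

```latex
% Standard categorical naive Bayes likelihood (not from the paper):
% feature j takes one of K_j values and \theta_{jkc} = P(x_j = k \mid y = c).
P(x \mid y = c) = \prod_{j=1}^{d} \prod_{k=1}^{K_j} \theta_{jkc}^{[x_j = k]},
\qquad \sum_{k=1}^{K_j} \theta_{jkc} = 1
```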

Sandy4321 commented 4 years ago

Great, thanks for answering. But regarding "Another approach is to work through the derivation in the paper but now use categorical distributions instead of the Bernoulli or multinomial conditional probability distributions":

Then it will not be a sparse data matrix, since categorical data is dense: for each matrix cell we have some non-zero categorical value?