Ibotta / pure-predict

Machine learning prediction in pure Python
Apache License 2.0

Allow sparse input for naive bayes classifier #18

Open cyan198 opened 3 years ago

cyan198 commented 3 years ago

I tried converting a pipeline to pure_sklearn. The pipeline consists of TfidfVectorizer and MultinomialNB. TfidfVectorizer outputs a sparse array, which is passed as input to MultinomialNB. However, the naive Bayes predict method does not accept a sparse array as input (X), as defined in the code linked below, and therefore raises an error.

https://github.com/Ibotta/pure-predict/blob/c3431b79af4df9794c9f99246fa359a6c72a10ee/pure_sklearn/naive_bayes.py#L25

Possible solution

I'm not sure why the code above needs to reject sparse input. However, I tried changing it to allow sparse input and tested it. I didn't encounter any issues, and the estimator works as expected.

X = check_array(X, handle_sparse="allow")

Is this the right way?
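To illustrate why allowing sparse input should give the same predictions, here is a minimal pure-Python sketch of the multinomial naive Bayes score (class log prior plus the dot product of the row with the per-class feature log probabilities). This is not pure-predict's actual code: the function names, the `{column_index: value}` row representation, and the toy numbers are all assumptions made up for this example.

```python
import math

# Hypothetical helpers for illustration only; not part of pure_sklearn.

def to_sparse_row(dense_row):
    """Keep only the non-zero entries as {column_index: value}."""
    return {i: v for i, v in enumerate(dense_row) if v != 0}

def jll_dense(row, feature_log_prob, class_log_prior):
    """Joint log likelihood per class for a dense row (list of floats)."""
    return [
        prior + sum(x * w for x, w in zip(row, weights))
        for weights, prior in zip(feature_log_prob, class_log_prior)
    ]

def jll_sparse(row, feature_log_prob, class_log_prior):
    """Same score, but iterating only over the non-zero entries."""
    return [
        prior + sum(value * weights[idx] for idx, value in row.items())
        for weights, prior in zip(feature_log_prob, class_log_prior)
    ]

def predict(jll, classes):
    """Return the class with the highest joint log likelihood."""
    best = max(range(len(jll)), key=jll.__getitem__)
    return classes[best]

# Toy model: 2 classes, 3 features (values are made up).
classes = ["spam", "ham"]
class_log_prior = [math.log(0.5), math.log(0.5)]
feature_log_prob = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],
    [math.log(0.1), math.log(0.3), math.log(0.6)],
]

dense_row = [0.0, 0.5, 0.8]           # e.g. one TF-IDF row with a zero entry
sparse_row = to_sparse_row(dense_row)  # {1: 0.5, 2: 0.8}

dense_scores = jll_dense(dense_row, feature_log_prob, class_log_prior)
sparse_scores = jll_sparse(sparse_row, feature_log_prob, class_log_prior)

dense_prediction = predict(dense_scores, classes)
sparse_prediction = predict(sparse_scores, classes)
```

Zero entries contribute nothing to the dot product, so skipping them leaves the scores unchanged. Whether the library's internal helpers handle its sparse representation the same way is something the maintainers would need to confirm.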

I've created a test method under test_pipeline to test this scenario. I can submit a PR if you want to review.

My dev environment:

Package       Version
fasttext      0.9.2
numpy         1.21.4
pandas        1.3.4
pure-predict  0.0.4
pytest        6.2.5
scikit-learn  1.0.1
scipy         1.7.2