I tried converting a pipeline to pure_sklearn. The pipeline consists of TfidfVectorizer and MultinomialNB. The output of TfidfVectorizer is a sparse array, which is passed as input to MultinomialNB. However, the naive Bayes predict method does not accept sparse input (X), as defined in the code linked below, so it throws an error.
https://github.com/Ibotta/pure-predict/blob/c3431b79af4df9794c9f99246fa359a6c72a10ee/pure_sklearn/naive_bayes.py#L25
Possible solution
I'm not sure why the code above needs to reject sparse input. I tried changing it to allow sparse input and tested it. I didn't encounter any issues, and the estimator works as expected.
X = check_array(X, handle_sparse="allow")
Is this the right way?
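A minimal sketch of why sparse input should be harmless for this estimator, assuming predict boils down to the usual multinomial naive Bayes score (class log prior plus the dot product of the row with the per-class feature log probabilities). The function names, the toy parameters, and the `{column index: value}` row layout are hypothetical and only for illustration; they are not pure-predict's actual internals.

```python
import math


def joint_log_likelihood_dense(row, feature_log_prob, class_log_prior):
    """Score one dense row (list of floats) against every class."""
    return [
        prior + sum(x * w for x, w in zip(row, weights))
        for weights, prior in zip(feature_log_prob, class_log_prior)
    ]


def joint_log_likelihood_sparse(row, feature_log_prob, class_log_prior):
    """Score one sparse row ({column index: value}) against every class.

    Only the nonzero entries are visited; zero entries would contribute
    nothing to the dot product anyway.
    """
    return [
        prior + sum(x * weights[j] for j, x in row.items())
        for weights, prior in zip(feature_log_prob, class_log_prior)
    ]


# Toy "fitted" parameters: 2 classes, 4 features (made-up numbers).
feature_log_prob = [
    [math.log(p) for p in (0.1, 0.2, 0.3, 0.4)],
    [math.log(p) for p in (0.4, 0.3, 0.2, 0.1)],
]
class_log_prior = [math.log(0.5), math.log(0.5)]

# The same TF-IDF-like row in both representations.
dense_row = [0.0, 0.7, 0.0, 0.3]
sparse_row = {1: 0.7, 3: 0.3}

dense_scores = joint_log_likelihood_dense(dense_row, feature_log_prob, class_log_prior)
sparse_scores = joint_log_likelihood_sparse(sparse_row, feature_log_prob, class_log_prior)

# Predicted class is the index of the highest score.
dense_pred = max(range(len(dense_scores)), key=dense_scores.__getitem__)
sparse_pred = max(range(len(sparse_scores)), key=sparse_scores.__getitem__)
```

If the real implementation follows this pattern, the sparse and dense paths give the same scores and the same predicted class, which matches what I saw when testing with handle_sparse="allow".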
I've created a test method under test_pipeline that covers this scenario. I can submit a PR if you'd like to review it.
My dev environment:

Package       Version
fasttext      0.9.2
numpy         1.21.4
pandas        1.3.4
pure-predict  0.0.4
pytest        6.2.5
scikit-learn  1.0.1
scipy         1.7.2