Ibotta / pure-predict

Machine learning prediction in pure Python
Apache License 2.0

Error when predicting with a converted model built with CountVectorizer(binary=True) #19

Open phongvis opened 2 years ago

phongvis commented 2 years ago

Describe the bug

An error is raised when running inference with a converted sklearn model whose pipeline uses CountVectorizer(binary=True). The same pipeline works when binary=False.

To Reproduce

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from pure_sklearn.map import convert_estimator

vectorizer = CountVectorizer(binary=True)
model = LogisticRegression(random_state=0)
pipeline = Pipeline([
    ('vect', vectorizer),
    ('clf', model)
])

X_train = ['one text', 'two text', 'three text']
y_train = ['1', '2', '3']
pipeline.fit(X_train, y_train)
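# Convert the fitted sklearn pipeline to its pure-Python equivalent
converted = convert_estimator(pipeline)
# This call raises an error when the vectorizer uses binary=True
converted.predict(['four'])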

The same code runs without error when the vectorizer is created with binary=False.
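
For reference, here is a minimal sketch of what binary=True is documented to do in sklearn's CountVectorizer: every non-zero term count is set to 1. This is not pure-predict's implementation; the function name count_features, the dict-based sparse row, and the hard-coded vocabulary are assumptions made for illustration only. The vocabulary mirrors what the training texts above would produce (terms sorted alphabetically).

```python
import re
from collections import Counter

# Approximates CountVectorizer's default token pattern: words of 2+ characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def count_features(doc, vocabulary, binary=False):
    """Return a sparse {column_index: value} row for one document.

    Hypothetical helper for illustration; not part of sklearn or pure-predict.
    """
    tokens = TOKEN_PATTERN.findall(doc.lower())
    # Count only tokens that appear in the fitted vocabulary.
    counts = Counter(tok for tok in tokens if tok in vocabulary)
    row = {vocabulary[tok]: n for tok, n in counts.items()}
    if binary:
        # binary=True: clip every non-zero count to 1.
        row = {idx: 1 for idx in row}
    return row


# Vocabulary assumed to match the fit on ['one text', 'two text', 'three text'].
vocabulary = {"one": 0, "text": 1, "three": 2, "two": 3}

print(count_features("text text one", vocabulary))               # raw counts
print(count_features("text text one", vocabulary, binary=True))  # clipped to 1
print(count_features("four", vocabulary, binary=True))           # no known terms
```

Note that the input used in the reproduction, 'four', contains no vocabulary terms, so the expected feature row is empty in both modes. Whether that is related to the error has not been established here.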

Expected behavior

converted.predict should return a prediction without raising an error, as it does with binary=False.
