Closed ispmarin closed 4 years ago
Hi there!
The problem here is that make_step takes a class and produces another class, but you are passing it an instance of Pipeline.
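To illustrate the class-versus-instance distinction, here is a minimal sketch. It is not baikal's implementation: make_step_sketch, StepMixin, and the toy Pipeline class are hypothetical names made up for this example, and the only assumption is that a class factory of this kind builds a new class from the class it is given.

```python
import inspect


class StepMixin:
    """Hypothetical marker for the behaviour a step class would add."""

    is_step = True


def make_step_sketch(base_class):
    """Build a new class that combines StepMixin with base_class."""
    if not inspect.isclass(base_class):
        raise TypeError(
            f"expected a class, got an instance of {type(base_class).__name__}"
        )
    # type(name, bases, namespace) creates a new class dynamically.
    return type(base_class.__name__, (StepMixin, base_class), {})


class Pipeline:
    """Toy estimator used only for this illustration."""

    def __init__(self, steps=()):
        self.steps = list(steps)


# Passing the class works and yields a new class that can be instantiated.
PipelineStep = make_step_sketch(Pipeline)
step = PipelineStep(steps=["vectorizer", "tfidf"])

# Passing an instance is rejected, which mirrors the mistake in the question.
try:
    make_step_sketch(Pipeline(steps=["vectorizer"]))
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

In this sketch the factory checks its argument explicitly; a real library may fail with a different and less obvious error when handed an instance.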
You could make a step from the Pipeline class, but with baikal you don't need it, because there is an idiomatic way to chain a linear sequence of steps. If I understood your snippet correctly, here's how I'd do it:
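import sklearn.ensemble
import sklearn.feature_extraction.text
import sklearn.naive_bayes
import sklearn.svm
from baikal import Input, Model, make_step
from baikal.steps import Concatenate
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# make_step takes the scikit-learn class itself, not an instance.
CountVectorizer = make_step(sklearn.feature_extraction.text.CountVectorizer)
TfidfTransformer = make_step(sklearn.feature_extraction.text.TfidfTransformer)
MultinomialNB = make_step(sklearn.naive_bayes.MultinomialNB)
RandomForestClassifier = make_step(sklearn.ensemble.RandomForestClassifier)
LinearSVC = make_step(sklearn.svm.LinearSVC)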
classifiers = (MultinomialNB, RandomForestClassifier, LinearSVC)
x = Input()
y_t = Input()
classifier_outs = []
for classifier in classifiers:
    # Instead of using the Pipeline class, chain the steps directly:
    z = CountVectorizer()(x)
    z = TfidfTransformer()(z)
    z = classifier()(z, y_t)
    classifier_outs.append(z)
ensemble_features = Concatenate()(classifier_outs)
y = LinearSVC()(ensemble_features, y_t)
model = Model(x, y, y_t)
X_train, X_test, y_train, y_test = train_test_split(dfi[feat_var], dfi.binary_label)
model.fit(X_train, y_train)
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)
print(classification_report(y_test, y_test_pred))
Thanks! Makes total sense now.
I'm trying to use baikal to create a stacked model for text features, so I created a pipeline with CountVectorizer and TfidfTransformer and passed the pipeline to make_step:
But I'm getting the following error:
Any ideas on how to solve this? Thanks