Using StackingRegressorDF on pipelines containing a ColumnTransformerDF raises an error on .fit.
Using a StackingRegressorDF as the last step of a PipelineDF works as expected. However, stacking several PipelineDF objects that each contain a ColumnTransformerDF fails with the following error:
TypeError: StackingRegressorDF.fit: ColumnTransformerDF.fit_transform: arg y must be None, or a pandas Series or DataFrame
Root cause
Most likely the reason is this line in StackingRegressor.fit:
y = column_or_1d(y, warn=True)
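A minimal sketch of why that line would matter, assuming the ColumnTransformerDF check behaves as the error message states (y must be None, a pandas Series, or a DataFrame). The helper `is_valid_df_target` is hypothetical and only mirrors the wording of the error; it is not sklearndf code.

```python
import numpy as np
import pandas as pd
from sklearn.utils import column_or_1d

# A pandas Series, as passed to StackingRegressorDF.fit in this report
y = pd.Series([0.1, 0.2, 0.3], name="y")

# The conversion named above: the result is a plain NumPy array,
# so the pandas type (and index) is no longer available downstream
y_converted = column_or_1d(y, warn=True)

print(type(y).__name__)
print(type(y_converted).__name__)


def is_valid_df_target(target):
    """Hypothetical stand-in for the check implied by the error message."""
    return target is None or isinstance(target, (pd.Series, pd.DataFrame))


print(is_valid_df_target(y))
print(is_valid_df_target(y_converted))
```

If this is indeed the cause, any DF-wrapped step inside a stacked pipeline that validates y would receive the array rather than the original Series.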
Reproducible example:
from sklearndf.pipeline import PipelineDF
from sklearndf.regression import LinearRegressionDF, ElasticNetDF
from sklearndf.transformation import ColumnTransformerDF, StandardScalerDF
from sklearndf.regression import StackingRegressorDF
import pandas as pd
import numpy as np
# toy data set
np.random.seed(1)
data = pd.DataFrame({
    'x1': np.random.uniform(size=(10,)),
    'x2': np.random.uniform(size=(10,)),
    'y': np.random.uniform(size=(10,)),
})
# basic building blocks
model1 = LinearRegressionDF()
model2 = ElasticNetDF()
preprocessing = ColumnTransformerDF([
    ('x1', StandardScalerDF(), ['x1']),
    ('x2', 'passthrough', ['x2']),
])
# Pipeline with stack works
pipeline = PipelineDF([
    ('preprocessing', preprocessing),
    ('stack', StackingRegressorDF([
        ('model1', model1),
        ('model2', model2),
    ]))
])
pipeline.fit(data, data['y'])
print(pipeline.predict(data))
# Stack of Pipelines doesn't
stack_of_pipelines = StackingRegressorDF([
    ('pipeline1', PipelineDF([
        ('preprocessing', preprocessing),
        ('model1', model1)
    ])),
    ('pipeline2', PipelineDF([
        ('preprocessing', preprocessing),
        ('model2', model2)
    ]))
])
stack_of_pipelines.fit(data, data['y'])