Closed m-mohsin-zafar closed 4 years ago
Same doubt!
@m-mohsin-zafar thank you for sharing the mlxtend code! Have you tried pystacknet, vecstack, or scikit-learn?
@m-mohsin-zafar the mlens documentation is seriously confusing and not up to the mark.
Hi there,
Thanks for reaching out!
We can achieve what you're looking for by using dicts to specify pipelines when adding a layer to the ensemble:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from mlens.ensemble import SuperLearner
from mlens.preprocessing import Subset
iris = load_iris()
X = iris.data
y = iris.target
ens = SuperLearner()
ens.add(estimators={"pipe-1": [LogisticRegression()],
                    "pipe-2": [LogisticRegression()]},
        preprocessing={"pipe-1": [Subset([0, 2])],
                       "pipe-2": [Subset([1, 2, 3])]})
ens.add_meta(LogisticRegression())
ens.fit(X, y)
The key to note is that the values in these dicts should be lists:
ests = {pipe_1: [est_1, est_2, ...], pipe_2: [est_1, est_2, ...]}
prps = {pipe_1: [trans_1, trans_2, ...], pipe_2: [trans_1, trans_2, ...]}
So if we feed an input X to this layer, it will get processed in parallel through pipe_1 and pipe_2. In each of these, we obtain preprocessed features X -> trans_1 -> trans_2 -> X_[1,2] that we feed to the list of estimators in that pipeline. The output of a layer is the concatenation of all predictions:
P = [pipe_1_est_1(X_1), pipe_1_est_2(X_1), ..., pipe_2_est_1(X_2), pipe_2_est_2(X_2), ...]
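The concatenation above can be sketched with plain numpy and scikit-learn (a hand-rolled illustration of the idea, not the actual mlens internals), using the same two column subsets as the example:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Two "pipelines": each selects a column subset, then fits one estimator.
subsets = {"pipe-1": [0, 2], "pipe-2": [1, 2, 3]}
preds = []
for name, cols in subsets.items():
    X_sub = X[:, cols]                            # preprocessing: column subset
    est = LogisticRegression(max_iter=1000).fit(X_sub, y)
    preds.append(est.predict(X_sub))              # one prediction column per estimator

# Layer output: all predictions concatenated column-wise
P = np.column_stack(preds)
print(P.shape)  # (150, 2): one column per pipeline/estimator pair
```

With more estimators per pipeline, each extra estimator simply contributes another column to P.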
Note that you can also propagate features from the input array X to the output array P by using the propagate_features argument when adding a layer to the ensemble:
ens.add(estimators=ests, preprocessing=prps, propagate_features=[0, 1, 2])
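Conceptually, propagate_features=[0, 1, 2] carries those input columns through to the layer output alongside the prediction columns. A numpy sketch of that idea (illustrative only, not how mlens implements it):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# A single estimator's predictions stand in for the layer's output columns.
pred = LogisticRegression(max_iter=1000).fit(X, y).predict(X)

# Propagated input columns [0, 1, 2] are prepended to the prediction columns.
P = np.column_stack([X[:, [0, 1, 2]], pred])
print(P.shape)  # (150, 4): 3 propagated features + 1 prediction column
```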
The reason for using this logic is that it allows us to run a preprocessing pipeline just once and then have many estimators using those features. The sklearn version would require us to re-run the preprocessing step for every estimator, which isn't efficient.
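To see the difference concretely, here is a toy comparison (illustrative only) that counts transformer fits: with one sklearn pipeline per estimator, the transformer is refit for each estimator, whereas the mlens-style layout fits it once and reuses the transformed features.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

class CountingScaler(BaseEstimator, TransformerMixin):
    """Standard-scaler stand-in that counts how often fit() runs."""
    n_fits = 0
    def fit(self, X, y=None):
        CountingScaler.n_fits += 1
        self.mean_, self.std_ = X.mean(axis=0), X.std(axis=0)
        return self
    def transform(self, X):
        return (X - self.mean_) / self.std_

# sklearn-style: one pipeline per estimator -> transformer is fit twice
for _ in range(2):
    make_pipeline(CountingScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
print(CountingScaler.n_fits)  # 2

# mlens-style: fit the transformer once, reuse features for both estimators
CountingScaler.n_fits = 0
Xt = CountingScaler().fit(X).transform(X)
for _ in range(2):
    LogisticRegression(max_iter=1000).fit(Xt, y)
print(CountingScaler.n_fits)  # 1
```

With k estimators sharing one preprocessing pipeline, this saves k - 1 redundant transformer fits per layer.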
Having said that, you can mix and match between mlxtend, mlens, and scikit-learn:
from sklearn.datasets import load_iris
from mlxtend.feature_selection import ColumnSelector
from sklearn.linear_model import LogisticRegression
from mlens.ensemble import SuperLearner
from sklearn.pipeline import make_pipeline
iris = load_iris()
X = iris.data
y = iris.target
pipe1 = make_pipeline(ColumnSelector(cols=(0, 2)),
                      LogisticRegression())
pipe2 = make_pipeline(ColumnSelector(cols=(1, 2, 3)),
                      LogisticRegression())
ens = SuperLearner()
ens.add([pipe1, pipe2])
ens.add_meta(LogisticRegression())
ens.fit(X, y)
Hope this helps! Feel free to reopen this issue otherwise :)
I have a dataset with, say, 200 features. What I want is to give 30 features to one classifier, 90 to another, and 80 to another in one layer of ensembled clfs, and then take their outputs and give them to a meta classifier. I believe this is achievable via the Subset class available in your library, but I can't figure out the right way. I have found a similar way in another library, mlxtend, code of which is available below. However, I'd like to do my work via your library. Thanking you in anticipation.