flennerhag / mlens

ML-Ensemble – high performance ensemble learning
http://ml-ensemble.com
MIT License

Possibility to fit models with different datasets #124

lorenzoFabbri closed this issue 4 years ago

lorenzoFabbri commented 5 years ago

Suppose I have the same set of variables measured by multiple (different) instruments, or under different conditions. With 3 conditions, I have 3 datasets that share the same rows/samples and the same variables/columns, but contain different values. Is there a way to fit one model per dataset and then combine their predicted probabilities into a single prediction for each sample?

flennerhag commented 4 years ago

Yes, that's easy to do with column selection. For instance, if you have 3 variables measured under 3 conditions, you can concatenate the three datasets column-wise into an [n, 9] input array. Then create an ensemble that fits one predictor per set of 3 features:

from sklearn.linear_model import LogisticRegression
from mlens.ensemble import SuperLearner
from mlens.preprocessing import Subset

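ens = SuperLearner()
# One preprocessing case per condition: each Subset selects that condition's
# three columns, and the estimator listed under the same key is fit on them
ens.add(estimators={"pipe-1": [LogisticRegression()],
                    "pipe-2": [LogisticRegression()],
                    "pipe-3": [LogisticRegression()]},
        preprocessing={"pipe-1": [Subset([0, 1, 2])],
                       "pipe-2": [Subset([3, 4, 5])],
                       "pipe-3": [Subset([6, 7, 8])]})
# Meta learner that combines the three base models' cross-validated predictions
ens.add_meta(LogisticRegression())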
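The concatenation step can be sketched in plain NumPy, without mlens. This is a minimal illustration that assumes each condition's data is already a NumPy array whose rows are the same samples in the same order; the names `X_cond1`, `X_cond2` and `X_cond3` are hypothetical. Stacking column-wise with `np.hstack` puts condition 1 in columns 0-2, condition 2 in columns 3-5 and condition 3 in columns 6-8, which are exactly the index lists passed to the three `Subset` transformers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_vars = 5, 3

# Hypothetical per-condition datasets: same samples (rows), same variables (columns)
X_cond1 = rng.normal(size=(n_samples, n_vars))
X_cond2 = rng.normal(size=(n_samples, n_vars))
X_cond3 = rng.normal(size=(n_samples, n_vars))

# Concatenate column-wise into a single [n, 9] input array
X = np.hstack([X_cond1, X_cond2, X_cond3])

# Column indices for each condition, matching the Subset(...) lists
subsets = [list(range(i * n_vars, (i + 1) * n_vars)) for i in range(3)]
print(X.shape)   # (5, 9)
print(subsets)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# Selecting each block recovers the original condition's data
for cols, original in zip(subsets, [X_cond1, X_cond2, X_cond3]):
    assert np.array_equal(X[:, cols], original)
```

If the conditions have a different number of variables, the same pattern works as long as the index lists are built from each block's actual width.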

Hope that helps!
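Since the question is about combining probabilities: mlens's `add` method documents a `proba` argument that makes a layer pass class probabilities, rather than predicted labels, to the meta learner. Confirm it against the docs for your installed version. To make the underlying mechanics concrete, here is a minimal NumPy sketch of the general stacking idea: one base model per condition, a meta model trained on their out-of-fold probabilities. It is not mlens code. `fit_logreg` is a hypothetical gradient-descent stand-in for `LogisticRegression`, and the fold scheme, learning rate, iteration count and toy data are illustrative assumptions.

```python
import numpy as np


def fit_logreg(X, y, lr=0.1, n_iter=500):
    """Hypothetical stand-in for LogisticRegression: batch gradient descent on log-loss."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    """Probability of the positive class under the fitted weights."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def stack_fit(X, y, subsets, n_folds=5, seed=0):
    """Fit one base model per column subset, then a meta model on out-of-fold probabilities."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    oof = np.zeros((n, len(subsets)))  # one meta-feature column per condition
    for j, cols in enumerate(subsets):
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            w = fit_logreg(X[train_idx][:, cols], y[train_idx])
            oof[test_idx, j] = predict_proba(w, X[test_idx][:, cols])
    # Refit each base model on all rows for use at prediction time
    base = [fit_logreg(X[:, cols], y) for cols in subsets]
    meta = fit_logreg(oof, y)  # learns how to weight the three conditions
    return base, meta


def stack_predict_proba(base, meta, X, subsets):
    """Combine the base models' probabilities into one probability per sample."""
    Z = np.column_stack([predict_proba(w, X[:, cols]) for w, cols in zip(base, subsets)])
    return predict_proba(meta, Z)


# Toy data: three hypothetical conditions, each a noisier view of the label
rng = np.random.default_rng(1)
n = 200
y = rng.integers(0, 2, size=n).astype(float)
X = np.hstack([y[:, None] + rng.normal(scale=s, size=(n, 3)) for s in (0.5, 1.0, 2.0)])
subsets = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

base, meta = stack_fit(X, y, subsets)
proba = stack_predict_proba(base, meta, X, subsets)
print(proba.shape)  # (200,)
```

The out-of-fold step is the important design choice: if the meta model were trained on probabilities from base models that had already seen those rows, it would over-trust whichever condition overfits most. Scoring on the training rows, as the toy usage does, only keeps the sketch short; use held-out data for an honest estimate.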