flennerhag / mlens

ML-Ensemble – high performance ensemble learning
http://ml-ensemble.com
MIT License

Accessing predictions of nth layer #108

Closed: anurags25 closed this issue 6 years ago

anurags25 commented 6 years ago

Is there any way to access the predictions of a particular layer of an ensemble? I couldn't find one in the docs.

flennerhag commented 6 years ago

Yes, you can see the predictions of a layer by specifying the return_preds argument in the fit or predict call. To get several layers at once, pass a list of layer names (by default 'layer-1', 'layer-2', ...).

anurags25 commented 6 years ago

```python
from mlens.ensemble import SuperLearner
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# X_train, y_train and X_val are the poster's own data (not shown)
ensemble = SuperLearner(scorer=accuracy_score, random_state=42, verbose=2)
ensemble.add([RandomForestClassifier(random_state=42), SVC()])
ensemble.add_meta(LogisticRegression())

ensemble.fit(X_train, y_train)

preds = ensemble.predict(X_val, return_preds=True)
print('Shape', preds.shape)
```

Output:

```
Fitting 2 layers
Processing layer-1 done | 00:00:00
Processing layer-2 done | 00:00:00
Fit complete | 00:00:00

Predicting 2 layers
Processing layer-1 done | 00:00:00
Processing layer-2 done | 00:00:00
Predict complete | 00:00:00
Shape (191,)
```

Shouldn't the shape of 'preds' be something like (n_samples, n_layers) instead of just (n_samples,)?

flennerhag commented 6 years ago

First, if you set return_preds=True, you get the predictions of the final layer. If you want predictions from more layers, you need to pass a list with the names of the layers you want predictions from.
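A (n_samples, n_layers) array cannot hold every layer, because each layer's output has its own number of columns. The minimal sketch below assumes that the object returned for return_preds=['layer-1', 'layer-2'] is a sequence holding one NumPy array per requested layer, in the order requested. Iterating over `preds` in the example below suggests this. The arrays are zero-filled stand-ins with illustrative shapes, not real ensemble output, and the names `by_layer` and `combined` are hypothetical. The sketch pairs each array with its layer name and, if a single matrix is wanted, concatenates the layers column-wise with `np.hstack`.

```python
import numpy as np

# Hypothetical stand-ins for the arrays returned by
# ensemble.fit(X_train, y_train, return_preds=['layer-1', 'layer-2']).
# Shapes are illustrative only: 20 samples, 10 columns from layer 1, 1 from layer 2.
layer_names = ['layer-1', 'layer-2']
preds = [np.zeros((20, 10)), np.zeros((20, 1))]

# Each layer keeps its own width, so pair every array with its layer name.
by_layer = dict(zip(layer_names, preds))
for name, p in by_layer.items():
    print(name, p.shape)

# If one matrix is wanted anyway, concatenate the layers along the column axis.
combined = np.hstack(preds)
print('Combined shape', combined.shape)  # Combined shape (20, 11)
```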

Second, if you want class probabilities, you must specify proba=True in the add method. For instance,

```python
from mlens.ensemble import SuperLearner
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
import numpy as np

# Toy data: 20 samples, 5 features, class labels 0-4
X_train = np.random.rand(20, 5)
y_train = np.random.randint(0, 5, size=(20,))

# probability=True lets SVC produce class probabilities
learners = [RandomForestClassifier(n_estimators=10), SVC(gamma='scale', probability=True)]
meta_learner = LogisticRegression(multi_class='auto', solver='lbfgs')

ensemble = SuperLearner()
ensemble.add(learners, proba=True)  # layer-1 outputs class probabilities
ensemble.add_meta(meta_learner)     # layer-2 is the meta layer

# Default layer names are layer-1, layer-2, layer-3, ...
get_preds = ['layer-1', 'layer-2']

# Request predictions from both layers during fit
preds = ensemble.fit(X_train, y_train, return_preds=get_preds)

for p in preds:
    print('Shape', p.shape)
```

This produces

```
Shape (20, 10)
Shape (20, 1)
```
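The 10 columns from layer 1 match 2 base learners times 5 classes, since with proba=True each learner contributes one probability column per class. The sketch below reproduces that arithmetic with plain NumPy. It does not call mlens. It fakes predict_proba outputs with random row-normalised matrices, and `layer_width` is a hypothetical helper. The proba=False case, where each learner contributes one label column, is an assumption about the default behaviour. The column ordering is this sketch's own choice and is not taken from mlens.

```python
import numpy as np

def layer_width(n_learners, n_classes, proba):
    """Hypothetical helper: expected column count of a layer's prediction matrix."""
    return n_learners * n_classes if proba else n_learners

rng = np.random.default_rng(0)
n_samples, n_classes, n_learners = 20, 5, 2

# Fake predict_proba output per learner: each row is normalised to sum to 1.
blocks = []
for _ in range(n_learners):
    raw = rng.random((n_samples, n_classes))
    blocks.append(raw / raw.sum(axis=1, keepdims=True))

# proba=True: one column per (learner, class) pair
proba_layer = np.hstack(blocks)
# proba=False (assumed): one predicted-label column per learner
label_layer = np.column_stack([b.argmax(axis=1) for b in blocks])

print('proba=True ', proba_layer.shape)   # (20, 10)
print('proba=False', label_layer.shape)   # (20, 2)
```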

Hope this clarifies. Just holler otherwise!