flennerhag / mlens

ML-Ensemble – high performance ensemble learning
http://ml-ensemble.com
MIT License

Modifying estimator/learner after fitting #115

Closed CirdanCapital closed 5 years ago

CirdanCapital commented 5 years ago

Hi,

First of all, let me say that I really like your library: thank you very much for building it. I am trying to modify an individual estimator/learner (LinearRegression) after calling the 'fit' method. Specifically, I have a 3-layer SuperLearner, and in the second layer I am using the following code:

########################################

2ND LAYER

# Build the second layer (potentially add more)
ests_2 = [
    ('gbr', GradientBoostingRegressor()),
    ('rfr', RandomForestRegressor()),
    ('lrg', LinearRegression()),
    ('lrr', LinearRegression()),
    ('mlp', MLPRegressor()),
    ('knn', KNeighborsRegressor()),
    ('xgb', XGBRegressor()),
    ('ada', AdaBoostRegressor())
]
pars_2_1 = {'random_state': seed}
pars_2_2 = {'max_depth': 15}
prms_2 = {'gbr': pars_2_1,
          'rfr': pars_2_2}
ensemble.add(ests_2,
             folds=10,
             proba=False)
CirdanCapital commented 5 years ago

In the ensemble I am doing this:

########################################

ENSEMBLE

    ensemble.fit(X_train, y_train, estimators=estimators, param_dicts=params, scorer=long_only_scorer)

where estimators is just: estimators = {'ests_1': ests_1, 'ests_2': ests_2, 'ests_3': ests_3}

and after fitting, I am trying to change the 'lrr' linear regression like this:

########################################

ENSEMBLE

    ensemble.fit(X_train, y_train, estimators=estimators, param_dicts=params, scorer=long_only_scorer)

########################################

CHANGE CUSTOM REGRESSION HERE

    #

    print('ensemble.layers[1].learners[3]:\n', ensemble.layers[1].learners[3])
    lr = ensemble.layers[1].learners[3].estimator
    print('ensemble.layers[1].learners[3].estimator:\n', ensemble.layers[1].learners[3].estimator)
    new_X_train = X_train[:, 0, :]
    lr.fit(new_X_train, y_train)
    params = lr.get_params()
    coef = lr.coef_
    intercept = lr.intercept_
    new_coef = np.zeros(len(coef) * 1000)
    if len(coef) == 247:
        new_coef[-1] = 1.0
    else:
        new_coef[-3] = 1.0
    print('params:\n', params)
    print('coef:\n', coef, 'len(coef):\n', len(coef), 'type(coef):\n', type(coef))
    print('new_coef:\n', new_coef, 'len(new_coef):\n', len(new_coef), 'type(new_coef):\n', type(new_coef))
    print('intercept:\n', intercept)
    new_intercept = 0.0
    print('new_intercept:\n', new_intercept)
    ensemble.layers[1].learners[3].estimator.coef_ = new_coef
    ensemble.layers[1].learners[3].estimator.intercept_ = new_intercept
    ensemble.layers[1].learners[3].estimator = lr
CirdanCapital commented 5 years ago

This works, but I don't think that I have actually changed the learner used in the predict method. How can I check? Thank you in advance for your reply.
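One way to check, as a sketch: compare the ensemble's predictions on a held-out set before and after the modification (X_test here is an assumption, and the coef_/intercept_ change from the block above is applied between the two calls); if the in-place edit actually reached the fitted clones, the outputs should differ.

    import numpy as np

    # Sketch: X_test is assumed to exist in the surrounding script.
    preds_before = ensemble.predict(X_test)

    # ... apply the coef_/intercept_ change shown above ...

    preds_after = ensemble.predict(X_test)
    print('predictions changed:', not np.allclose(preds_before, preds_after))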

CirdanCapital commented 5 years ago

Meaning that in the code above I made the coef_ length ridiculously long and yet the ensemble is still working (see results below). In addition to the problem above, I would also really like to be able to see what is the input and output shape for each layer if possible (just to check). Thank you again in advance for your reply.

    Predicting 3 layers
    Processing layer-1
    Processing layer-2  done | 00:00:00
    Processing layer-3  done | 00:00:00
    Predict complete | 00:00:03

    Ensemble r_2 score: 0.9660554415614002
    Ensemble MSE score: 0.032161018919918904

    Fit data:
                 score-m  score-s  ft-m  ft-s  pt-m  pt-s
    layer-1 ker     0.43     0.04  6.69  0.74  0.37  0.08
    layer-2 ada     0.17     0.03  0.04  0.01  0.00  0.00
    layer-2 gbr     0.15     0.03  0.09  0.03  0.00  0.00
    layer-2 knn     0.16     0.02  0.00  0.00  0.00  0.00
    layer-2 lrg     0.32     0.02  0.00  0.00  0.00  0.00
    layer-2 lrr     0.32     0.02  0.00  0.00  0.00  0.00
    layer-2 mlp     0.28     0.02  1.50  0.56  0.00  0.00
    layer-2 rfr     0.17     0.02  0.04  0.00  0.00  0.00
    layer-2 xgb     0.15     0.03  0.10  0.01  0.00  0.00

    time elapsed (ms): 178307.00000
    time elapsed (s): 178.30700

flennerhag commented 5 years ago

Hi, glad you like it!

In-place modification of fitted instances is not trivial. Recall that you don't have one estimator, but one clone of the estimator per cv-fold plus one clone for the full training set. Thus, the estimator attribute you are changing has no effect on the fitted clones (it would only take effect if you were to refit the ensemble).

To make an in-place change, you'd have to access the fitted instances. You can do this via the Learner.learner attribute (returns an iterator): in your code that would be

for fitted_estimator in ensemble.layers[i].learners[j].learner:
    # make your change

If you additionally want to change the fitted clones on the cv-folds:

for fitted_estimator in ensemble.layers[i].learners[j].sub_learner:
    # make your change
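For the 'lrr' learner in the example above, a combined sketch might look like the following, assuming each yielded item is the fitted scikit-learn clone and that new_coef / new_intercept are defined as in the earlier snippet:

    i, j = 1, 3  # layer-2, the 'lrr' learner in the example above

    # full-training-set clone(s)
    for fitted_estimator in ensemble.layers[i].learners[j].learner:
        fitted_estimator.coef_ = new_coef
        fitted_estimator.intercept_ = new_intercept

    # cv-fold clones used during fitting
    for fitted_estimator in ensemble.layers[i].learners[j].sub_learner:
        fitted_estimator.coef_ = new_coef
        fitted_estimator.intercept_ = new_intercept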

Hope that helps!

flennerhag commented 5 years ago

And to see layer-wise predictions, pass a list of layer names to the return_preds argument in the fit / predict methods (see docs or #108).
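For instance, a sketch of inspecting the layer-wise output shapes this way (X_test, the default 'layer-N' names, and the exact return structure are assumptions here; see the docs or #108 for details):

    # Sketch: request layer-wise predictions at predict time.
    layer_preds = ensemble.predict(
        X_test, return_preds=['layer-1', 'layer-2', 'layer-3'])
    # Inspecting .shape on the returned prediction array(s) then shows each
    # layer's output dimensions.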

CirdanCapital commented 5 years ago

Hi Flennerhag,

Thanks for your replies. I really tried to implement your two answers but couldn't get a handle on the fitted regressions (see attached picture) using

for fitted_estimator in ensemble.layers[i].learners[j].sub_learner:
    # make your change

and using this one I couldn't see the regression's coef_ value:

for fitted_estimator in ensemble.layers[i].learners[j].learner:
    # make your change

[attached screenshot: trappingfittedlr]
CirdanCapital commented 5 years ago

Lastly, even if I put return_preds=[1, 2, 3] or return_preds=True, it doesn't work.

flennerhag commented 5 years ago

Hi, I think you solved the learner problem?

return_preds expects a list of layer names, i.e. ['layer-1', 'layer-2', 'layer-3'].
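So, in place of return_preds=[1, 2, 3], a corrected call might look like this sketch (the layer names assume the default naming, and X_train / y_train are taken from the thread above):

    # Sketch: return_preds can also be passed at fit time, per the comment above.
    fit_preds = ensemble.fit(
        X_train, y_train, return_preds=['layer-1', 'layer-2', 'layer-3'])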