flennerhag / mlens

ML-Ensemble – high performance ensemble learning
http://ml-ensemble.com
MIT License
843 stars 108 forks

No valid parameters found for superlearner #113

Closed cliffrunner closed 5 years ago

cliffrunner commented 5 years ago

Hello, I am learning mlens for a project and am modifying the tutorial example for practice purposes. In the following code, I am trying to solve a regression problem. X and y are correct and have been verified with other base learners.

from scipy.stats import randint, uniform
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import make_scorer, r2_score
from mlens.ensemble import SuperLearner
from mlens.model_selection import Evaluator

base_learners = [('lasso', Lasso()),
                 ('ridge', Ridge())]

meta_learners = [('rf', RandomForestRegressor(random_state=seed))]

learner = SuperLearner(random_state=seed) \
    .add(base_learners) \
    .add(meta_learners, meta=True)

params = {'lasso': {'alpha': uniform(0.2, 1)},
          'ridge': {'alpha': uniform(0.05, 0.5)},
          'rf': {'n_estimators': randint(10, 100)}}

scorer = make_scorer(r2_score)
evaluator = Evaluator(scorer=scorer, random_state=seed, cv=5, verbose=2)

evaluator.fit(X, y, learner, params, preprocessing=None, n_iter=10)

When I run the code above, I get this error message:

No valid parameters found for superlearner. Will fit and score once with given parameter settings.

Could you please let me know what I did wrong?

Thanks.

flennerhag commented 5 years ago

Hi,

Thanks for reaching out!

It looks like a small change, but you have actually changed the setup significantly. In the tutorial, we take the base layer of the ensemble as a given feature extractor (no tuning), and only tune classifiers on top of those features. Here, you tune the ensemble itself: that's a different class of model entirely.

Thus, the model you are tuning (and hence the key you need in your param dict) is not lasso, ridge, or rf, but the SuperLearner instance itself.

To fix this, grab the Sequential instance underlying the SuperLearner wrapper:


learner = SuperLearner().add(base_learners).add_meta(meta_learners)

seq = learner._backend   # the Sequential instance underlying your learner

params = {'seq':
    {'layer-1__group-0__lasso__estimator__alpha': uniform(0.2, 1),
     'layer-1__group-0__ridge__estimator__alpha': uniform(0.05, 0.5),
     'layer-2__group-1__rf__estimator__n_estimators': randint(10, 100)}
}

scorer = make_scorer(r2_score)
evaluator = Evaluator(scorer=scorer, random_state=seed, cv=5, verbose=2)
evaluator.fit(X, y, [('seq', seq)], params, preprocessing=None, n_iter=10)

More generally, the parameter keys for the estimators in the ensemble can be found with:

seq.get_params().keys()
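These double-underscore keys follow the same nested-parameter convention scikit-learn uses; as a minimal illustration with a plain Pipeline (the step names here are just for the sketch, not part of mlens):

```python
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Nested parameters are addressed as <step>__<param>, one '__' per level.
pipe = Pipeline([('scale', StandardScaler()), ('lasso', Lasso())])

print('lasso__alpha' in pipe.get_params())  # True
print(pipe.get_params()['lasso__alpha'])    # 1.0 (Lasso's default)
```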

Note that this will be expensive: you're doing cross-validation in the ensemble for each cv-fold in the evaluator.
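To put a rough number on that cost (a back-of-envelope sketch; the fold counts and refit schedule below are assumptions for illustration, not mlens's exact internals):

```python
# Back-of-envelope count of estimator fits when tuning the whole ensemble.
# Assumed (hypothetically): the SuperLearner uses 2 internal folds, each base
# learner is fit once per internal fold plus once on the full data, and the
# meta learner is fit once per ensemble fit.
n_candidates = 10   # n_iter in Evaluator.fit
outer_folds = 5     # cv in the Evaluator
inner_folds = 2     # internal folds in the SuperLearner
n_base = 2          # lasso, ridge

fits_per_ensemble_fit = n_base * (inner_folds + 1) + 1
total_fits = n_candidates * outer_folds * fits_per_ensemble_fit
print(total_fits)  # 350
```

Even under these modest assumptions, a 10-candidate search means hundreds of individual estimator fits.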

Hope that helps!