ContinuumIO / elm

Phase I & part of Phase II of NASA SBIR - Parallel Machine Learning on Satellite Data
http://ensemble-learning-models.readthedocs.io
44 stars 23 forks source link

Hierarchical modeling (fitting to predictions of already-fit estimators) #207

Open PeterDSteinberg opened 7 years ago

PeterDSteinberg commented 7 years ago

See this prototype for hierarchical modeling added by PR #192 . The idea for hierarchical modeling described in the Phase II proposal was essentially taking a group of already-fit estimators as an argument to a new second layer estimator whose fit method involves calling the predict method of each of the already-fit estimators, concatenating their predictions along the column (feature) dimension, and using that feature matrix as the input to the fit method of the second layer estimator.

Initialization with a second layer estimator and group of already fit estimators:

class MultiLayer(SklearnMixin, BaseEstimator):
    def __init__(self, estimator, estimators=None):
        self.estimator = estimator
        self.estimators = estimators

A concatenation utility function that is called before most methods of the estimator (second layer estimator's fit, predict, decision_function, or other methods)

    def _concat_features(self, X, y=None, **kw):
        X, y, row_idx = self._as_numpy_arrs(X, y)
        predicts = (getattr(est, 'predict') for est in self.estimators)
        preds = [pred(X) for pred in predicts]
        X2 = np.array(preds).T
        return X2, y

A few TODO items I know of:

PeterDSteinberg commented 7 years ago

Also consider whether we may want to have each of the estimators do a prediction from different/same input feature matrices (https://github.com/ContinuumIO/elm/issues/180).

PeterDSteinberg commented 7 years ago

Also note in the documentation that is part of answering the general question in #198

"How do I hyperparameterize model structures (not just different model parameters)?"

In some cases, if hyperparameterization across different structure choices turns out infeasible, then maybe an alternate similar idea is using this MultiLayer idea above, where estimators of MultiLayer have different structures, and rather than automatically choosing a best structure(s) just predicting from all using a second layer models and inferring from a second layer estimator.