Hierarchical modeling (fitting to predictions of already-fit estimators) #207

PeterDSteinberg commented 7 years ago

See this prototype for hierarchical modeling added by PR #192. The idea for hierarchical modeling described in the Phase II proposal is essentially to pass a group of already-fit estimators as an argument to a new second-layer estimator. The second-layer estimator's fit method calls the predict method of each already-fit estimator, concatenates their predictions along the column (feature) dimension, and uses the resulting feature matrix as the input to its own fit.

Initialization with a second-layer estimator and a group of already-fit estimators:

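# BaseEstimator comes from sklearn.base; SklearnMixin is not part of
# scikit-learn (presumably elm's own mixin)
class MultiLayer(SklearnMixin, BaseEstimator):
    def __init__(self, estimator, estimators=None):
        self.estimator = estimator    # second-layer estimator, fit by MultiLayer
        self.estimators = estimators  # first-layer estimators, already fit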

A concatenation utility function that is called before most methods of the second-layer estimator (fit, predict, decision_function, and others):

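    def _concat_features(self, X, y=None, **kw):
        # Convert inputs to NumPy arrays (row_idx is not used in this function)
        X, y, row_idx = self._as_numpy_arrs(X, y)
        # One prediction vector per already-fit estimator
        preds = [est.predict(X) for est in self.estimators]
        # Transpose so each estimator becomes one column: (n_rows, n_estimators).
        # This assumes every predict call returns a 1-D array of equal length.
        X2 = np.array(preds).T
        return X2, y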
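
To make the flow concrete, here is a minimal, self-contained sketch of the same idea using only scikit-learn and NumPy. It is not the prototype from the PR: `SimpleMultiLayer` is a hypothetical stand-in that drops the elm-specific `SklearnMixin` and `_as_numpy_arrs`, and it assumes every first-layer estimator returns 1-D predictions of equal length.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


class SimpleMultiLayer(BaseEstimator):
    """Hypothetical stand-in for MultiLayer without the elm-specific mixin."""

    def __init__(self, estimator, estimators=None):
        self.estimator = estimator      # second-layer estimator (not yet fit)
        self.estimators = estimators    # first-layer estimators (already fit)

    def _concat_features(self, X):
        # One column per first-layer estimator; assumes 1-D predictions.
        preds = [est.predict(X) for est in self.estimators]
        return np.column_stack(preds)

    def fit(self, X, y):
        # The second layer is fit on the first layer's predictions, not on X.
        self.estimator.fit(self._concat_features(X), y)
        return self

    def predict(self, X):
        return self.estimator.predict(self._concat_features(X))


# Synthetic data, for illustration only.
rng = np.random.RandomState(0)
X = rng.uniform(size=(100, 3))
y = X[:, 0] + 2.0 * X[:, 1] ** 2

first_layer = [
    LinearRegression().fit(X, y),
    DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y),
]
model = SimpleMultiLayer(LinearRegression(), estimators=first_layer).fit(X, y)
stacked = model._concat_features(X)   # one column per first-layer estimator
y_hat = model.predict(X)
```

Here `stacked` has one row per sample and one column per first-layer estimator, which is the feature matrix the second-layer `LinearRegression` is fit on.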

A few TODO items I know of:

Consider cases where the estimators return:
- Categorical y
- Continuous y
- y that differs in row count from estimator to estimator: raise an exception in the _concat_features function. (The predict method of every already-fit estimator must return y of the same shape.)
- Test that it does not matter whether the estimators are heterogeneous in structure/parameters, as long as all of them return y of the same shape.
- What if I want to concatenate the predictions of N supervised classifiers or clustering estimators (or pipelines with a classifier or clusterer as the final step)? I may then want to run a LabelBinarizer step to encode the second-layer feature matrix as a binary one (with a corresponding expansion of the column dimension and possibly a sparse representation). How do I build a second-layer Pipeline like that? My first thought is that the MultiLayer class from the snippets above would have a subclass (?) whose fit_transform method returns the concatenated estimators' predictions, so that any of the usual Pipeline steps could be used thereafter.
Any limitations regarding parallelism (sending the fitted estimators via dask.distributed, as used in EaSearchCV or related code)?
The _concat_features function of MultiLayer should use elm.pipeline.predict_many (parallel prediction with dask.distributed).
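
Several of the TODO items above can be sketched together, under some loud assumptions. `PredictionStacker` and its `max_workers` parameter are hypothetical names, not part of elm. A standard-library thread pool stands in for `elm.pipeline.predict_many` / dask.distributed, whose signatures are not shown in this thread. `OneHotEncoder` is swapped in for `LabelBinarizer`, since it is scikit-learn's encoder for feature columns and returns a sparse matrix by default.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier


class PredictionStacker(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: already-fit estimators -> stacked predictions."""

    def __init__(self, estimators, max_workers=2):
        self.estimators = estimators
        self.max_workers = max_workers

    def fit(self, X, y=None):
        return self  # nothing to learn; the first layer is already fit

    def transform(self, X):
        # Thread pool as a local stand-in for distributed parallel prediction.
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            preds = list(pool.map(lambda est: np.asarray(est.predict(X)),
                                  self.estimators))
        # Every estimator must return predictions of the same shape.
        shapes = {p.shape for p in preds}
        if len(shapes) != 1:
            raise ValueError(f"Estimators returned differing shapes: {shapes}")
        return np.column_stack(preds)


# Synthetic data, for illustration only.
rng = np.random.RandomState(0)
X = rng.normal(size=(120, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

first_layer = [
    DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y),
    KMeans(n_clusters=3, n_init=10, random_state=0).fit(X),
]
second_layer = Pipeline([
    ("stack", PredictionStacker(first_layer)),          # labels as columns
    ("onehot", OneHotEncoder(handle_unknown="ignore")),  # binary, sparse
    ("clf", LogisticRegression()),
])
second_layer.fit(X, y)
labels = second_layer.predict(X)

stacked = second_layer.named_steps["stack"].transform(X)
encoded = second_layer.named_steps["onehot"].transform(stacked)
```

One design caveat for a sketch like this: `sklearn.base.clone` also clones estimators nested in list parameters, so utilities that clone the pipeline (cross-validation helpers, for example) would hand the stacker unfitted first-layer estimators.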

PeterDSteinberg commented 7 years ago

Also consider whether each of the estimators should predict from its own input feature matrix or whether all of them should share the same one (https://github.com/ContinuumIO/elm/issues/180).
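
One possible shape for that option, as a sketch only: a hypothetical helper `stack_predictions` that pairs each already-fit estimator with its own feature matrix. The helper name and its arguments are assumptions for illustration, not an elm API. The shared-input case is the same call with one matrix repeated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge


def stack_predictions(estimators, feature_matrices):
    """Hypothetical helper: estimator i predicts from feature_matrices[i]."""
    if len(estimators) != len(feature_matrices):
        raise ValueError("Need exactly one feature matrix per estimator")
    preds = [est.predict(Xi) for est, Xi in zip(estimators, feature_matrices)]
    # column_stack raises ValueError if the row counts differ
    return np.column_stack(preds)


# Synthetic data: two feature matrices describing the same 50 rows.
rng = np.random.RandomState(0)
X_a = rng.normal(size=(50, 2))
X_b = rng.normal(size=(50, 5))
y = X_a[:, 0] - X_b[:, 1]

est_a = LinearRegression().fit(X_a, y)
est_b = Ridge(alpha=1.0).fit(X_b, y)

# Different inputs per estimator
X2 = stack_predictions([est_a, est_b], [X_a, X_b])
second = LinearRegression().fit(X2, y)

# Same input for every estimator: repeat the matrix
X2_same = stack_predictions([est_a, est_a], [X_a, X_a])
```

The feature matrices may differ in column count, but they need the same number of rows so that the stacked predictions line up with y.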

PeterDSteinberg commented 7 years ago

Also note this in the documentation, as part of answering the general question in #198:

"How do I hyperparameterize model structures (not just different model parameters)?"

In some cases, if hyperparameterization across different structure choices turns out to be infeasible, an alternative is the MultiLayer idea above: the estimators of MultiLayer have different structures, and rather than automatically choosing the best structure(s), we predict from all of them and infer from a second-layer estimator fit to those predictions.
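
As a rough illustration of that alternative, the sketch below fits three structurally different first-layer models and lets a second-layer linear model weight their predictions instead of picking a single winner. The specific structures, the synthetic data, and the choice to fit the second layer on a held-out half of the rows are all assumptions of this sketch, not something the proposal prescribes.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, for illustration only.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
X_first, y_first = X[:100], y[:100]      # rows used to fit the first layer
X_second, y_second = X[100:], y[100:]    # rows used to fit the second layer

# Three different model structures, not just different parameters.
structures = [
    make_pipeline(StandardScaler(), LinearRegression()),
    make_pipeline(PCA(n_components=2), KNeighborsRegressor(n_neighbors=5)),
    DecisionTreeRegressor(max_depth=4, random_state=0),
]
fitted = [est.fit(X_first, y_first) for est in structures]

# Predict from all structures and fit the second layer on those predictions.
X2 = np.column_stack([est.predict(X_second) for est in fitted])
second_layer = LinearRegression().fit(X2, y_second)
weights = second_layer.coef_   # one coefficient per first-layer structure
```

The coefficients in `weights` give a crude view of how much the second layer relies on each structure, which is a softer alternative to selecting one structure outright.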
