AutoMLSearch: build API to access IDs of ensemble pipeline's input pipelines

Background

Ensemble models compute predictions from a group of input models, then apply a learning algorithm to combine those predictions into a single prediction that is typically more accurate overall.

Each of the stacked ensembler pipelines built by our automl is constructed by grabbing a few pipelines from the automl leaderboard, and then building a graph where the predictions of each input pipeline are provided as inputs to the stacked ensembler component. The resulting graph contains a full copy of each of the input pipelines.
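To make the graph construction above concrete, here is a rough sketch in plain Python. It is not evalml's actual component graph format: the `build_stacked_graph` helper, the node naming scheme, and the assumption that each input pipeline is a simple linear chain of components are all hypothetical, chosen only for illustration.

```python
def build_stacked_graph(input_pipelines, ensembler_name="Stacked Ensembler"):
    """Build a toy graph mapping each node name to the list of its input nodes.

    input_pipelines: dict mapping a pipeline ID to an ordered list of
    component names (hypothetical, simplified to a linear chain).
    """
    graph = {}
    final_outputs = []
    for pipeline_id, components in input_pipelines.items():
        previous = "X"  # every input pipeline starts from the raw features
        for component in components:
            # Prefix with the pipeline ID so copies from different
            # input pipelines do not collide in the combined graph.
            node = f"Pipeline {pipeline_id} - {component}"
            graph[node] = [previous]
            previous = node
        final_outputs.append(previous)
    # The ensembler consumes the final output of every input pipeline.
    graph[ensembler_name] = final_outputs
    return graph


input_pipelines = {
    3: ["Imputer", "Random Forest Classifier"],
    7: ["Imputer", "Standard Scaler", "Logistic Regression Classifier"],
}
graph = build_stacked_graph(input_pipelines)
for node, inputs in graph.items():
    print(node, "<-", inputs)
```

Because every input pipeline is copied into the combined graph, the ensembler pipeline alone does not tell you which leaderboard entries it was built from, which is the gap the proposal below is meant to fill.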

Proposal

Add an API to our automl search which allows users to look up the IDs of the pipelines used by a particular ensembler pipeline.

This would allow users to dig further into the details of each input pipeline if they want to understand the dynamics of their stacked ensembler better.

An initial thought on the API design is below:

import pytest
from evalml.automl import AutoMLSearch

automl = AutoMLSearch(...)
automl.search(X, y)
# let's say we look at the rankings and see that an ensembler has ID 42
ensembler = automl.get_pipeline(42)
# we can now use this pipeline to compute predictions, scores, stats, model understanding, etc.
ensembler.fit(X, y)
ensembler.predict(X)
...

# and in addition to grabbing the ensembler pipeline in full, we can grab the list of IDs of the input pipelines
input_pipeline_ids = automl.get_ensembler_input_pipelines(42)
for pipeline_id in input_pipeline_ids:
    pipeline = automl.get_pipeline(pipeline_id)
    pipeline.fit(X, y)
    print(pipeline.feature_importance)
    ...

with pytest.raises(Exception):
    # should raise an exception if the pipeline with this ID isn't an ensembler
    automl.get_ensembler_input_pipelines(7)
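
One possible way to back this API is to record the input pipeline IDs at the moment each ensembler is created. The sketch below is a self-contained toy, not evalml code: the `ToyAutoMLSearch` class, its `add_pipeline` method, the private attribute names, and the choice of `ValueError` are all assumptions made for illustration. Only the `get_pipeline` and `get_ensembler_input_pipelines` names come from the proposal above.

```python
class ToyAutoMLSearch:
    """Toy stand-in for an automl search that tracks ensembler inputs."""

    def __init__(self):
        self._pipelines = {}            # pipeline ID -> pipeline object
        self._ensembler_input_ids = {}  # ensembler ID -> list of input IDs

    def add_pipeline(self, pipeline_id, pipeline, input_pipeline_ids=None):
        # Hypothetical hook: called whenever the search registers a pipeline.
        # For ensemblers, the IDs of the input pipelines are stored alongside.
        self._pipelines[pipeline_id] = pipeline
        if input_pipeline_ids is not None:
            self._ensembler_input_ids[pipeline_id] = list(input_pipeline_ids)

    def get_pipeline(self, pipeline_id):
        if pipeline_id not in self._pipelines:
            raise KeyError(f"No pipeline with ID {pipeline_id}")
        return self._pipelines[pipeline_id]

    def get_ensembler_input_pipelines(self, ensembler_pipeline_id):
        self.get_pipeline(ensembler_pipeline_id)  # raises KeyError if unknown
        if ensembler_pipeline_id not in self._ensembler_input_ids:
            raise ValueError(
                f"Pipeline {ensembler_pipeline_id} is not an ensembler pipeline"
            )
        # Return a copy so callers cannot mutate the stored list.
        return list(self._ensembler_input_ids[ensembler_pipeline_id])


search = ToyAutoMLSearch()
search.add_pipeline(3, "random forest pipeline")
search.add_pipeline(7, "logistic regression pipeline")
search.add_pipeline(42, "stacked ensembler pipeline", input_pipeline_ids=[3, 7])

input_ids = search.get_ensembler_input_pipelines(42)
```

Raising a specific exception type for non-ensembler IDs, rather than returning an empty list, keeps "not an ensembler" distinguishable from "ensembler with no recorded inputs". Which exception type the real API should use is left open here.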

@christopherbunn FYI

alteryx / evalml

AutoMLSearch: build API to access IDs of ensemble pipeline's input pipelines #3008