alteryx / evalml

EvalML is an AutoML library written in Python.
https://evalml.alteryx.com
BSD 3-Clause "New" or "Revised" License

Raise error for prediction explanations of ensemble methods #1863

Closed: freddyaboulton closed this issue 3 years ago

freddyaboulton commented 3 years ago

Passing a stacked ensemble pipeline into our prediction explanations function results in the stack trace below:

from evalml.demos import load_fraud
from evalml.pipelines import BinaryClassificationPipeline, StackedEnsembleClassifier
from evalml.pipelines.utils import make_pipeline_from_components
from evalml.model_understanding import explain_predictions

X, y = load_fraud(1000)

class RFPipeline(BinaryClassificationPipeline):
    component_graph = ["Imputer", "DateTime Featurization Component", "One Hot Encoder", "Random Forest Classifier"]

class LogisticPipeline(BinaryClassificationPipeline):
    component_graph = ["Imputer", "DateTime Featurization Component", "One Hot Encoder", "Logistic Regression Classifier"]

class LgbmPipeline(BinaryClassificationPipeline):
    component_graph = ["Imputer", "DateTime Featurization Component", "One Hot Encoder", "LightGBM Classifier"]

p1 = RFPipeline({})
p2 = LogisticPipeline({})
p3 = LgbmPipeline({})

ensemble_pipeline = make_pipeline_from_components([StackedEnsembleClassifier([p1, p2, p3])],
                                                  problem_type="binary",
                                                  random_seed=0,
                                                  custom_name="Ensemble Pipeline")

ensemble_pipeline.fit(X, y)

explain_predictions(ensemble_pipeline, X, y, indices_to_explain=[100, 150])
-------------------------------------------------------------------
TypeError                         Traceback (most recent call last)
<ipython-input-7-6b4265438f4b> in <module>
----> 1 explain_predictions(ensemble_pipeline, X, y, indices_to_explain=[100, 150])

~/sources/evalml/evalml/model_understanding/prediction_explanations/explainers.py in explain_predictions(pipeline, input_features, y, indices_to_explain, top_k_features, include_shap_values, output_format)
    101                                              output_format=output_format, top_k_features=top_k_features,
    102                                              include_shap_values=include_shap_values)
--> 103     return report_creator(data)
    104 
    105 

~/sources/evalml/evalml/model_understanding/prediction_explanations/_user_interface.py in make_text(self, data)
    476             else:
    477                 report.extend([""])
--> 478             report.extend(self.table_maker.make_text(index, data.pipeline, data.pipeline_features))
    479         return "".join(report)
    480 

~/sources/evalml/evalml/model_understanding/prediction_explanations/_user_interface.py in make_text(self, index, pipeline, pipeline_features)
    411             y (pd.Series):
    412         """
--> 413         table = _make_single_prediction_shap_table(pipeline, pipeline_features,
    414                                                    index_to_explain=index,
    415                                                    top_k=self.top_k_features,

~/sources/evalml/evalml/model_understanding/prediction_explanations/_user_interface.py in _make_single_prediction_shap_table(pipeline, pipeline_features, index_to_explain, top_k, include_shap_values, output_format)
    226     if pipeline_features_row.isna().any(axis=None):
    227         raise ValueError(f"Requested index ({index_to_explain}) produces NaN in features.")
--> 228     shap_values = _compute_shap_values(pipeline, pipeline_features_row, training_data=pipeline_features.dropna(axis=0))
    229     normalized_shap_values = _normalize_shap_values(shap_values)
    230 

~/sources/evalml/evalml/model_understanding/prediction_explanations/_algorithms.py in _compute_shap_values(pipeline, features, training_data)
     54     # Catboost can naturally handle string-encoded categorical features so we don't need to convert to numeric.
     55     if estimator.model_family != ModelFamily.CATBOOST:
---> 56         features = check_array(features.values)
     57 
     58     if estimator.model_family.is_tree_estimator():

~/miniconda3/envs/evalml/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~/miniconda3/envs/evalml/lib/python3.8/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    614                     array = array.astype(dtype, casting="unsafe", copy=False)
    615                 else:
--> 616                     array = np.asarray(array, order=order, dtype=dtype)
    617             except ComplexWarning as complex_warning:
    618                 raise ValueError("Complex data not supported\n"

~/miniconda3/envs/evalml/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

TypeError: float() argument must be a string or a number, not 'Timestamp'

This is trickier than just getting rid of the check_array calls in _compute_shap_values.

Shap will convert the data you pass it into its own DenseData format, which is basically a numpy array.

Since the StackedEnsembleClassifier component has pipelines inside, we need the dataframe/datatable representation for the internal pipeline components to work properly.

AFAIK, shap can really only handle sklearn/catboost/lgbm/xgboost estimators and numpy arrays. When we first designed this, we got around this limitation by extracting the features for the final estimator and passing the _component_obj (not the evalml component) to the shap algorithm.
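Roughly, that workaround looks like this (a sketch only, not the exact code in _algorithms.py; the explainer choices are assumptions based on the tree/non-tree branch visible in the traceback):

import shap

def _compute_shap_values_sketch(pipeline, pipeline_features):
    # The evalml estimator wraps the underlying sklearn/lgbm/xgboost object in
    # ``_component_obj``; shap's explainers only understand those raw objects,
    # so we hand them the final estimator's features and the wrapped object.
    estimator = pipeline.estimator
    background = pipeline_features.to_numpy()
    if estimator.model_family.is_tree_estimator():
        explainer = shap.TreeExplainer(estimator._component_obj)
    else:
        # KernelExplainer needs a predict function plus a background dataset.
        explainer = shap.KernelExplainer(estimator._component_obj.predict_proba, background)
    return explainer.shap_values(background)

For an ensemble, estimator._component_obj wraps the input pipelines, and those pipelines need the original dataframe (with datetime/categorical dtypes intact) rather than the numpy array that check_array and shap's DenseData produce.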

However, this doesn't really work for our ensembles.

dsherry commented 3 years ago

@freddyaboulton I think we should simply raise a known exception when a user tries to generate prediction explanations for an ensemble. Does that sound good to you?
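Something along these lines (just a sketch; the exact exception type, message, and where the check lives are all up for grabs):

from evalml.model_family import ModelFamily

def _raise_if_ensemble(pipeline):
    # Hypothetical helper, called at the top of explain_predictions: fail fast
    # with a clear message instead of letting shap hit a TypeError deep inside
    # check_array.
    if pipeline.model_family == ModelFamily.ENSEMBLE:
        raise ValueError("Cannot explain predictions for a stacked ensemble pipeline. "
                         "Explain the input pipelines instead.")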

freddyaboulton commented 3 years ago

@dsherry That sounds good to me!

I think it would be cool if we could eventually do prediction explanations for ensembles, but it looks like that's a heavy lift given what we can do with shap at the moment. I think that might be a good project for a future blitz 😄 hehe