Closed BillmanH closed 4 years ago
@wamartin-aml @imatiach-msft
@BillmanH sorry about the trouble you are having. To clarify, are you using the mimic explainer or the linear explainer? If using a regression model and using the linear explainer, I wouldn't expect to see this. However, the mimic explainer (the default one used by AutoML) trains a global surrogate model for the output of the original or teacher model, so this could be an artifact of the regularization parameters in the surrogate LightGBM model.
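To illustrate the regularization point: L1-style regularization (as LightGBM's `reg_alpha` applies to the surrogate model) effectively soft-thresholds small contributions toward zero. A minimal sketch in pure Python, with hypothetical weight values:

```python
# Sketch (hypothetical values): L1-style regularization in a surrogate model
# can shrink small feature contributions to exactly zero, which would make
# their global importance come out as 0.0 in the explanation.

def soft_threshold(weight, alpha):
    """Soft-thresholding operator, the effect of L1 (lasso) regularization."""
    if weight > alpha:
        return weight - alpha
    if weight < -alpha:
        return weight + alpha
    return 0.0  # weights below the threshold are zeroed out entirely

# Hypothetical raw contributions learned by a surrogate model
raw_weights = {"feature_a": 0.80, "feature_b": 0.05, "feature_c": 0.002}

alpha = 0.01  # regularization strength (analogous to reg_alpha in LightGBM)
regularized = {name: soft_threshold(w, alpha) for name, w in raw_weights.items()}
print(regularized)  # feature_c's tiny contribution collapses to 0.0
```

This is only an analogy for why a regularized surrogate can report exact zeros for weakly contributing features, not the actual LightGBM training procedure.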
@BillmanH maybe I am misunderstanding the issue though, I'm not sure why you are expecting the feature importance values to always be non-zero. Could you provide a sample notebook or dataset with this issue? If not, maybe a bit more context on why the feature importance values should be non-zero would help clarify the issue more.
@imatiach-msft , you may have discovered it. I'll look into this and update if this turns out to be the issue.
Thanks @imatiach-msft , I guess I assumed that the mimic explainer was just a naming convention and not a type of explainer. Honestly I think I assumed that because the type of model was declared in autoML that it would not be necessary to declare it again in the explanation process.
Looking at: https://docs.microsoft.com/en-us/python/api/azureml-explain-model/azureml.explain.model?view=azure-ml-py
I can see that there are several types of explainers. We went through several and found that each AzureML explainer expects a specific type of model. The Model object retrieved from AutoML appears to be the wrong model type.
Running:
explainer = LinearExplainer(fitted_model, X)
Returns
Exception: An unknown model type was passed:
I noticed that the mimic explainer has a specific mimicwrapper that is used. Is there a similar linearwrapper that is required as well?
Bottom line, I want the explanation for a regression model. I feel like the older approach was way easier than this. Is there a document that will walk you through the differences between setting up the linear model and the mimic (default) process?
@BillmanH azureml-explain-model is the old deprecated package, it was split into interpret-community which is open-sourced (https://github.com/interpretml/interpret-community) and azureml-interpret. Interpret-community is an extension to interpret (https://github.com/interpretml/interpret) which was written by MSR and mainly focuses on EBM, their glassbox model.
If the model object from automl contains pre-processing steps then it can't be used with SHAP's linear explainer, since it's a greybox explainer, or model-specific explainer (mimic explainer is a blackbox explainer, similar to SHAP's KernelExplainer) - I'm guessing that is the issue you are running into. If you can share the code I can take a look at it, or maybe we can discuss via a teams meeting. I would like to understand what fitted_model is, specifically if it's a pipeline and what it contains.
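One way to sketch the workaround, assuming `fitted_model` follows the scikit-learn `Pipeline` convention of a `steps` list of `(name, object)` pairs: pull out the bare final estimator and apply the preprocessing steps to the data yourself, then hand both to the greybox explainer. Mocked here to stay self-contained (the `Fake*` classes are stand-ins, not real AutoML objects):

```python
# Sketch of the pipeline issue: greybox explainers like SHAP's LinearExplainer
# need the bare estimator, not a pipeline with preprocessing steps in front.
# Assuming a scikit-learn-style Pipeline with a `steps` attribute:

class FakeScaler:          # stand-in for a preprocessing step
    def transform(self, X):
        return [[v * 0.5 for v in row] for row in X]

class FakeLinearModel:     # stand-in for the final regressor
    coef_ = [1.0, 2.0]

class FakePipeline:        # mimics sklearn.pipeline.Pipeline's `steps` list
    steps = [("scaler", FakeScaler()), ("model", FakeLinearModel())]

fitted_model = FakePipeline()

# Split the pipeline: final estimator vs. everything before it
final_estimator = fitted_model.steps[-1][1]
preprocessors = [step for _, step in fitted_model.steps[:-1]]

# Pre-transform the data with the preprocessing steps
X = [[2.0, 4.0]]
for p in preprocessors:
    X = p.transform(X)

# `final_estimator` (now a plain linear model) and the transformed `X`
# are the shape of input a model-specific explainer would accept.
print(type(final_estimator).__name__, X)
```

If the AutoML pipeline's transformers are not scikit-learn compatible this won't apply directly, which is exactly why the blackbox mimic explainer is the default there.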
Could you explain what you mean by the older approach? Do you mean the TabularExplainer which is a composition of multiple shap-based methods and finds the best explainer for the given model?
Thanks @imatiach-msft , this is clearly what the issue is. I think you've shown us that we need to go through a lot more documentation to get to what we want but you've put us on the right track. I'm sure it will be worth the extra effort.
using:
azureml-sdk[automl,explain,notebooks]==1.0.85
Repro steps: When running a regression model explanation,
raw_explanation.get_feature_importance_dict()
low values drop off. I plotted the SHAP values here to show that they drop off at a very low point.
The values are higher than zero because ... I know this. However, it seems to take everything below a certain point and drop it to
0.0
even though the next highest value is 0.002
per above. I can't find any documentation about the truncation. This also makes it look like there is a sudden drop between the explainability of one feature and the next (albeit minor).
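The cliff is easy to see by ranking the dict returned from `get_feature_importance_dict()`. A small sketch with hypothetical importance values resembling what I'm describing:

```python
# Hypothetical dict resembling raw_explanation.get_feature_importance_dict():
# several features report exactly 0.0 while the smallest surviving value
# is only 0.002, so the plot shows a sudden drop rather than a smooth tail.
importances = {
    "feature_a": 0.410,
    "feature_b": 0.120,
    "feature_c": 0.002,   # smallest non-zero value
    "feature_d": 0.0,     # abruptly truncated to zero
    "feature_e": 0.0,
}

# Sort descending to see where the values fall off a cliff
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
for name, value in ranked:
    print(f"{name}: {value}")

zeroed = [name for name, value in ranked if value == 0.0]
print("features truncated to 0.0:", zeroed)
```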