BCG-X-Official / facet

Human-explainable AI.
https://bcg-x-official.github.io/facet
Apache License 2.0

How to calculate SRI for nonlinear models? #380

Open jckkvs opened 5 months ago

jckkvs commented 5 months ago

@mtsokol

https://github.com/BCG-X-Official/facet/issues/374 related.

Thank you. I have modified your code and considered non-linear models such as KernelRidge.

However, KernelRidge is naturally not compatible with TreeExplainerFactory, so I considered using KernelExplainerFactory or ExactExplainerFactory instead. Since ExactExplainerFactory is not usable beyond a certain dataset size, I adopted KernelExplainerFactory with shap_interaction=True.

In this case, a RuntimeError occurs:

RuntimeError: SHAP interaction values have not been calculated. Create an inspector with parameter 'shap_interaction=True' to enable calculations involving SHAP interaction values.

Checking your implementation, it seems that KernelExplainerFactory does not compute shap_interaction:

https://github.com/BCG-X-Official/facet/blob/66bea1574e7a05e8db13cc25b5f071a260d0f66b/src/facet/explanation/_explanation.py#L377
https://github.com/BCG-X-Official/facet/blob/66bea1574e7a05e8db13cc25b5f071a260d0f66b/src/facet/inspection/_learner_inspector.py#L139

I have two questions:

1. For non-linear models, is it necessary to use ExactExplainerFactory and perform inspector.fit()? What should I do if the dataset is large?
2. The behavior where KernelExplainerFactory silently converts shap_interaction=True to False is confusing. Would it be better to raise an error when shap_interaction=True is specified, or to remove the shap_interaction argument entirely?
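For context, the behavior proposed in question 2 might look something like this. This is a hypothetical sketch, not FACET's actual implementation; the class name KernelExplainerFactorySketch and the error message are made up for illustration:

```python
# Hypothetical sketch of the behavior proposed in question 2: raise
# immediately on construction instead of silently downgrading
# shap_interaction=True to False.
class KernelExplainerFactorySketch:
    def __init__(self, shap_interaction: bool = False) -> None:
        if shap_interaction:
            raise ValueError(
                "KernelExplainerFactory does not support SHAP interaction "
                "values; use TreeExplainerFactory with a tree-based model"
            )
        self.shap_interaction = shap_interaction


try:
    KernelExplainerFactorySketch(shap_interaction=True)
except ValueError as e:
    print(f"rejected: {e}")
```

This fails fast at construction time, rather than letting the user discover the limitation later via the RuntimeError shown above.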

import pandas as pd

# relevant FACET imports
from facet.data import Sample
from facet.explanation import KernelExplainerFactory

from sklearn.datasets import load_diabetes
data = load_diabetes()
X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target")
diabetes_df = pd.concat([X, y], axis=1)

# create FACET sample object
diabetes_sample = Sample(observations=diabetes_df, target_name="target")

# fit a (trivial) kernel ridge regressor
from sklearn.kernel_ridge import KernelRidge
model = KernelRidge()
model.fit(X, y)

# fit the model inspector
from facet.inspection import NativeLearnerInspector
inspector = NativeLearnerInspector(
    model=model,
    explainer_factory=KernelExplainerFactory(),
    n_jobs=-3,
    shap_interaction=True,
)
inspector.fit(diabetes_sample)

# visualise synergy as a matrix
from pytools.viz.matrix import MatrixDrawer
synergy_matrix = inspector.feature_synergy_matrix()

# visualise redundancy as a matrix
redundancy_matrix = inspector.feature_redundancy_matrix()
# visualise redundancy using a dendrogram
import matplotlib
from pytools.viz.dendrogram import DendrogramDrawer
redundancy = inspector.feature_redundancy_linkage()
j-ittner commented 1 week ago

Sorry @jckkvs, this slipped through.

Re (1): I totally agree that an exception would be better; we will make that change.

Re (2): FACET relies on the shap package for all SHAP calculations. I agree it would be great to see support for interaction values for a broader set of models; that is probably best raised with the shap maintainers. Alternatively, we may consider adding our own interaction explainer to a future version of FACET. (In our own work, we find that ensemble models work great, so we're fine using the TreeExplainer.)

mtsokol commented 1 week ago

That's right. For SHAP interaction values, I believe you are mostly limited to tree-based models.