MAIF / shapash

🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models
https://maif.github.io/shapash/
Apache License 2.0

How is the y-axis (Shap Interaction value) in the generated plot calculated? #477

Closed AirFin closed 12 months ago

AirFin commented 1 year ago

Thank you for the work of shapash, it is a great Python package!

I have a question when using xpl.plot.interactions_plot: How is the y-axis (Shap Interaction value) in the generated plot calculated?

I noticed that it is not the value of the feature represented by the x-axis or its Shap value. It is also not the value of the feature represented by the color or its Shap value.

I would greatly appreciate your response. Thanks♪(・ω・)ノ

ThomasBouche commented 1 year ago

Hi, Thanks,

The SHAP interaction value is computed by shap. If you want to know more, I recommend this notebook and this book:

AirFin commented 1 year ago

Thank you for your response.

One more question, how can I export the Shap Interaction value between every 2 features into a dataframe?

ThomasBouche commented 1 year ago

If you have run xpl.plot.top_interactions_plot(), the interaction values are available in this attribute of the SmartExplainer:

xpl.interaction_values
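
To get one row per sample and feature pair, the array can be reshaped into a long DataFrame. This is a minimal sketch, not part of Shapash: it assumes the interaction values form an array of shape (n_samples, n_features, n_features), as returned by shap's shap_interaction_values, and that xpl.interaction_values has the same layout. The helper name interactions_to_frame and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd


def interactions_to_frame(values, feature_names):
    """Flatten an (n_samples, n_features, n_features) array into a long DataFrame.

    Hypothetical helper: keeps one row per sample and per unordered feature pair
    (i < j), using the raw off-diagonal value without any rescaling.
    """
    values = np.asarray(values)
    n_samples, n_features, _ = values.shape
    # Indices of the strict upper triangle, i.e. each pair of features once.
    rows, cols = np.triu_indices(n_features, k=1)
    frames = []
    for i, j in zip(rows, cols):
        frames.append(
            pd.DataFrame(
                {
                    "sample": np.arange(n_samples),
                    "feature_1": feature_names[i],
                    "feature_2": feature_names[j],
                    "interaction_value": values[:, i, j],
                }
            )
        )
    return pd.concat(frames, ignore_index=True)


# Synthetic, symmetric stand-in for real interaction values (assumption).
rng = np.random.default_rng(0)
demo = rng.normal(size=(4, 3, 3))
demo = (demo + demo.transpose(0, 2, 1)) / 2
pairs = interactions_to_frame(demo, ["a", "b", "c"])
```

With real data, the call would presumably look like interactions_to_frame(xpl.interaction_values, list(X_test.columns)), provided the layout assumption above holds.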
AirFin commented 1 year ago

Thank U very much!!!

AirFin commented 1 year ago

@ThomasBouche

Thank you for your response. I have run into a problem that confuses me: the y-axis values of the plot generated by xpl.plot.interactions_plot are always half of the y-axis values of shap.dependence_plot.

I am using the classic Boston dataset as an example.

Below is a snippet of my code.

First, I imported the necessary packages and data, and performed some basic machine learning.

# package
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
import shap

# ----------
# dataset
# Note: load_boston was removed in scikit-learn 1.2, so this requires an older release.
from sklearn.datasets import load_boston
boston = load_boston()
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df['MEDV'] = boston.target

# ----------
# ML
X = df.drop('MEDV', axis=1)
y = df['MEDV']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
model = XGBRegressor()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

Next, I used shap in Python to analyze the interactions.

shap_interaction_values = shap.TreeExplainer(model).shap_interaction_values(X_test)

I use shap.dependence_plot to analyze the interaction between two variables.

shap.dependence_plot(('INDUS', 'ZN'), shap_interaction_values, X_test)

image

shap.dependence_plot(('AGE', 'TAX'), shap_interaction_values, X_test)

image

Then, I use shapash in Python for interaction analysis.

from shapash import SmartExplainer

xpl = SmartExplainer(model=model)

xpl.compile(x=X_test)

I use xpl.plot.interactions_plot to analyze the interaction between two variables.

xpl.plot.interactions_plot('INDUS', 'ZN')

image

xpl.plot.interactions_plot('AGE', 'TAX')

image


Now, let's compare the results.

image

I was surprised to find that the distribution of the scatter points and the x-axis are the same in the plots generated by xpl.plot.interactions_plot and shap.dependence_plot.

However, the values on the y-axis are different: the y-axis values of the plot generated by xpl.plot.interactions_plot are always half of the y-axis values of shap.dependence_plot.

I really don't understand why this is happening. Maybe there's an issue with my code? I would greatly appreciate a response, thank you!

ThomasBouche commented 1 year ago

Hi, that's a good question, and the problem is in Shapash. Shapash doesn't multiply the values of the interaction matrix by 2 to account for the interaction between the 2 variables.

image

In this example, the value displayed is -0.245, when it should be double.

I will create an issue to correct this
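
The factor of 2 can be illustrated on a toy case. The sketch below is not Shapash or shap code: it computes exact SHAP interaction values by hand for a hypothetical two-feature model f(x1, x2) = x1 * x2 with a baseline of (0, 0), using the standard definition in which the interaction matrix is symmetric and each off-diagonal entry carries half of the pairwise effect. The function name and the example values are assumptions chosen for illustration.

```python
def interaction_matrix_two_features(f, x, baseline):
    """Exact SHAP interaction values for a model with exactly two features.

    Hypothetical helper: a "missing" feature is replaced by its baseline value.
    """
    f00 = f(baseline[0], baseline[1])  # both features missing
    f10 = f(x[0], baseline[1])         # only feature 0 present
    f01 = f(baseline[0], x[1])         # only feature 1 present
    f11 = f(x[0], x[1])                # both features present

    # With two features the only remaining coalition is the empty set,
    # whose weight is 1/2, so each off-diagonal entry is half the joint effect.
    off = 0.5 * (f11 - f10 - f01 + f00)

    # Ordinary Shapley values of the two features.
    phi0 = 0.5 * (f10 - f00) + 0.5 * (f11 - f01)
    phi1 = 0.5 * (f01 - f00) + 0.5 * (f11 - f10)

    # Diagonal entries are the main effects: Shapley value minus the off-diagonal share.
    return [[phi0 - off, off], [off, phi1 - off]]


def product_model(x1, x2):
    return x1 * x2


matrix = interaction_matrix_two_features(product_model, x=(3.0, 4.0), baseline=(0.0, 0.0))
single_entry = matrix[0][1]                 # value of one off-diagonal cell
total_pairwise = matrix[0][1] + matrix[1][0]  # both symmetric cells, i.e. 2 * single_entry
```

Here the model is a pure interaction, so the full pairwise effect equals f(3, 4) - f(0, 0). A single off-diagonal cell holds only half of it, which matches the halved y-axis values reported in this thread.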

AirFin commented 1 year ago

Thank you for your work!
