biolab / orange3-explain

Explainable AI add-on for Orange3
GNU General Public License v3.0

Problem Displaying Model Explanation and Prediction Explanation #26

Closed sdfungayi closed 1 year ago

sdfungayi commented 3 years ago

I am running a churn prediction workflow as shown in the attached image. My class of interest is YES.

  1. The "Explain Model" widget displays only a vertical strip when I set the class to YES, but displays the model explanation graphic properly when I set the class to NO.
  2. Similarly, the "Explain Prediction" widget displays only a vertical line when I set the class to YES, but displays the prediction explanation graphic properly when I set the class to NO.

What am I doing wrong?

How do I make the graphics display properly when the class is set to YES?

Attached images: Workflow; Explain Prediction - No; Explain Prediction - Yes; Explain Model - No; Explain Model - Yes

ajdapretnar commented 3 years ago

Considering you are using Data Sampler, your sample is probably skewed (not enough instances with the target value YES). You can inspect this in the Box Plot.

sdfungayi commented 3 years ago

The ratio NO:YES is 75:25. I tried all sampling types in Data Sampler, and also selected the option "Stratify sample".

The workflow is here: https://drive.google.com/file/d/1rOagjcaRj7YAv3V6L9lQMei1VBBY1Pl5/view?usp=sharing

The dataset (WAFn-UseC-Telco-Customer-Churn.csv) is available here: https://www.kaggle.com/blastchar/telco-customer-churn

I tried again with a different dataset with a 65:35 ratio (attached: customer-churn-data_NM.xlsx) and got a similar result.

PrimozGodec commented 3 years ago

@sdfungayi thank you for reporting the issue. It is definitely a bug in the software or underlying library. The explanation for class YES should be the opposite of the explanation for class NO. It is always that way when the class has only two values.
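
The symmetry described above can be illustrated with a minimal sketch. It does not use the SHAP library or the widgets' code: it computes exact Shapley values by enumerating feature subsets for a made-up model. The names `p_yes`, `p_no`, `instance`, and `baseline`, and the model's coefficients, are assumptions chosen only for illustration. Because the two class outputs are exact complements, each feature's contribution to NO is the negative of its contribution to YES.

```python
from itertools import combinations
from math import exp, factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values, computed by enumerating every feature subset."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Standard Shapley weight for a coalition of this size.
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def p_yes(x):
    # Hypothetical toy model, not the Gradient Boosting model from the workflow.
    z = 0.8 * x[0] - 1.2 * x[1] + 0.5 * x[0] * x[2]
    return 1.0 / (1.0 + exp(-z))


def p_no(x):
    # With two classes, the outputs are exact complements.
    return 1.0 - p_yes(x)


instance = [1.0, 2.0, 3.0]   # the row being explained (made up)
baseline = [0.0, 0.0, 0.0]   # reference values for "absent" features (made up)


def make_value_fn(model):
    def value_fn(subset):
        # Features in the subset take the instance's value, the rest the baseline's.
        x = [instance[j] if j in subset else baseline[j] for j in range(len(instance))]
        return model(x)
    return value_fn


phi_yes = shapley_values(make_value_fn(p_yes), len(instance))
phi_no = shapley_values(make_value_fn(p_no), len(instance))

for name, a, b in zip(["f0", "f1", "f2"], phi_yes, phi_no):
    print(f"{name}: YES {a:+.4f}  NO {b:+.4f}")
```

In this sketch the two columns differ only in sign, so a plot for YES that collapses to a vertical line while the plot for NO looks normal cannot be explained by the data alone.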

I will dig deeper into this issue, but first I will transfer it to the orange3-explain repository, which implements the widgets.

I noticed that the issue appears with Gradient Boosting but not with other models. Until it is fixed, you can explain the predictions of another model, for example Logistic Regression.

sdfungayi commented 3 years ago

Thank you @PrimozGodec

VesnaT commented 2 years ago

This was a problem in the SHAP library and should have been fixed with the newest version.

ajdapretnar commented 2 years ago

Thanks for looking into it. But I still have the issue with shap==0.40.0. To reproduce, use heart_disease.tab with Gradient Boosting.
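
When comparing results across versions, it can help to confirm which shap version is actually installed in the Python environment that Orange runs in. A minimal sketch using only the standard library follows; the helper name `installed_version` is an assumption for illustration, and the snippet has to be run with the same interpreter that Orange uses.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("shap:", installed_version("shap"))
```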

(Screenshot attached: Screen Shot 2021-12-17 at 12 04 42)

VesnaT commented 2 years ago

I see. I only checked it for the Iris dataset.

PrimozGodec commented 1 year ago

Closing via https://github.com/biolab/orange3-explain/pull/50