ModelOriented / localModel

LIME-like explanations with interpretable features based on Ceteris Paribus curves. Now on CRAN.
https://modeloriented.github.io/localModel

suspicious results by individual_surrogate_model #21

Closed harell closed 5 years ago

harell commented 5 years ago

I'm using the Titanic dataset to explain the survival prediction for a random passenger. To do that, I fit a GLM with the following coefficient statistics, ordered by p-value:

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| GENDERmale | -2.66 | 0.20 | -13.33 | 0.00 |
| (Intercept) | 4.70 | 0.48 | 9.76 | 0.00 |
| CLASS3rd | -2.50 | 0.32 | -7.91 | 0.00 |
| AGE | -0.04 | 0.01 | -5.57 | 0.00 |
| CLASS2nd | -1.51 | 0.31 | -4.86 | 0.00 |
| SIBSP | -0.38 | 0.11 | -3.47 | 0.00 |
| EMBARKEDSouthampton | -0.66 | 0.23 | -2.91 | 0.00 |
| EMBARKEDQueenstown | -1.08 | 0.38 | -2.88 | 0.00 |
| FARE | 0.00 | 0.00 | -0.43 | 0.67 |
| PARCH | 0.01 | 0.11 | 0.10 | 0.92 |
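For context, a fit along these lines produces the table above (this is only a sketch: the file path and column names such as `SURVIVED` are assumptions inferred from the coefficient table, not taken from my actual script):

```r
# Sketch only: the data path and column names (SURVIVED, GENDER, CLASS, ...)
# are assumed from the coefficient table above.
titanic <- read.csv("titanic.csv", stringsAsFactors = TRUE)

fit <- glm(
  SURVIVED ~ GENDER + CLASS + AGE + SIBSP + EMBARKED + FARE + PARCH,
  data = titanic,
  family = binomial()
)

summary(fit)  # coefficient table, as shown above
```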

We can see that GENDER is the most important variable. However, the plot of the localModel::individual_surrogate_model output does not show this variable as important.

[screenshot: individual_surrogate_model plot]

Here are some comparisons with other explainers:

  1. ceterisParibus::ceteris_paribus
  2. breakDown::broken

[screenshot: ceterisParibus::ceteris_paribus plot]

[screenshot: breakDown::broken plot]

We can see that the other two methods capture and report the GENDER impact.

I think the printed output points to a potential bug. This is the printed individual_surrogate_model output; notice the NA next to GENDER.

[screenshot: individual_surrogate_model print output, with NA next to GENDER]

Does that make any sense?

mstaniak commented 5 years ago

Thank you very much for pointing this out; I will look into it. It looks like a serious bug, but I need to check.

mstaniak commented 5 years ago

I haven't been able to reproduce this error so far. Two questions:

I encountered a different problem ('male' and 'female' were merged while they weren't supposed to be), but not this one.

harell commented 5 years ago

I'll create a reproducible example next week.

harell commented 5 years ago

Attached are the dataset and the code. Change line 7 to point at the dataset.

Notice the inconsistency between the print output and plot information.

reproducible-bug.zip

mstaniak commented 5 years ago

Thanks, I located the source of the problem. Interpretable features were extracted from the decision rules using regular expressions, and with the levels "female" and "male", the "male" pattern matched two rules instead of one. Once I solve it, I will close this issue.
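The substring pitfall can be shown directly (the rule strings and patterns below are illustrative, not the package's actual regular expressions):

```r
# Illustrative only: a plain "male" pattern also matches inside "female".
rules <- c("GENDER = female", "GENDER = male")

grepl("male", rules)
#> [1] TRUE TRUE    <- "male" is found in both rules

# Word boundaries (or anchoring the full level name) disambiguate:
grepl("\\bmale\\b", rules)
#> [1] FALSE TRUE
```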

mstaniak commented 5 years ago

I fixed this problem on the refactoring branch. When I finish all the changes, I will merge it to master.