ModelOriented / localModel

LIME-like explanations with interpretable features based on Ceteris Paribus curves. Now on CRAN.
https://modeloriented.github.io/localModel

suspicious results by individual_surrogate_model #21

Closed harell closed 5 years ago

harell commented 5 years ago

I'm using the Titanic dataset to explain the survival prediction for a random passenger. To do that, I fit a GLM with the following coefficient statistics, ordered by p-value:

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| GENDERmale | -2.66 | 0.20 | -13.33 | 0.00 |
| (Intercept) | 4.70 | 0.48 | 9.76 | 0.00 |
| CLASS3rd | -2.50 | 0.32 | -7.91 | 0.00 |
| AGE | -0.04 | 0.01 | -5.57 | 0.00 |
| CLASS2nd | -1.51 | 0.31 | -4.86 | 0.00 |
| SIBSP | -0.38 | 0.11 | -3.47 | 0.00 |
| EMBARKEDSouthampton | -0.66 | 0.23 | -2.91 | 0.00 |
| EMBARKEDQueenstown | -1.08 | 0.38 | -2.88 | 0.00 |
| FARE | 0.00 | 0.00 | -0.43 | 0.67 |
| PARCH | 0.01 | 0.11 | 0.10 | 0.92 |
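For context, a fit along these lines produces the table above (this is only a sketch: the file path and column names such as `SURVIVED` are assumptions inferred from the coefficient table, not taken from my actual script):

```r
# Sketch only: the data path and column names (SURVIVED, GENDER, CLASS, ...)
# are assumed from the coefficient table above.
titanic <- read.csv("titanic.csv", stringsAsFactors = TRUE)

fit <- glm(
  SURVIVED ~ GENDER + CLASS + AGE + SIBSP + EMBARKED + FARE + PARCH,
  data = titanic,
  family = binomial()
)

summary(fit)  # coefficient table, as shown above
```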

We can see that GENDER is the most important variable. However, the plot of the localModel::individual_surrogate_model output does not show this variable as important.

[screenshot: individual_surrogate_model plot]

Here are some comparisons with other explainers:

  1. ceterisParibus::ceteris_paribus
  2. breakDown::broken

[screenshot: ceterisParibus::ceteris_paribus plot]

[screenshot: breakDown::broken plot]

We can see that the other two methods capture and report the GENDER impact.

I think the printed output points to a potential bug. This is the printed individual_surrogate_model output; notice the NA next to GENDER.

[screenshot: individual_surrogate_model print output, with NA next to GENDER]

Does that make any sense?

mstaniak commented 5 years ago

Thank you very much for pointing this out; I will look into it. It looks like a serious bug, but I need to check.

mstaniak commented 5 years ago

I haven't been able to reproduce this error so far. Two questions:

I encountered a different problem ('male' and 'female' were merged while they weren't supposed to be), but not this one.

harell commented 5 years ago

I'll create a reproducible example next week.

harell commented 5 years ago

Attached are the dataset and the code. Change line 7 to point at the dataset.

Notice the inconsistency between the print output and plot information.

reproducible-bug.zip

mstaniak commented 5 years ago

Thanks, I located the source of the problem. Interpretable features were extracted from the decision rules using regular expressions, and with the levels "female" and "male", the "male" pattern matched two rules instead of one. Once I solve it, I will close this issue.
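The substring pitfall can be shown directly (the rule strings and patterns below are illustrative, not the package's actual regular expressions):

```r
# Illustrative only: a plain "male" pattern also matches inside "female".
rules <- c("GENDER = female", "GENDER = male")

grepl("male", rules)
#> [1] TRUE TRUE    <- "male" is found in both rules

# Word boundaries (or anchoring the full level name) disambiguate:
grepl("\\bmale\\b", rules)
#> [1] FALSE TRUE
```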

mstaniak commented 5 years ago

I fixed this problem on the refactoring branch. When I finish all the changes, I will merge it to master.