MarcelRobeer / ContrastiveExplanation

Contrastive Explanation (Foil Trees), developed at TNO/Utrecht University
BSD 3-Clause "New" or "Revised" License

Not working for multi-valued categorical features #2

Closed by raam93 5 years ago

raam93 commented 5 years ago

Does the current implementation support only binary-valued categorical features?

I ask because I tried the Adult Income dataset (https://archive.ics.uci.edu/ml/datasets/adult), which has many multi-valued categorical features as well as continuous ones, and got output like this:

"The model predicted '<=50k' instead of '>50k' because 'hours_per_week <= 42.832 and not occupation and age <= 34.95 and not education and hours_per_week <= 57.892'"

Here, education and occupation are not binary features; each has many levels.

MarcelRobeer commented 5 years ago

I have added initial support for multi-valued categorical features (DomainMapperTabular now maps them to a one-hot encoding and back). Typically, one-hot encoding is already applied before the data reaches the explainer, because the predictor requires it. Could you please indicate which package you are using to get predictions directly from categorical features?
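
To illustrate the "one-hot encoder and back" idea mentioned above, here is a minimal, dependency-free sketch. It is not the code in DomainMapperTabular; the function names `fit_levels`, `one_hot_encode` and `one_hot_decode` are hypothetical, and the rows are toy data.

```python
def fit_levels(rows, categorical):
    """Collect the sorted distinct values of each categorical column."""
    return {j: sorted({row[j] for row in rows}) for j in categorical}


def one_hot_encode(row, levels):
    """Expand categorical columns into 0/1 indicators; copy the rest."""
    out = []
    for j, value in enumerate(row):
        if j in levels:
            out.extend(1 if value == level else 0 for level in levels[j])
        else:
            out.append(value)
    return out


def one_hot_decode(encoded, levels, n_features):
    """Invert one_hot_encode: collapse each indicator group to its level."""
    row, pos = [], 0
    for j in range(n_features):
        if j in levels:
            group = encoded[pos:pos + len(levels[j])]
            row.append(levels[j][group.index(1)])
            pos += len(levels[j])
        else:
            row.append(encoded[pos])
            pos += 1
    return row


# Toy label-encoded rows (hypothetical): [age, workclass, education]
rows = [[39, 6, 9], [50, 5, 9], [38, 3, 11]]
levels = fit_levels(rows, categorical=[1, 2])
encoded = one_hot_encode(rows[0], levels)
decoded = one_hot_decode(encoded, levels, n_features=3)
```

The round trip matters for explanations: rules are learned on the expanded columns, but they only read well once each indicator column is mapped back to its original feature and level.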

raam93 commented 5 years ago

Thanks for the reply!

I tried your updated library, and now I get output like this:

"The model predicted '>50k' instead of '<=50k' because '34 > 0.954 and 59 <= 4993.447 and 79 <= 0.046'"

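The numbers 34, 59 and 79 in the output above exceed the 12 input features, which suggests (an assumption, not confirmed in the thread) that they are column positions in a one-hot expanded matrix whose columns have no names. A minimal sketch of how such positions could be turned back into readable names follows; `expanded_column_names` and the toy levels are hypothetical and not part of the library.

```python
def expanded_column_names(feature_names, levels):
    """Name every column of a one-hot expanded matrix.

    levels maps the index of each categorical feature to its ordered
    list of level names; all other features keep a single column.
    """
    names = []
    for j, name in enumerate(feature_names):
        if j in levels:
            names.extend(f"{name}={level}" for level in levels[j])
        else:
            names.append(name)
    return names


# Toy setup (hypothetical): one continuous and two categorical features
feature_names = ["age", "workclass", "education"]
levels = {
    1: ["Private", "Self-emp", "State-gov"],
    2: ["Bachelors", "HS-grad"],
}
names = expanded_column_names(feature_names, levels)

# Look up what expanded column 4 refers to
column_4 = names[4]
```

With a lookup like this, a rule on an expanded column index can be reported as a statement about a specific feature level instead of a bare number.
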
For reference, I remove the 'fnlwgt' and 'education-num' features from the adult income data, label-encode the data, and feed it to your library:

import pandas as pd

df = pd.read_csv('adult_income.csv')
del df['fnlwgt']
del df['education-num']
# label_encode is a helper that is not shown here; discrete is the list of discrete feature names
df_le, label_encoder = label_encode(df, discrete)
# class_name is 'class'
X = df_le.loc[:, df_le.columns != class_name].values
y = df_le[class_name].values
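
The `label_encode` helper is not shown in the thread. As a rough, dependency-free sketch of what such a step typically does, the version below works on plain lists rather than a DataFrame and assigns integer codes in sorted order of the original values. Its signature and return values are assumptions for illustration only.

```python
def label_encode(rows, discrete):
    """Replace values in the given column indices by integer codes.

    Returns the encoded rows and, per encoded column, the sorted list of
    original values, so that code k maps back to encoders[col][k].
    """
    encoders = {j: sorted({row[j] for row in rows}) for j in discrete}
    lookup = {
        j: {value: code for code, value in enumerate(values)}
        for j, values in encoders.items()
    }
    encoded = [
        [lookup[j][v] if j in lookup else v for j, v in enumerate(row)]
        for row in rows
    ]
    return encoded, encoders


# Toy rows (hypothetical): [age, workclass, education]
rows = [
    [39, "State-gov", "Bachelors"],
    [50, "Self-emp-not-inc", "Bachelors"],
    [38, "Private", "HS-grad"],
]
encoded, encoders = label_encode(rows, discrete=[1, 2])
```

Note that the integer codes impose an arbitrary order on the categories, so any method that thresholds on them needs to know which columns are categorical.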

'X' looks like this:

array([[39, 6, 9, ..., 0, 40, 38], [50, 5, 9, ..., 0, 13, 38], [38, 3, 11, ..., 0, 40, 38], ..., [58, 3, 11, ..., 0, 40, 38], [22, 3, 11, ..., 0, 20, 38], [52, 4, 11, ..., 0, 40, 38]], dtype=int64)

Then, after training I follow your code:

import numpy as np
import contrastive_explanation as ce

# x_train, x_test and model come from the training step (not shown)
sample = x_test[17]

# Create a domain mapper (map the explanation to meaningful labels for explanation)
feature_names = np.array(['age', 'workclass', 'education', 'marital-status',
                          'occupation', 'relationship', 'race', 'sex',
                          'capital-gain', 'capital-loss', 'hours-per-week',
                          'native-country'])
dm = ce.domain_mappers.DomainMapperTabular(
    x_train,
    feature_names=feature_names,
    contrast_names=np.array(['<=50k', '>50k']),
    categorical_features=np.array([1, 2, 3, 4, 5, 6, 7, 11]))

# Create the contrastive explanation object (default is a Foil Tree explanator)
exp = ce.ContrastiveExplanation(dm)

# Explain the instance (sample) for the given model
exp.explain_instance_domain(model.predict_proba, sample)

Can you try your code on the adult income dataset or any other dataset with multi-valued categorical features? Thanks in advance!

MarcelRobeer commented 5 years ago

I added your case as example number 2 to the example notebook.