MarcelRobeer / ContrastiveExplanation

Contrastive Explanation (Foil Trees), developed at TNO/Utrecht University
BSD 3-Clause "New" or "Revised" License

Xgboost Support #3

Closed jrinvictus closed 4 years ago

jrinvictus commented 4 years ago

Does this package support Xgboost?

MarcelRobeer commented 4 years ago

Yes, it supports both XGBClassifier (use .predict_proba() instead of .predict() as the input for explain_instance_domain()) and XGBRegressor.

Minimal working example:

# Load data set (IRIS)
from sklearn.datasets import load_iris
data = load_iris()
X, y = data.data, data.target

# Create a classifier on this data set
import xgboost as xgb
model = xgb.XGBClassifier().fit(X, y)

# Perform contrastive explanation
import contrastive_explanation as ce

dm = ce.domain_mappers.DomainMapperTabular(X,
                                           feature_names=data.feature_names,
                                           contrast_names=data.target_names)
exp = ce.ContrastiveExplanation(dm)

exp.explain_instance_domain(model.predict_proba,
                            X[0])

[OUT] "The model predicted 'setosa' instead of 'versicolor' because 'petal length (cm) <= 2.462 and petal width (cm) <= 1.752'"

jrinvictus commented 4 years ago

Thanks so much for the response and code example.

firmai commented 4 years ago

What about LightGBM?

import contrastive_explanation as ce
## Foil Tree: explain using a decision tree
## Contrastive
## The question is: what is the smallest number of steps she can take for the
## prediction to be 'healthy' instead of 'bankrupt'?

# Select a sample to explain ('questioned data point') why it predicted the fact instead of the foil 
sample = test.loc[row.index,:]  ## same as row but numeric values
print(sample)

c_contrasts  = train[y].unique()

# Create a domain mapper for the Pandas DataFrame (it will automatically infer feature names)
c_dm = ce.domain_mappers.DomainMapperPandas(train,
                                            contrast_names=c_contrasts)

# Create the contrastive explanation object (default is a Foil Tree explanator)
c_exp = ce.ContrastiveExplanation(c_dm)

# Explain the instance (sample) for the given model
c_exp.explain_instance_domain(model.predict, sample.values)
row

Gives the following error:

TypeError                                 Traceback (most recent call last)

<ipython-input-159-a0135d71967e> in <module>()
     19 
     20 # Explain the instance (sample) for the given model
---> 21 c_exp.explain_instance_domain(model.predict, sample.values)
     22 row

4 frames

/content/ContrastiveExplanation/contrastive_explanation/contrastive_explanation.py in explain_instance_domain(self, *args, **kwargs)
    279         """
    280         return self.domain_mapper.explain(*self.explain_instance(*args,
--> 281                                                                  **kwargs))

/content/ContrastiveExplanation/contrastive_explanation/contrastive_explanation.py in explain_instance(self, model_predict, fact_sample, foil, foil_method, foil_strategy, generate_data, n_samples, include_factual, epsilon, **kwargs)
    186             fact, foil = self.fact_foil.get_fact_foil(model_predict,
    187                                                       fact_sample,
--> 188                                                       foil_method=foil_method)
    189 
    190         # Generate neighborhood data

/content/ContrastiveExplanation/contrastive_explanation/fact_foil.py in get_fact_foil(self, model, sample, foil_method)
     84 
     85         self.fact, self.foil = self._get_fact_foil_impl(model, sample,
---> 86                                                         foil_method)
     87 
     88         if self.verbose:

/content/ContrastiveExplanation/contrastive_explanation/fact_foil.py in _get_fact_foil_impl(self, model_predict, sample, foil_method)
    125     def _get_fact_foil_impl(self, model_predict, sample,
    126                             foil_method=default_method):
--> 127         pred, fact = self._pred(model_predict, sample)
    128         foil = self.get_foil(pred, foil_method)
    129 

/content/ContrastiveExplanation/contrastive_explanation/fact_foil.py in _pred(model_predict, sample, pred_has_max)
     33         if pred_has_max:
     34             _pred = pred
---> 35             if len(_pred) == 1:
     36                 _pred = np.array([pred[0], 1 - pred[0]])
     37             return pred, np.argmax(_pred)

TypeError: object of type 'numpy.float64' has no len()

MarcelRobeer commented 4 years ago

Since the package cannot tell what type of machine learning problem your model tackles (regression or classification), it assumes classification by default. Your model outputs a single value because you pass the .predict method to the explainer (in this case a single numpy float), and the explainer then fails when it calls len() on that scalar. For classification, pass the .predict_proba method instead.