Closed sug4ndh closed 4 years ago
For sklearn models (and any black-box model in general) the explainers expect the `predict_fn` to return class probabilities, so just changing the line to `predict_fn=lambda x: clf.predict_proba(x)` should work. Note that for keras models the `.predict` method always returns class probabilities, but for sklearn you need to specify `predict_proba`.
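A minimal sketch of that change (the data and classifier below are placeholders, and the `CounterFactualProto` call is only outlined in a comment):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder training data; substitute your own.
rng = np.random.RandomState(0)
X_train = rng.rand(100, 4)
y_train = (X_train[:, 0] > 0.5).astype(int)  # plain label vector, as sklearn expects

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# The explainers expect class probabilities, so wrap predict_proba:
predict_fn = lambda x: clf.predict_proba(x)

print(predict_fn(X_train[:1]).shape)  # (1, 2): one probability per class

# The wrapped function is then passed to the explainer, e.g.
# cf = CounterFactualProto(predict_fn, shape=(1, X_train.shape[1]), ...)
```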
Hi Janis,
Thanks for replying! I have noticed a problem: `predict_proba` works for SVM, but for Random Forest, even though there are no errors anymore, when I print the explanation it looks like this:
```
{'orig_proba': array([[0.9, 0.1]]), 'orig_class': 0, 'cf': None, 'all': []}
```
I am using the Boston housing dataset as in the example.
Hmm, `'cf': None` means that no counterfactual examples could be found using the explainer settings.

I believe this is actually to do with how `RandomForestClassifier` implements `predict_proba`:
> The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.
So it's not a true probability; in fact the probability outputs will change only when the perturbed instance is classified as being a different class. As this is unlikely to happen with small perturbations (and especially for examples for which the original class probability is as high as 90%), I suspect the algorithm gets no signal from the model because the predictions are always the same.
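The discreteness is easy to see directly: with sklearn's default fully grown trees, every leaf is pure on separable data, so each tree casts a hard 0/1 vote and the forest's "probabilities" can only take the values `k / n_estimators`. A small sketch on synthetic data (my own illustrative setup, not the Boston example):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = (X[:, 0] > 0.5).astype(int)

# Fully grown trees (the sklearn default) end in pure leaves on separable data,
# so each tree contributes a hard 0/1 vote per class ...
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

probs = clf.predict_proba(rng.rand(50, 4))

# ... and the forest's "probabilities" are vote fractions k / n_estimators,
# i.e. at most 101 distinct values per class rather than a continuous output:
scaled = probs * clf.n_estimators
print(np.allclose(scaled, np.round(scaled)))  # True
```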
Because our algorithms rely on getting class probabilities, this poses challenges with tree-based classifiers, for which class probabilities are not continuous. One solution would be to increase the perturbation size, e.g. by increasing the `eps` argument. I tested with `eps=(0.05, 0.05)`, which does return counterfactual instances. Bear in mind that increasing the perturbations will lead to counterfactual instances that are further away from the original example, but this is necessary to get any probability signal from tree-based classifiers on instances which are well classified (as in your example with 90% probability).
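Since the explainer estimates gradients of the black-box model numerically, `eps` effectively sets the finite-difference step size, and the step has to be large enough to cross a split threshold before a tree-based model shows any change. A rough central-difference sketch (the forest and the near-boundary instance are my own illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = (X[:, 0] > 0.5).astype(int)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def grad_feat0(x, eps):
    """Central-difference estimate of d p(class 0) / d x_0."""
    e = np.zeros_like(x)
    e[0, 0] = eps
    p_plus = clf.predict_proba(x + e)[0, 0]
    p_minus = clf.predict_proba(x - e)[0, 0]
    return (p_plus - p_minus) / (2 * eps)

x = np.array([[0.49, 0.5, 0.5, 0.5]])  # hypothetical instance near the class boundary

print(grad_feat0(x, 1e-4))  # typically 0.0: the step stays inside one set of leaves
print(grad_feat0(x, 0.05))  # nonzero: the larger step crosses split thresholds
```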
@arnaudvl @alexcoca this is something we need to discuss.
Thanks for the explanation, that was very helpful!
Hi,
I am trying to use Counterfactuals guided by Prototypes. The examples in the documentation use keras models, in which the target vector is always converted to a matrix. But sklearn models like SVM or Random Forest expect the target (`y_train`) as a vector. If I use the target (`y_train`) as a vector with `CounterFactualProto`, I get an `IndexError: tuple index out of range`.
How do I go about using `CounterFactualProto` with these models?
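For what it's worth, the `IndexError` can be reproduced outside the explainer: sklearn's `.predict` returns a 1-D label vector, so anything that indexes a second (class) dimension fails, while `.predict_proba` returns the 2-D array the explainer expects. A small sketch (synthetic data, illustrative only):

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder data; illustrative only.
rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = (X[:, 0] > 0.5).astype(int)          # plain label vector, as sklearn expects
clf = SVC(probability=True).fit(X, y)

labels = clf.predict(X[:5])              # shape (5,): 1-D hard labels
probs = clf.predict_proba(X[:5])         # shape (5, 2): one column per class

# Indexing a (missing) second dimension on the 1-D labels reproduces the error:
try:
    n_classes = labels.shape[1]
except IndexError as err:
    print("IndexError:", err)            # tuple index out of range
```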