dais-ita / interpretability-papers

Papers on interpretable deep learning, for review

Why should I trust you?: Explaining the predictions of any classifier #41

Open richardtomsett opened 6 years ago

richardtomsett commented 6 years ago

Why should I trust you?: Explaining the predictions of any classifier

Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one.

In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
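The authors released an open-source Python implementation, lime (https://github.com/marcotcr/lime). The sketch below shows roughly how it can be used to explain a text classifier of the kind discussed in the paper; the dataset, pipeline, and parameter values are illustrative assumptions, and the exact API may vary between package versions.

```python
# A minimal usage sketch of the authors' lime package; dataset, pipeline and
# parameter choices here are illustrative, not taken from the paper.
from sklearn.datasets import fetch_20newsgroups
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Train a black-box text classifier (random forest on TF-IDF features).
train = fetch_20newsgroups(subset="train", categories=["alt.atheism", "sci.med"])
pipeline = make_pipeline(TfidfVectorizer(lowercase=False),
                         RandomForestClassifier(n_estimators=500))
pipeline.fit(train.data, train.target)

# Explain a single prediction: LIME perturbs the document (by removing words),
# queries the pipeline on the perturbed texts, and fits a sparse local model.
explainer = LimeTextExplainer(class_names=train.target_names)
exp = explainer.explain_instance(train.data[0], pipeline.predict_proba,
                                 num_features=6)
print(exp.as_list())  # [(word, weight), ...] for the top contributing words
```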

Bibtex:

@inproceedings{Ribeiro:2016:WIT:2939672.2939778,
  author    = {Ribeiro, Marco Tulio and Singh, Sameer and Guestrin, Carlos},
  title     = {"Why Should I Trust You?": Explaining the Predictions of Any Classifier},
  booktitle = {Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  series    = {KDD '16},
  year      = {2016},
  isbn      = {978-1-4503-4232-2},
  location  = {San Francisco, California, USA},
  pages     = {1135--1144},
  numpages  = {10},
  url       = {http://doi.acm.org/10.1145/2939672.2939778},
  doi       = {10.1145/2939672.2939778},
  acmid     = {2939778},
  publisher = {ACM}
}

richardtomsett commented 6 years ago

From previous review: Several groups have developed methods for identifying and visualizing the features in individual test data points that contribute the most towards a classifier’s output (i.e. local explanations). Perhaps the most well-known method is Ribeiro et al.’s (2016) Local Interpretable Model-Agnostic Explanations (LIME), an algorithm that provides explanations of decisions for any machine learning model. The LIME algorithm outputs a binary vector representing the input: each bit corresponds to an input feature (e.g. a word in a document, or a contiguous region of an image known as a super-pixel), with a one indicating that the feature was important for the classifier’s output and a zero indicating that it was unimportant. It calculates the importance of each feature by generating perturbed samples of the input point and using these samples (labeled by the original model) to learn a local approximation to the model, as sketched below. LIME can be particularly helpful in identifying confusing input features, allowing for dataset debugging or improved feature engineering. LIME works with any model, and was tested on a CNN by Ribeiro et al. (2016). However, its sampling approach means it can be too slow for interactive use with complex models.
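To make the perturb-and-fit procedure above concrete, here is a simplified, from-scratch sketch of the local-approximation step for a bag-of-words text input. The function name, the exponential proximity kernel, and the use of ridge regression as the interpretable surrogate are assumptions made for illustration, not the authors' exact implementation.

```python
# Simplified sketch of LIME's local surrogate fitting for a text input.
# `black_box_predict` is a hypothetical function that takes a text string and
# returns the black-box model's probability for the class being explained.
import numpy as np
from sklearn.linear_model import Ridge

def lime_local_explanation(words, black_box_predict, num_samples=1000, kernel_width=0.25):
    d = len(words)
    rng = np.random.default_rng(0)

    # 1. Perturb the input: random binary masks over the interpretable features (words).
    masks = rng.integers(0, 2, size=(num_samples, d))
    masks[0, :] = 1  # keep the unperturbed instance in the sample set

    # 2. Label each perturbed text with the original black-box model.
    texts = [" ".join(w for w, keep in zip(words, m) if keep) for m in masks]
    labels = np.array([black_box_predict(t) for t in texts])

    # 3. Weight samples by proximity to the original instance
    #    (fraction of words removed, passed through an exponential kernel).
    distances = 1.0 - masks.sum(axis=1) / d
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)

    # 4. Fit a weighted linear surrogate; its coefficients explain the prediction.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, labels, sample_weight=weights)
    return sorted(zip(words, surrogate.coef_), key=lambda p: -abs(p[1]))
```

The returned list pairs each word with a signed weight; large-magnitude weights mark the features most responsible for the prediction, which is the information the binary importance vector described above summarizes.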