dais-ita / interpretability-papers

Papers on interpretable deep learning, for review

Evaluating the Visualization of What a Deep Neural Network Has Learned #48

Open · richardtomsett opened this issue 6 years ago

richardtomsett commented 6 years ago

Evaluating the Visualization of What a Deep Neural Network Has Learned

Deep neural networks (DNNs) have demonstrated impressive performance in complex machine learning tasks such as image classification or speech recognition. However, due to their multilayer nonlinear structure, they are not transparent, i.e., it is hard to grasp what makes them arrive at a particular classification or recognition decision, given a new unseen data sample. Recently, several approaches have been proposed enabling one to understand and interpret the reasoning embodied in a DNN for a single test image. These methods quantify the “importance” of individual pixels with respect to the classification decision and allow a visualization in terms of a heatmap in pixel/input space. While the usefulness of heatmaps can be judged subjectively by a human, an objective quality measure is missing. In this paper, we present a general methodology based on region perturbation for evaluating ordered collections of pixels such as heatmaps. We compare heatmaps computed by three different methods on the SUN397, ILSVRC2012, and MIT Places data sets. Our main result is that the recently proposed layer-wise relevance propagation algorithm qualitatively and quantitatively provides a better explanation of what made a DNN arrive at a particular classification decision than the sensitivity-based approach or the deconvolution method. We provide theoretical arguments to explain this result and discuss its practical implications. Finally, we investigate the use of heatmaps for unsupervised assessment of the neural network performance.
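As a rough illustration of the region-perturbation idea (perturb the image in regions ordered by decreasing relevance and track how quickly the classifier's score for the target class drops, summarized by the area over the perturbation curve, AOPC), here is a minimal NumPy sketch. The `model` callable, the patch size, and the uniform-noise replacement are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def aopc_morf(model, image, heatmap, num_regions=100, patch=9, seed=0):
    """Area over the MoRF perturbation curve (higher = more faithful heatmap).

    model       : callable mapping an image array (H, W, C) to the classifier
                  score of the target class -- an assumed interface.
    image       : input image, float array of shape (H, W, C).
    heatmap     : per-pixel relevance scores, shape (H, W).
    num_regions : number of regions to perturb, most relevant first (MoRF).
    patch       : side length of the square region replaced at each step.
    """
    rng = np.random.default_rng(seed)
    h, w = heatmap.shape
    half = patch // 2

    # Rank pixel locations by relevance; each selected location becomes the
    # centre of the next region to perturb.
    order = np.column_stack(
        np.unravel_index(np.argsort(heatmap, axis=None)[::-1], (h, w)))

    perturbed = image.copy()
    covered = np.zeros((h, w), dtype=bool)
    base_score = model(image)
    drops, done = [], 0

    for i, j in order:
        if done >= num_regions:
            break
        if covered[i, j]:
            continue
        t, b = max(0, i - half), min(h, i + half + 1)
        l, r = max(0, j - half), min(w, j + half + 1)
        # Replace the region with uniform noise (one perturbation scheme
        # among those the paper compares).
        perturbed[t:b, l:r, :] = rng.uniform(
            image.min(), image.max(), size=perturbed[t:b, l:r, :].shape)
        covered[t:b, l:r] = True
        drops.append(base_score - model(perturbed))
        done += 1

    # AOPC: average drop in the target-class score across perturbation steps.
    return float(np.mean(drops))
```

A heatmap that concentrates relevance on the pixels the classifier actually relies on yields a larger AOPC, since removing those regions first degrades the score fastest.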

Bibtex:

@ARTICLE{7552539,
  author  = {W. Samek and A. Binder and G. Montavon and S. Lapuschkin and K. R. Müller},
  journal = {IEEE Transactions on Neural Networks and Learning Systems},
  title   = {Evaluating the Visualization of What a Deep Neural Network Has Learned},
  year    = {2017},
  volume  = {28},
  number  = {11},
  pages   = {2660-2673},
  doi     = {10.1109/TNNLS.2016.2599820},
  issn    = {2162-237X}
}

richardtomsett commented 6 years ago

From previous review: The recently proposed layer-wise relevance propagation (LRP) algorithm from Wojciech Samek’s group (Binder et al. 2016a, Binder et al. 2016b) uses the fact that the individual neural network units are differentiable to decompose the network output in terms of its input variables. It is a principled method with a close relationship to Taylor decomposition, and it is applicable to arbitrary deep neural network architectures (Montavon et al. 2017). The output is a heatmap over the input features indicating the relevance of each feature to the model output. This makes the method particularly well suited to analyzing image classifiers, though it has also been adapted for text and electroencephalogram signal classification (Sturm et al. 2016). Samek et al. (2017) have also developed an objective metric for comparing the output of LRP with that of similar heatmapping algorithms.
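For concreteness, below is a minimal sketch of one common LRP variant (the ε-rule) on a toy fully connected ReLU network, in NumPy. The function name, layer layout, and the choice of the ε-rule are assumptions for illustration; the paper applies LRP to large convolutional classifiers, not this toy case.

```python
import numpy as np

def lrp_epsilon(weights, biases, x, eps=1e-6):
    """Minimal LRP-epsilon pass for a fully connected ReLU network.

    weights, biases : lists defining layers l = 0..L-1, with weights[l] of
                      shape (d_l, d_{l+1}) and biases[l] of shape (d_{l+1},).
    x               : input vector of shape (d_0,).
    Returns per-input-feature relevance scores of shape (d_0,).
    """
    # Forward pass, keeping each layer's input activations.
    activations = [x]
    a = x
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = a @ W + b
        a = np.maximum(z, 0) if l < len(weights) - 1 else z  # linear output layer
        activations.append(a)

    # Initialise relevance at the output: keep only the predicted class.
    R = np.zeros_like(activations[-1])
    k = np.argmax(activations[-1])
    R[k] = activations[-1][k]

    # Backward pass, redistributing relevance layer by layer with the
    # epsilon rule: R_j = a_j * sum_k w_jk * R_k / (z_k + eps * sign(z_k)).
    for l in range(len(weights) - 1, -1, -1):
        W, b = weights[l], biases[l]
        a = activations[l]
        z = a @ W + b
        z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabiliser avoids division by ~0
        s = R / z
        R = a * (s @ W.T)
    return R
```

The returned vector sums (approximately, up to the ε stabiliser and bias terms) to the network's output score for the predicted class, which is the conservation property the review alludes to; reshaping it back to the input dimensions gives the heatmap.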

*Binder et al. 2016a: issue #44; Binder et al. 2016b: issue #45; Montavon et al. 2017: issue #46; Sturm et al. 2016: issue #47; Samek et al. 2017: issue #48.