dais-ita / interpretability-papers

Papers on interpretable deep learning, for review

Explaining nonlinear classification decisions with deep Taylor decomposition #46

Status: Open · richardtomsett opened this issue 6 years ago

richardtomsett commented 6 years ago

Explaining nonlinear classification decisions with deep Taylor decomposition

Nonlinear methods such as Deep Neural Networks (DNNs) are the gold standard for various challenging machine learning problems such as image recognition. Although these methods perform impressively well, they have a significant disadvantage: their lack of transparency limits the interpretability of the solution and thus the scope of application in practice. DNNs in particular act as black boxes due to their multilayer nonlinear structure. In this paper we introduce a novel methodology for interpreting generic multilayer neural networks by decomposing the network classification decision into contributions of its input elements. Although our focus is on image classification, the method is applicable to a broad set of input data, learning tasks and network architectures. Our method, called deep Taylor decomposition, efficiently utilizes the structure of the network by backpropagating the explanations from the output to the input layer. We evaluate the proposed method empirically on the MNIST and ILSVRC data sets.
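For concreteness, here is a minimal NumPy sketch of the kind of relevance-propagation step the paper describes: the z+ rule it derives for ReLU layers with non-negative inputs. The toy network, variable names, and relevance initialization below are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of one deep-Taylor-style relevance propagation step:
# the z+ rule for a ReLU layer whose inputs are non-negative.
import numpy as np

def zplus_backprop(a, W, R_out, eps=1e-9):
    """Redistribute the relevance R_out of a layer's outputs onto its inputs.

    a     : (d_in,)        non-negative activations entering the layer
    W     : (d_in, d_out)  weight matrix
    R_out : (d_out,)       relevance assigned to the layer outputs
    Returns R_in with R_in.sum() ~= R_out.sum() (relevance conservation).
    """
    Wp = np.maximum(W, 0.0)   # z+ rule: keep only the positive weights
    z = a @ Wp + eps          # positive pre-activations (eps avoids division by zero)
    s = R_out / z             # relevance per unit of pre-activation
    c = Wp @ s                # backpropagate the shares to the inputs
    return a * c              # each input gets relevance proportional to its contribution

# Toy two-layer ReLU network: propagate relevance from output back to input.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=4)          # non-negative input, e.g. pixel values
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(3, 2))
h = np.maximum(x @ W1, 0.0)                # hidden ReLU activations
y = np.maximum(h @ W2, 0.0)                # output activations
R2 = np.where(y == y.max(), y, 0.0)        # start from the winning class score
R1 = zplus_backprop(h, W2, R2)             # relevance on the hidden layer
R0 = zplus_backprop(x, W1, R1)             # relevance "heatmap" on the input
print(R0, R0.sum(), R2.sum())              # sums agree up to eps
```

Stacking this rule layer by layer is what the abstract means by "backpropagating the explanations from the output to the input layer"; the resulting input-level relevances play the role of the heatmap discussed in the comment below.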

Bibtex:

@article{Montavon:2017:ENC:3051179.3051284,
  author    = {Montavon, Gr{\'e}goire and Lapuschkin, Sebastian and Binder, Alexander and Samek, Wojciech and M\"{u}ller, Klaus-Robert},
  title     = {Explaining Nonlinear Classification Decisions with Deep Taylor Decomposition},
  journal   = {Pattern Recognition},
  volume    = {65},
  number    = {C},
  month     = may,
  year      = {2017},
  issn      = {0031-3203},
  pages     = {211--222},
  numpages  = {12},
  doi       = {10.1016/j.patcog.2016.11.008},
  url       = {https://doi.org/10.1016/j.patcog.2016.11.008},
  acmid     = {3051284},
  publisher = {Elsevier Science Inc.}
}

richardtomsett commented 6 years ago

From previous review: The recently-proposed layer-wise relevance propagation (LRP) algorithm from Wojciech Samek’s group (Binder et al. 2016a, Binder et al. 2016b) uses the fact that the individual neural network units are differentiable to decompose the network output in terms of its input variables. It is a principled method with a close relationship to Taylor decomposition, and it is applicable to arbitrary deep neural network architectures (Montavon et al. 2017). The output is a heatmap over the input features that indicates the relevance of each feature to the model output. This makes the method particularly well suited to analyzing image classifiers, though it has also been adapted for text and electroencephalogram signal classification (Sturm et al. 2016). Samek et al. (2017) have also developed an objective metric for comparing the output of LRP with that of similar heatmapping algorithms; see the sketch after the footnote below.

*Binder et al. 2016a: issue #44, Binder et al. 2016b: issue #45, Montavon et al. 2017: issue #46, Sturm et al. 2016: issue #47, Samek et al. 2017: issue #48.
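For context on that last point, the Samek et al. (2017) metric is, to my understanding, based on perturbing input elements in order of decreasing relevance and tracking how quickly the classifier score degrades ("pixel-flipping"): a more faithful heatmap makes the score drop faster. A rough sketch of that idea, where `model_score` and all parameters are illustrative stand-ins rather than the authors' protocol:

```python
# Rough sketch of a pixel-flipping evaluation curve: occlude input
# elements in order of decreasing relevance and record the model score
# after each step. A steeper decay indicates a more faithful heatmap.
import numpy as np

def pixel_flipping_curve(x, relevance, model_score, steps=20, fill=0.0):
    """x: input array; relevance: heatmap of same shape;
    model_score: any function mapping an input to the explained class score."""
    x = x.astype(float).copy()
    order = np.argsort(relevance.ravel())[::-1]   # most relevant elements first
    chunk = max(1, order.size // steps)           # elements occluded per step
    scores = [model_score(x)]
    for k in range(0, order.size, chunk):
        x.ravel()[order[k:k + chunk]] = fill      # occlude the next chunk in place
        scores.append(model_score(x))
    return np.array(scores)
```

Comparing the curves (or the area under them) produced by different heatmapping algorithms on the same classifier gives the kind of objective comparison the review refers to.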