dais-ita / interpretability-papers

Papers on interpretable deep learning, for review

Layer-wise relevance propagation for neural networks with local renormalization layers #44

Open richardtomsett opened 6 years ago

richardtomsett commented 6 years ago

**Layer-wise relevance propagation for neural networks with local renormalization layers**

Layer-wise relevance propagation is a framework that allows one to decompose the prediction of a deep neural network computed over a sample, e.g. an image, down to relevance scores for the individual input dimensions of the sample, such as the subpixels of an image. While this approach can be applied directly to generalized linear mappings, product-type non-linearities are not covered. This paper proposes an approach to extend layer-wise relevance propagation to neural networks with local renormalization layers, a very common product-type non-linearity in convolutional neural networks. We evaluate the proposed method for local renormalization layers on the CIFAR-10, ImageNet and MIT Places datasets.
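As a rough illustration of the underlying mechanics (not the paper's specific treatment of local renormalization layers), the sketch below applies the commonly used epsilon-stabilised LRP rule to a single dense layer; the function name, array shapes, and `eps` value are illustrative assumptions.

```python
import numpy as np

def lrp_epsilon_dense(a, W, b, R_out, eps=1e-6):
    """Redistribute the relevance R_out of a dense layer's outputs onto its inputs.

    a     : (n_in,)        activations entering the layer
    W     : (n_in, n_out)  weight matrix
    b     : (n_out,)       bias vector
    R_out : (n_out,)       relevance assigned to the layer's outputs
    Returns R_in : (n_in,) relevance redistributed onto the inputs.
    """
    z = a @ W + b                              # forward pre-activations
    z = z + eps * np.where(z >= 0, 1.0, -1.0)  # epsilon stabiliser avoids division by zero
    s = R_out / z                              # relevance per unit of pre-activation
    return a * (W @ s)                         # weight the backward message by the input activations

# Toy usage with random values (purely illustrative)
rng = np.random.default_rng(0)
a, W, b = rng.standard_normal(4), rng.standard_normal((4, 3)), rng.standard_normal(3)
print(lrp_epsilon_dense(a, W, b, rng.standard_normal(3)))
```

Rules of this form redistribute relevance approximately conservatively through linear/dense mappings; the paper's contribution is how to handle the product-type non-linearity of local renormalization layers, which this simple rule does not cover.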

Bibtex:

```bibtex
@Inbook{Binder2016,
  author    = "Binder, Alexander and Montavon, Gr{\'e}goire and Lapuschkin, Sebastian and M{\"u}ller, Klaus-Robert and Samek, Wojciech",
  editor    = "Villa, Alessandro E.P. and Masulli, Paolo and Pons Rivero, Antonio Javier",
  title     = "Layer-Wise Relevance Propagation for Neural Networks with Local Renormalization Layers",
  booktitle = "Artificial Neural Networks and Machine Learning -- ICANN 2016: 25th International Conference on Artificial Neural Networks, Barcelona, Spain, September 6-9, 2016, Proceedings, Part II",
  year      = "2016",
  publisher = "Springer International Publishing",
  address   = "Cham",
  pages     = "63--71",
  isbn      = "978-3-319-44781-0",
  doi       = "10.1007/978-3-319-44781-0_8"
}
```

richardtomsett commented 6 years ago

From previous review: The recently-proposed layer-wise relevance propagation (LRP) algorithm from Wojciech Samek’s group (Binder et al. 2016a, Binder et al. 2016b) uses the fact that the individual neural network units are differentiable to decompose the network output in terms of its input variables. It is a principled method that has a close relationship to Taylor decomposition and is applicable to arbitrary deep neural network architectures (Montavon et al. 2017). The output is a heatmap over the input features that indicates the relevance of each feature to the model output. This makes the method particularly well suited to analyzing image classifiers, though the method has also been adapted for text and electroencephalogram signal classification (Sturm et al. 2016). Samek et al. (2017) have also developed an objective metric for comparing the output of LRP with similar heatmapping algorithms.

*Binder et al. 2016a: issue #44, Binder et al. 2016b: issue #45, Montavon et al. 2017: issue #46, Sturm et al. 2016: issue #47, Samek et al. 2017: issue #48.
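As noted above, LRP's output is a heatmap over the input features. The sketch below shows one plausible way to aggregate per-subpixel relevance scores into such a heatmap; the function name, image shape, and colour map are illustrative assumptions, not the authors' visualisation code.

```python
import numpy as np
import matplotlib.pyplot as plt

def show_relevance_heatmap(R_input, image_shape=(32, 32, 3)):
    """Visualise a flat vector of per-subpixel relevance scores as a 2-D heatmap.

    R_input     : flat array with one relevance score per input dimension,
                  e.g. per subpixel of a CIFAR-10 image
    image_shape : (height, width, channels) of the original image
    """
    R = R_input.reshape(image_shape).sum(axis=-1)  # pool relevance over colour channels
    bound = max(np.abs(R).max(), 1e-12)            # symmetric colour scale centred on zero
    plt.imshow(R, cmap="seismic", vmin=-bound, vmax=bound)
    plt.colorbar(label="relevance")
    plt.axis("off")
    plt.show()
```

A symmetric, zero-centred colour scale keeps positive (evidence for the predicted class) and negative (evidence against) relevance visually distinguishable, which is the usual convention in LRP heatmaps.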