InterpretDL: Interpretation of Deep Learning Models based on PaddlePaddle

InterpretDL, short for interpretations of deep learning models, is a model interpretation toolkit for PaddlePaddle models. The toolkit implements many interpretation algorithms, including LIME, Grad-CAM, and Integrated Gradients, as well as several state-of-the-art (SOTA) and recently proposed ones.

InterpretDL is under active development, and all contributions are welcome!

Why InterpretDL

As deep learning models grow increasingly complex, understanding their internal workings becomes ever more difficult. The interpretability of black-box models has therefore become a major research focus. InterpretDL provides a collection of both classical and new algorithms for interpreting models.

By utilizing these methods, people can better understand why models work and why they fail, which in turn informs the model development process.

For researchers designing new interpretation algorithms, InterpretDL gives easy access to existing methods against which they can compare their work.

News

Xuhong Li, Jiamin Chen, Yekun Chai, Haoyi Xiong. GILOT: Interpreting Generative Language Models via Optimal Transport. ICML 2024. paper link.

Xuhong Li, Mengnan Du, Jiamin Chen, Yekun Chai, Himabindu Lakkaraju, Haoyi Xiong. “M4: A Unified XAI Benchmark for Faithfulness Evaluation of Feature Attribution Methods across Metrics, Modalities and Models.” NeurIPS 2023, Datasets and Benchmarks Track. paper link.

Jiamin Chen, Xuhong Li, Lei Yu, Dejing Dou, Haoyi Xiong. “Beyond Intuition: Rethinking Token Attributions inside Transformers.” TMLR. paper link.

Demo

Interpretation algorithms offer hints about why a black-box model makes its decisions.

The following visualizations show several interpretation algorithms applied to the original image, indicating why the model predicts "bull_mastiff."

[Visualizations: Original Image | IntGrad (demo) | SG (demo) | LIME (demo) | Grad-CAM (demo)]

For the sentiment analysis task, the reasons why a model gives positive or negative predictions can be visualized similarly. A quick demo can be found here; samples in Chinese are also available here.

Installation

InterpretDL requires the deep learning framework PaddlePaddle; versions with CUDA support are recommended for GPU acceleration.
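
If PaddlePaddle is not installed yet, it can be installed via pip. A minimal sketch; the exact CUDA-matched wheel is listed in the PaddlePaddle installation guide, so treat the GPU package below as the common default rather than the only option:

# CPU-only build
pip install paddlepaddle
# CUDA-enabled build; pick the wheel matching your CUDA version
pip install paddlepaddle-gpu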

Pip installation

pip install interpretdl

# or with tsinghua mirror
pip install interpretdl -i https://pypi.tuna.tsinghua.edu.cn/simple
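
As a quick sanity check (our suggestion, not from the original docs), the package should import cleanly after installation:

python -c "import interpretdl as it; print(it.__name__)"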

Developer installation

git clone https://github.com/PaddlePaddle/InterpretDL.git
# ... fix bugs or add new features
cd InterpretDL && pip install -e .
# welcome to propose pull request and contribute
yapf -i <python_file_path>  # code style: column_limit=120

Unit Tests

# run gradcam unit tests
python -m unittest -v tests.interpreter.test_gradcam
# run all unit tests
python -m unittest -v

Documentation

Online link: interpretdl.readthedocs.io.

Or generate the docs locally:

git clone https://github.com/PaddlePaddle/InterpretDL.git
cd InterpretDL/docs
make html
open _build/html/index.html

Getting Started

All interpreters inherit from the abstract class Interpreter, whose interpret(**kwargs) method is the entry point to call.

# An example using the SmoothGrad interpreter.
import interpretdl as it
from paddle.vision.models import resnet50

# Load a pretrained ResNet-50 as the model to be explained.
paddle_model = resnet50(pretrained=True)
# Wrap the model with the SmoothGrad interpreter; use_cuda enables GPU.
sg = it.SmoothGradInterpreter(paddle_model, use_cuda=True)
# Compute the explanation; visual=True displays it, save_path=None skips saving.
gradients = sg.interpret("test.jpg", visual=True, save_path=None)
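
Other interpreters follow the same pattern. As a hedged sketch (GradCAMInterpreter is part of InterpretDL, but argument names such as target_layer_name and the layer name below are illustrative and may differ across versions, so check the API docs):

import interpretdl as it
from paddle.vision.models import resnet50

paddle_model = resnet50(pretrained=True)
# Grad-CAM explains an intermediate layer; inspect
# paddle_model.named_sublayers() to choose a real layer name.
gradcam = it.GradCAMInterpreter(paddle_model, use_cuda=True)
heatmap = gradcam.interpret(
    "test.jpg", target_layer_name="layer4.2.relu", visual=True, save_path=None)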

A quick Getting-Started tutorial (or on NBviewer) is provided; it takes only a few minutes to get familiar with InterpretDL.

Examples and Tutorials

We have provided at least one example for each interpretation algorithm and each trustworthiness evaluation algorithm, hopefully covering applications for both CV and NLP.

We are currently preparing tutorials for easy usage of InterpretDL.

Both examples and tutorials can be found under the tutorials folder.

Roadmap

We plan to build a toolkit that offers model interpretations as well as their evaluations. The interpretation algorithms implemented so far are listed below, and more are on the way. Contributions are welcome, as are suggestions for algorithms you would like to see added.

Implemented Algorithms with Taxonomy

Two dimensions, the representation of the explanation result and the type of the target model, are used to categorize the interpretation algorithms. This taxonomy serves as an indicator for finding the algorithm best suited to the target task and model; a short usage sketch follows the table.

| Methods | Representation | Model Type |
|---|---|---|
| LIME | Input Features | Model-Agnostic |
| LIME with Prior | Input Features | Model-Agnostic |
| GLIME | Input Features | Model-Agnostic |
| NormLIME/FastNormLIME | Input Features | Model-Agnostic |
| LRP | Input Features | Differentiable* |
| SmoothGrad | Input Features | Differentiable |
| IntGrad | Input Features | Differentiable |
| GradSHAP | Input Features | Differentiable |
| Occlusion | Input Features | Model-Agnostic |
| GradCAM/CAM | Intermediate Features | Specific: CNNs |
| ScoreCAM | Intermediate Features | Specific: CNNs |
| Rollout | Intermediate Features | Specific: Transformers |
| TAM | Input Features | Specific: Transformers |
| Generic Attention | Input Features | Specific: Transformers |
| Bidirectional | Input Features | Specific: Transformers |
| ForgettingEvents | Dataset-Level | Differentiable |
| TIDY (Training Data Analyzer) | Dataset-Level | Differentiable |
| BHDF | Dataset-Level** | Differentiable |
| Consensus | Features | Cross-Model |

* LRP requires the model to be implemented specifically to support relevance back-propagation.

** Dataset-Level Interpreters require a training process.
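
To illustrate the model-agnostic branch of the taxonomy, LIME can be applied without access to the model internals. A hedged sketch (LIMECVInterpreter is part of InterpretDL, but the argument names below may vary across versions, so check the API docs):

import interpretdl as it
from paddle.vision.models import resnet50

paddle_model = resnet50(pretrained=True)
# LIME treats the model as a black box: it perturbs superpixels of the
# input image and fits a local linear surrogate to the model's outputs.
lime = it.LIMECVInterpreter(paddle_model, use_cuda=True)
lime_weights = lime.interpret(
    "test.jpg", interpret_class=None, num_samples=1000, batch_size=50,
    visual=True, save_path=None)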

Implemented Trustworthiness Evaluation Algorithms

Interpretability Evaluation Algorithms

Planned Algorithms

Presentations

Linux Foundation Project AI & Data -- Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond. Video Link (00:20:30 -- 00:45:00).

Baidu Create 2021 (in Chinese): Video Link (01:18:40 -- 01:36:30).

ICML 2021 Expo -- Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond. Video Link.

References of Algorithms

Copyright and License

InterpretDL is provided under the Apache-2.0 license.

Recent News

Xuhong Li, Haoyi Xiong, Xingjian Li, Xiao Zhang, Ji Liu, Haiyan Jiang, Zeyu Chen, Dejing Dou. “G-LIME: Statistical Learning for Local Interpretations of Deep Neural Networks using Global Priors.” Artificial Intelligence, 2023. pdf link.

Qingrui Jia, Xuhong Li, Lei Yu, Penghao Zhao, Jiang Bian, Shupeng Li, Haoyi Xiong, Dejing Dou. “Learning from Training Dynamics: Identifying Mislabeled Data Beyond Manually Designed Features”. AAAI 2023. pdf link.