InterpretDL: Interpretation of Deep Learning Models based on PaddlePaddle

InterpretDL, short for interpretations of deep learning models, is a model interpretation toolkit for PaddlePaddle models. The toolkit implements many interpretation algorithms, including LIME, Grad-CAM, and Integrated Gradients, as well as several state-of-the-art (SOTA) and recently proposed ones.

InterpretDL is under active development, and all contributions are welcome!

Why InterpretDL

As deep learning models grow increasingly complex, understanding their internal workings becomes ever more difficult. The interpretability of black-box models has therefore become a major research focus. InterpretDL provides a collection of both classical and new algorithms for interpreting models.

By utilizing these methods, people can better understand why models work and why they fail, which in turn informs the model development process.

For researchers designing new interpretation algorithms, InterpretDL gives easy access to existing methods against which they can compare their work.

News

Xuhong Li, Jiamin Chen, Yekun Chai, Haoyi Xiong. GILOT: Interpreting Generative Language Models via Optimal Transport. ICML 2024. paper link.

Xuhong Li, Mengnan Du, Jiamin Chen, Yekun Chai, Himabindu Lakkaraju, Haoyi Xiong. “M4: A Unified XAI Benchmark for Faithfulness Evaluation of Feature Attribution Methods across Metrics, Modalities and Models.” NeurIPS 2023, Datasets and Benchmarks Track. paper link.

Jiamin Chen, Xuhong Li, Lei Yu, Dejing Dou, Haoyi Xiong. “Beyond Intuition: Rethinking Token Attributions inside Transformers.” TMLR. paper link.

Demo

Interpretation algorithms offer hints about why a black-box model makes its decisions.

The following visualizations show several interpretation algorithms applied to the original image, indicating why the model predicts "bull_mastiff."

[Visualizations: Original Image | IntGrad (demo) | SG (demo) | LIME (demo) | Grad-CAM (demo)]

For the sentiment analysis task, the reasons why a model gives positive or negative predictions can be visualized similarly. A quick demo can be found here; samples in Chinese are also available here.

Installation

InterpretDL requires the deep learning framework PaddlePaddle; versions with CUDA support are recommended for GPU acceleration.
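
If PaddlePaddle is not installed yet, it can be installed via pip. A minimal sketch; the exact CUDA-matched wheel is listed in the PaddlePaddle installation guide, so treat the GPU package below as the common default rather than the only option:

# CPU-only build
pip install paddlepaddle
# CUDA-enabled build; pick the wheel matching your CUDA version
pip install paddlepaddle-gpu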

Pip installation

pip install interpretdl

# or with tsinghua mirror
pip install interpretdl -i https://pypi.tuna.tsinghua.edu.cn/simple
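
As a quick sanity check (our suggestion, not from the original docs), the package should import cleanly after installation:

python -c "import interpretdl as it; print(it.__name__)"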

Developer installation

git clone https://github.com/PaddlePaddle/InterpretDL.git
# ... fix bugs or add new features
cd InterpretDL && pip install -e .
# welcome to propose pull request and contribute
yapf -i <python_file_path>  # code style: column_limit=120

Unit Tests

# run gradcam unit tests
python -m unittest -v tests.interpreter.test_gradcam
# run all unit tests
python -m unittest -v

Documentation

Online link: interpretdl.readthedocs.io.

Or generate the docs locally:

git clone https://github.com/PaddlePaddle/InterpretDL.git
cd InterpretDL/docs
make html
open _build/html/index.html

Getting Started

All interpreters inherit from the abstract class Interpreter, whose interpret(**kwargs) method is the entry point to call.

# An example using the SmoothGrad interpreter.
import interpretdl as it
from paddle.vision.models import resnet50

# Load a pretrained ResNet-50 as the model to be explained.
paddle_model = resnet50(pretrained=True)
# Wrap the model with the SmoothGrad interpreter; use_cuda enables GPU.
sg = it.SmoothGradInterpreter(paddle_model, use_cuda=True)
# Compute the explanation; visual=True displays it, save_path=None skips saving.
gradients = sg.interpret("test.jpg", visual=True, save_path=None)
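
Other interpreters follow the same pattern. As a hedged sketch (GradCAMInterpreter is part of InterpretDL, but argument names such as target_layer_name and the layer name below are illustrative and may differ across versions, so check the API docs):

import interpretdl as it
from paddle.vision.models import resnet50

paddle_model = resnet50(pretrained=True)
# Grad-CAM explains an intermediate layer; inspect
# paddle_model.named_sublayers() to choose a real layer name.
gradcam = it.GradCAMInterpreter(paddle_model, use_cuda=True)
heatmap = gradcam.interpret(
    "test.jpg", target_layer_name="layer4.2.relu", visual=True, save_path=None)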

A quick Getting-Started tutorial (or on NBviewer) is provided; it takes only a few minutes to get familiar with InterpretDL.

Examples and Tutorials

We have provided at least one example for each interpretation algorithm and each trustworthiness evaluation algorithm, hopefully covering applications for both CV and NLP.

We are currently preparing tutorials for easy usage of InterpretDL.

Both examples and tutorials can be found under the tutorials folder.

Roadmap

We plan to build a toolkit that offers model interpretations as well as their evaluations. The interpretation algorithms implemented so far are listed below, and more are on the way. Contributions are welcome, as are suggestions for algorithms you would like to see added.

Implemented Algorithms with Taxonomy

Two dimensions, the representation of the explanation result and the type of the target model, are used to categorize the interpretation algorithms. This taxonomy serves as an indicator for finding the algorithm best suited to the target task and model; a short usage sketch follows the table.

| Methods | Representation | Model Type |
|---|---|---|
| LIME | Input Features | Model-Agnostic |
| LIME with Prior | Input Features | Model-Agnostic |
| GLIME | Input Features | Model-Agnostic |
| NormLIME/FastNormLIME | Input Features | Model-Agnostic |
| LRP | Input Features | Differentiable* |
| SmoothGrad | Input Features | Differentiable |
| IntGrad | Input Features | Differentiable |
| GradSHAP | Input Features | Differentiable |
| Occlusion | Input Features | Model-Agnostic |
| GradCAM/CAM | Intermediate Features | Specific: CNNs |
| ScoreCAM | Intermediate Features | Specific: CNNs |
| Rollout | Intermediate Features | Specific: Transformers |
| TAM | Input Features | Specific: Transformers |
| Generic Attention | Input Features | Specific: Transformers |
| Bidirectional | Input Features | Specific: Transformers |
| ForgettingEvents | Dataset-Level | Differentiable |
| TIDY (Training Data Analyzer) | Dataset-Level | Differentiable |
| BHDF | Dataset-Level** | Differentiable |
| Consensus | Features | Cross-Model |

* LRP requires the model to be implemented specifically to support relevance back-propagation.

** Dataset-Level Interpreters require a training process.
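
To illustrate the model-agnostic branch of the taxonomy, LIME can be applied without access to the model internals. A hedged sketch (LIMECVInterpreter is part of InterpretDL, but the argument names below may vary across versions, so check the API docs):

import interpretdl as it
from paddle.vision.models import resnet50

paddle_model = resnet50(pretrained=True)
# LIME treats the model as a black box: it perturbs superpixels of the
# input image and fits a local linear surrogate to the model's outputs.
lime = it.LIMECVInterpreter(paddle_model, use_cuda=True)
lime_weights = lime.interpret(
    "test.jpg", interpret_class=None, num_samples=1000, batch_size=50,
    visual=True, save_path=None)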

Implemented Trustworthiness Evaluation Algorithms

Interpretability Evaluation Algorithms

Planned Algorithms

Presentations

Linux Foundation Project AI & Data -- Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond. Video Link (00:20:30 -- 00:45:00).

Baidu Create 2021 (in Chinese): Video Link (01:18:40 -- 01:36:30).

ICML 2021 Expo -- Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond. Video Link.

References of Algorithms

Copyright and License

InterpretDL is provided under the Apache-2.0 license.

Recent News

Xuhong Li, Haoyi Xiong, Xingjian Li, Xiao Zhang, Ji Liu, Haiyan Jiang, Zeyu Chen, Dejing Dou. “G-LIME: Statistical Learning for Local Interpretations of Deep Neural Networks using Global Priors.” Artificial Intelligence, 2023. pdf link.

Qingrui Jia, Xuhong Li, Lei Yu, Penghao Zhao, Jiang Bian, Shupeng Li, Haoyi Xiong, Dejing Dou. “Learning from Training Dynamics: Identifying Mislabeled Data Beyond Manually Designed Features”. AAAI 2023. pdf link.