A curated list of research papers focused on explainable methods in computer vision, ranging from saliency maps to concept-based explanations and beyond. The goal is to gather key papers that contribute to transparency and interpretability in machine learning models, particularly in the context of visual data.
"Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)"
Authors: Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres
Conference: ICML 2018
Summary: Uses directional derivatives to quantify how important a user-defined concept is to a classification result; for example, how sensitive a prediction of "zebra" is to the presence of stripes.
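For intuition, a minimal numpy/scikit-learn sketch of the core TCAV computation (not the official implementation); the random activations and gradients stand in for values read out of a real vision model.

```python
# Minimal sketch of the core TCAV computation; activations and logit gradients
# are random stand-ins for hooks into a real vision model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 512                                            # width of the chosen layer

# 1) Layer activations of concept examples (e.g. striped textures) vs. random images.
concept_acts = rng.normal(1.0, 1.0, (100, d))
random_acts = rng.normal(0.0, 1.0, (100, d))

# 2) The CAV is the normal of a linear classifier separating the two sets.
clf = LogisticRegression(max_iter=1000).fit(
    np.vstack([concept_acts, random_acts]),
    np.array([1] * 100 + [0] * 100),
)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])

# 3) Directional derivative of the class logit (e.g. "zebra") along the CAV,
#    evaluated at the layer activations of class examples.
logit_grads = rng.normal(0.0, 1.0, (50, d))        # stand-in for d(logit)/d(activation)
sensitivities = logit_grads @ cav

# 4) TCAV score: fraction of class examples with positive concept sensitivity.
print(f"TCAV score: {(sensitivities > 0).mean():.2f}")
```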
"Towards Automatic Concept-based Explanations"
Authors: Amirata Ghorbani, James Wexler, James Zou, Been Kim
Conference: NeurIPS 2019
Summary: Automatically discovers concepts by segmenting images at multiple resolutions to obtain concepts at all hierarchies, and clustering similar segments as examples of the same concept. Any method such as TCAV can then be used to score the relevance of the discovered concepts.
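A rough sketch of this segment-embed-cluster pipeline; the mean segment colour below is only a stand-in for the CNN embeddings ACE actually uses, and the input images are random noise.

```python
# Rough sketch of an ACE-style concept discovery pipeline:
# segment at several resolutions, embed each segment, cluster into concepts.
import numpy as np
from skimage.segmentation import slic
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
images = rng.random((10, 224, 224, 3))             # stand-in for one class's images

segment_features = []
for img in images:
    for n_segments in (15, 50, 80):                # multi-resolution segmentation
        labels = slic(img, n_segments=n_segments, compactness=10, start_label=0)
        for seg_id in np.unique(labels):
            mask = labels == seg_id
            # Stand-in embedding; ACE resizes each segment and feeds it to the CNN.
            segment_features.append(img[mask].mean(axis=0))

# Each cluster of similar segments is treated as one discovered concept,
# which can then be scored with a method such as TCAV.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(np.array(segment_features))
print(np.bincount(kmeans.labels_))                 # number of segments per concept
```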
"Concept Bottleneck Models"
Authors: Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang
Conference: ICML 2020
Summary: First predicts concepts that are provided at training time, and then uses these concepts to predict the label. By construction, one can intervene on a concept bottleneck model by editing its predicted concept values and propagating these changes to the final prediction. No automated intervention strategy is proposed, but the paper provides insights into how test-time concept interventions by domain experts can help correct incorrect predictions.
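A minimal PyTorch sketch of the bottleneck structure and a test-time concept intervention; training details (independent, sequential or joint) are omitted and all sizes are illustrative.

```python
# Minimal concept bottleneck: x -> predicted concepts -> label, with an intervention
# that overwrites one predicted concept and propagates the change to the prediction.
import torch
import torch.nn as nn

n_features, n_concepts, n_classes = 64, 8, 4
concept_net = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_concepts))
label_net = nn.Linear(n_concepts, n_classes)       # the label sees only the concepts

x = torch.randn(1, n_features)
c_hat = torch.sigmoid(concept_net(x))              # predicted concept probabilities
y_hat = label_net(c_hat)

# Intervention: a domain expert sets concept 3 to "present"; the edit flows
# through the bottleneck to the final prediction.
c_fixed = c_hat.clone()
c_fixed[0, 3] = 1.0
y_fixed = label_net(c_fixed)
print(y_hat.softmax(-1), y_fixed.softmax(-1))
```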
"Do Concept Bottleneck Models Learn as Intended?"
Authors: Andrei Margeloiu, Matthew Ashman, Umang Bhatt, Yanzhi Chen, Mateja Jamnik, Adrian Weller
Conference: arXiv 2021
Summary: Demonstrates that concepts learned in CBMs do not correspond to anything semantically meaningful in input space.
"Logic Explained Networks"
Authors: Gabriele Ciravegna, Pietro Barbiero, Francesco Giannini, Marco Gori, Pietro Liò, Marco Maggini, Stefano Melacci
Journal: Artificial Intelligence
Summary: Logic Explained Networks (LENs) offer interpretable deep learning models that provide human-understandable explanations using First-Order Logic, outperforming traditional white-box models in both supervised and unsupervised learning tasks.
"Entropy-based Logic Explanations of Neural Networks"
Authors: Pietro Barbiero, Gabriele Ciravegna, Francesco Giannini, Pietro Liò, Marco Gori, Stefano Melacci
Conference: AAAI 2022
Summary: A novel end-to-end approach extracts concise First-Order Logic explanations from neural networks using an entropy-based criterion, improving both interpretability and classification accuracy in safety-critical domains.
"GlanceNets: Interpretabile, Leak-proof Concept-based Models"
Authors: Emanuele Marconato, Andrea Passerini, Stefano Teso
Conference: CRL 2022
Summary: Questions the interpretability of the concepts learned by classic CBMs and proposes a clear definition of interpretability in terms of alignment between the model's representation and an underlying data generation process.
"Concept Embedding Models: Beyond the Accuracy-Explainability Trade-Off"
Authors: Mateo Espinosa Zarlenga, Pietro Barbiero, Gabriele Ciravegna, Giuseppe Marra et al.
Conference: NeurIPS 2022
Summary: Proposes novel concept-based architectures that overcome the accuracy/interpretability pitfalls of classic CBMs (mostly due to incomplete concepts or over-reliance on concepts), enabling their deployment in real-world settings where concept annotations are likely to be incomplete.
"Concept Activation Regions: A Generalized Framework For Concept-Based Explanations"
Authors: Jonathan Crabbé, Mihaela van der Schaar.
Conference: NeurIPS 2022
Summary: Discusses the assumption behind existing methods such as CAV, namely that the examples illustrating a concept are mapped to a fixed direction of the DNN's latent space. Relaxes this assumption by allowing concept examples to be scattered across different clusters of the latent space, called concept activation regions (CARs).
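A toy sketch of the contrast: concept activations drawn from two separate clusters are not captured well by a single CAV direction, whereas a nonlinear classifier (a kernel SVM here, standing in for the paper's CAR classifier) can still delimit a concept region.

```python
# Concept examples occupy two clusters of the latent space, so no single linear
# direction separates them from random activations; a kernel classifier can.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
d = 128
concept_acts = np.vstack([
    rng.normal(+3.0, 1.0, (50, d)),                # cluster 1 of the concept
    rng.normal(-3.0, 1.0, (50, d)),                # cluster 2 of the concept
])
random_acts = rng.normal(0.0, 1.0, (100, d))

X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 100 + [0] * 100)

car = SVC(kernel="rbf", probability=True).fit(X, y)    # concept activation region
test_acts = rng.normal(3.0, 1.0, (5, d))               # new examples near cluster 1
print(car.predict_proba(test_acts)[:, 1])              # concept membership scores
```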
"VICE: Variational Interpretable Concept Embeddings"
Authors: Lukas Muttenthaler, Charles Y. Zheng, Patrick McClure, Robert A. Vandermeulen, Martin N. Hebart, Francisco Pereira.
Conference: NeurIPS 2022
Summary: Method to obtain non-negative representations of object concepts.
"Addressing Leakage in Concept Bottleneck Models"
Authors: Marton Havasi, Sonali Parbhoo, Finale Doshi-Velez.
Conference: NeurIPS 2022
Summary: Addresses concept leakage in CBMs, which the authors attribute to an insufficient concept set and an inexpressive concept predictor.
"Overlooked factors in concept-based explanations: Dataset choice, concept learnability, and human capability"
Authors: Vikram V. Ramaswamy, Sunnie S. Y. Kim, Ruth Fong, Olga Russakovsky.
Conference: CVPR 2023
Summary: Examines the impact of the probe dataset on the explanations generated by concept-based explanation methods, and highlights that the concepts used in probe datasets are often harder to learn than the corresponding classes themselves. The authors conclude with suggestions for improving the quality and usability of concept-based explanations.
"CRAFT: Concept Recursive Activation FacTorization for Explainability"
Authors: Thomas Fel, Agustin Picard, Louis Bethune, Thibaut Boissin, David Vigouroux, Julien Colin, Rémi Cadène, Thomas Serre.
Conference: CVPR 2023
Summary: Generates concept-based explanations that identify both "what" concepts a model relies on and "where" they appear in the input.
"Spatial-temporal Concept based Explanation of 3D ConvNets"
Authors: Ying Ji, Yu Wang, Kensaku Mori, Jien Kato.
Conference: CVPR 2023
Summary: Extends automatic concept-based explanations (ACE) to the spatio-temporal setting of 3D ConvNets.
"Learning Bottleneck Concepts in Image Classification"
Authors: Bowen Wang, Liangzhi Li, Yuta Nakashima, Hajime Nagahara.
Conference: CVPR 2023
Summary: Proposes Bottleneck Concept Learner (BotCL), which represents an image solely by the presence/absence of concepts learned during training on the target task, without explicit concept supervision, and classifies the image using those concepts.
"Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification"
Authors: Yue Yang, Artemis Panagopoulou, Shenghao Zhou, Daniel Jin, Chris Callison-Burch, Mark Yatskar.
Conference: CVPR 2023
Summary: Language Guided Bottlenecks (LaBo) leverages a language model to define a large space of possible bottlenecks. Given a problem domain, LaBo uses GPT-3 to produce factual sentences about categories, forming candidate concepts. The LLM-generated sentential concepts are aligned to images using CLIP to form a bottleneck layer.
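A sketch of a LaBo-style bottleneck built on Hugging Face's CLIP: image-to-sentence similarities act as the bottleneck activations and a linear layer maps them to classes. The concept sentences are illustrative stand-ins for GPT-3 outputs, and the class weights are randomly initialised here rather than learned as in the paper.

```python
# LaBo-style bottleneck sketch: CLIP similarities to candidate concept sentences
# form the bottleneck, followed by a linear concept-to-class layer.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

concepts = [                                        # stand-ins for GPT-3 generated sentences
    "black and white stripes covering the body",
    "a very long neck with brown patches",
    "a thick mane of hair around the face",
]
classes = ["zebra", "giraffe", "lion"]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))                # placeholder image
inputs = processor(text=concepts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    bottleneck = model(**inputs).logits_per_image   # (1, n_concepts) similarities

# Concept-to-class weights; LaBo learns these, here they are random for illustration.
class_weights = torch.randn(len(concepts), len(classes))
print((bottleneck @ class_weights).softmax(-1))
```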
"Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repeat"
Authors: Shantanu Ghosh, Ke Yu, Forough Arabshahi, Kayhan Batmanghelich
Conference: ICML 2023
Summary: This paper introduces a method to iteratively carve concept-based interpretable models from a Blackbox in a post-hoc manner, using First Order Logic for explanations, while a residual network handles harder cases, achieving high interpretability without sacrificing performance.
"Distilling BlackBox to Interpretable models for Efficient Transfer Learning"
Authors: Shantanu Ghosh, Ke Yu, Forough Arabshahi, Kayhan Batmanghelich
Conference: MICCAI 2023
Summary: This paper presents a concept-based interpretable model for chest X-ray classification that can be efficiently fine-tuned for new domains using minimal labeled data, leveraging semi-supervised learning and distillation from blackbox models.
"Interpretable Neural-Symbolic Concept Reasoning"
Authors: Pietro Barbiero, Gabriele Ciravegna, Francesco Giannini, Mateo Espinosa Zarlenga et al.
Conference: ICML 2023
Summary: The paper highlights that state-of-the-art concept-based models rely on high-dimensional concept embedding representations which lack a clear semantic meaning, thus questioning the interpretability of their decision process. The authors propose the Deep Concept Reasoner (DCR), where neural networks build syntactic rule structures using concept embeddings, but such representations are only used to compute a logic rule. The final prediction is then obtained by evaluating these rules on the concepts' truth values rather than on their embeddings, thus maintaining clear semantics and providing a fully interpretable decision.
"Causal Proxy Models for Concept-based Model Explanations"
Authors: Zhengxuan Wu, Karel D'Oosterlinck, Atticus Geiger, Amir Zur, Christopher Potts.
Conference: ICML 2023
Summary: The paper highlights that explainability methods in NLP systems encounter a form of the fundamental problem of causal inference. This means that for any given text input, we cannot observe the counterfactual versions of that input (i.e., alternate versions of the input that could lead to different outcomes). Without these counterfactuals, isolating the causal influence of specific parts of a model's representation on its outputs is difficult. The core proposal in this paper is the CPM, a model trained to mimic the black-box model. The CPM is designed to produce similar outputs as the original black-box model on actual input texts, while allowing for controlled interventions in its internal representations to simulate counterfactual scenarios.
"Concept-based Explanations for Out-of-Distribution Detectors"
Authors: Jihye Choi, Jayaram Raghuram, Ryan Feng, Jiefeng Chen, Somesh Jha, Atul Prakash.
Conference: ICML 2023
Summary: Propose an unsupervised framework for learning a set of concepts that satisfy the desired properties of high detection completeness and concept separability, and demonstrate its effectiveness in providing concept-based explanations for diverse off-the-shelf OOD detectors.
"Probabilistic Concept Bottleneck Models"
Authors: Eunji Kim, Dahuin Jung, Sangha Park, Siwon Kim, Sungroh Yoon.
Conference: ICML 2023
Summary: ProbCBM models uncertainty in concept prediction and provides explanations based on the concept and its corresponding uncertainty.
"Discover and Cure: Concept-aware Mitigation of Spurious Correlation"
Authors: Shirley Wu, Mert Yuksekgonul, Linjun Zhang, James Zou.
Conference: ICML 2023
Summary: Discovers unstable concepts across different environments as spurious attributes, and then intervenes on the training data using the discovered concepts to reduce spurious correlation.
"A Closer Look at the Intervention Procedure of Concept Bottleneck Models"
Authors: Sungbin Shin, Yohan Jo, Sungsoo Ahn, Namhoon Lee.
Conference: ICML 2023
Summary: Develops various ways of selecting intervening concepts to improve the intervention effectiveness and conduct an array of in-depth analyses as to how they evolve under different circumstances.
"Global Concept-Based Interpretability for Graph Neural Networks via Neuron Analysis"
Authors: Han Xuanyuan, Pietro Barbiero, Dobrik Georgiev, Lucie Charlotte Magister, Pietro Liò.
Conference: AAAI 2023
Summary: Highlights that Graph Neural Network (GNN) neurons act as concept detectors, meaning that individual neurons in a GNN are capable of recognizing specific patterns or "concepts" within graph data. These concepts are tied to properties such as node degree (the number of edges connected to a node) and neighbourhood properties (attributes of the nodes directly connected to it).
"Interactive Concept Bottleneck Models"
Authors: Kushal Chauhan, Rishabh Tiwari, Jan Freyberg, Pradeep Shenoy, Krishnamurthy Dvijotham.
Conference: AAAI 2023
Summary: Extends CBMs to interactive prediction settings where the model can query a human collaborator for the labels of some concepts. The authors develop an interaction policy that, at prediction time, chooses which concepts to request labels for so as to maximally improve the final prediction.
"Sparse Linear Concept Discovery Models"
Authors: Konstantinos P. Panousis, Dino Ienco, Diego Marcos.
Conference: arXiv 2023
Summary: Highlights that CBMs often suffer from performance degradation and lower interpretability than intended due to the sheer number of concepts contributing to each decision. The authors propose a simple yet intuitive interpretable framework based on contrastive language-image models and a single sparse linear layer, so that only a few concepts are allowed to contribute to each decision.
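A toy sketch of the "few concepts per decision" idea: a single linear layer over (stand-in) image-concept similarities trained with an L1 penalty. The paper's actual sparsity mechanism differs, so treat this purely as an illustration.

```python
# Sparse linear concept layer: L1 regularisation keeps only a few concepts
# active in each class's decision. Concept similarities are random stand-ins.
import torch
import torch.nn as nn

n_samples, n_concepts, n_classes = 512, 100, 10
concept_sims = torch.randn(n_samples, n_concepts)   # stand-in CLIP image-concept scores
labels = torch.randint(0, n_classes, (n_samples,))

linear = nn.Linear(n_concepts, n_classes)
opt = torch.optim.Adam(linear.parameters(), lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(linear(concept_sims), labels)
    loss = loss + 1e-2 * linear.weight.abs().sum()  # L1 penalty encourages sparsity
    loss.backward()
    opt.step()

# Inspect the few concepts that contribute most to class 0's decisions.
print(linear.weight[0].abs().topk(5).indices)
```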
"Statistically Signifcant Concept-based Explanation of Image Classifers via Model Knockoffs"
Authors: Kaiwen Xu, Kazuto Fukuchi, Youhei Akimoto, Jun Sakuma.
Conference: IJCAI 2023
Summary: Proposes a method to learn image concepts and then use knockoff samples to select the concepts that are important for prediction while controlling the false discovery rate (FDR) below a given level.
"Text2Concept: Concept Activation Vectors Directly from Text"
Authors: Mazda Moayeri, Keivan Rezaei, Maziar Sanjabi, Soheil Feizi.
Conference: CVPRW 2023
Summary: Text2Concept introduces a method to generate CAVs directly from text descriptions, instead of relying on curated examples. The method leverages the CLIP model (which connects text and image representations in a shared multi-modal feature space) to enable any off-the-shelf vision model (like ResNet, etc.) to use text-based concepts without extensive training. The key innovation is a linear mapping layer that aligns the feature space of the vision model with the feature space of CLIP. This mapping layer requires only minimal training on existing data to achieve this alignment.
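A sketch of the linear-mapping idea under stand-in embeddings: fit a linear layer that maps the vision backbone's features into CLIP space (an MSE objective is assumed here and may differ from the paper's), then treat a CLIP text embedding of a concept description as a CAV in the shared space.

```python
# Align a vision backbone's feature space with CLIP space via a linear map,
# then score a text-derived CAV against mapped features. All embeddings are
# random stand-ins; in practice they come from the backbone and from CLIP.
import torch
import torch.nn as nn

d_vision, d_clip, n_imgs = 2048, 512, 1000
vision_feats = torch.randn(n_imgs, d_vision)        # e.g. ResNet penultimate features
clip_img_feats = torch.randn(n_imgs, d_clip)        # CLIP image features, same images

mapper = nn.Linear(d_vision, d_clip)                # the lightweight mapping layer
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(mapper(vision_feats), clip_img_feats)
    loss.backward()
    opt.step()

# A text-derived CAV: CLIP text embedding of a concept description (stand-in here).
text_cav = torch.randn(d_clip)
text_cav = text_cav / text_cav.norm()
with torch.no_grad():
    concept_scores = mapper(vision_feats) @ text_cav   # per-image concept activation
print(concept_scores[:5])
```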
"Label-Free Concept Bottleneck Models"
Authors: Tuomas Oikarinen, Subhro Das, Lam M. Nguyen, Tsui-Wei Weng.
Conference: ICLR 2023
Summary: Method to transform any neural network into an interpretable CBM without labeled concept data, while retaining a high accuracy.
"CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks"
Authors: Tuomas Oikarinen, Tsui-Wei Weng.
Conference: ICLR 2023
Summary: Leverages multimodal vision/language models to label internal neurons with open-ended concepts without the need for any labeled data or human examples.
"Concept-level Debugging of Part-Prototype Networks"
Authors: Andrea Bontempelli, Stefano Teso, Katya Tentori, Fausto Giunchiglia, Andrea Passerini.
Conference: ICLR 2023
Summary: Proposes a "debugger", which is a method for human experts to provide feedback on model predictions, specifically on what portion of the input is relevant, which is then further used to finetune the model.
"Post-hoc Concept Bottleneck Models"
Authors: Mert Yuksekgonul, Maggie Wang, James Zou.
Conference: ICLR 2023
Summary: PCBMs can convert any pre-trained model into a concept bottleneck model in a data-efficient manner, and enhance the model with the desired interpretability benefits. In contrast to CBMs, which tackle local interventions, PCBMs propose interventions that change global model behavior.
"Explain Any Concept: Segment Anything Meets Concept-Based Explanation"
Authors: Ao Sun, Pingchuan Ma, Yuanyuan Yuan, Shuai Wang.
Conference: NeurIPS 2023
Summary: Explores using SAM as a concept discovery method to augment concept-based XAI. Concepts here are largely class-level.
"A Holistic Approach to Unifying Automatic Concept Extraction and Concept Importance Estimation"
Authors: Thomas Fel, Victor Boutin, Mazda Moayeri, Rémi Cadène, Louis Bethune, Léo Andéol, Mathieu Chalvidal, Thomas Serre.
Conference: NeurIPS 2023
Summary: The authors propose a unified perspective on post-hoc concept-based explanation methods. The main intuition underlying the work revolves around the fact that typical concept-based explanations can be considered as a two-stage process whereby, initially, a concept vocabulary is learned and, lastly, each concept's importance is evaluated.
"Understanding Distributed Representations of Concepts in Deep Neural Networks without Supervision"
Authors: Wonjoon Chang, Dahee Kwon, Jaesik Choi.
Conference: AAAI 2024
Summary: Propose an unsupervised method for discovering distributed representations of concepts by selecting a principal subset of neurons.
"Unsupervised Concept Discovery Mitigates Spurious Correlations"
Authors: Md Rifat Arefin, Yan Zhang, Aristide Baratin, Francesco Locatello, Irina Rish, Dianbo Liu, Kenji Kawaguchi.
Conference: ICML 2024
Summary: Introduces CoBalT, a method combining concept discovery with concept balancing for robust classification. The goal of CoBalT is to improve classification accuracy, particularly in situations where certain concepts may be over- or under-represented in the data. This addresses the challenge of imbalance in concept-based learning and ensures that the model does not overly rely on dominant or spurious concepts during classification. CoBalT follows a two-stage procedure common in the literature: first, inferring information about the training data, and then leveraging this information for robust training.
"Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation"
Authors: Floris Holstege, Bram Wouters, Noud Van Giersbergen, Cees Diks.
Conference: ICML 2024
Summary: The algorithm works by jointly estimating two low-dimensional subspaces within the high-dimensional neural network representation. These subspaces are designed to be orthogonal, meaning they are independent of each other and capture distinct features of the data. One subspace represents the main-task concepts, which are the relevant features the model needs to solve the primary classification or prediction task. The other subspace represents spurious concepts, which are features that the model may learn but are not directly relevant (and often harmful) to the main task, such as background patterns or dataset biases.
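A toy illustration of the removal step only: given an (assumed known) spurious direction, project representations onto its orthogonal complement. The paper's contribution is jointly estimating the main-task and spurious subspaces, which is not shown here.

```python
# Remove a spurious concept direction from representations by projecting onto
# its orthogonal complement. The spurious direction is given here; the paper
# estimates it jointly with the main-task subspace.
import numpy as np

rng = np.random.default_rng(0)
d = 64
reps = rng.normal(size=(200, d))                    # network representations
spurious = rng.normal(size=(d, 1))
spurious /= np.linalg.norm(spurious)                # 1-D spurious subspace

projector = np.eye(d) - spurious @ spurious.T       # projection onto the complement
cleaned = reps @ projector

# The cleaned representations no longer vary along the spurious direction.
print(np.abs(cleaned @ spurious).max())             # ~1e-15
```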
"Understanding Inter-Concept Relationships in Concept-Based Models"
Authors: Naveen Janaki Raman, Mateo Espinosa Zarlenga, Mateja Jamnik.
Conference: ICML 2024
Summary: Analyses whether concept-based models capture semantically meaningful relationships between the concepts they learn, and studies how such inter-concept relationships can be leveraged to improve concept-based models.
"Towards Compositionality in Concept Learning"
Authors: Adam Stein, Aaditya Naik, Yinjun Wu, Mayur Naik, Eric Wong.
Conference: ICML 2024
Summary: Method to extract concepts that are compositional.
"Learning to Intervene on Concept Bottlenecks"
Authors: David Steinmann, Wolfgang Stammer, Felix Friedrich, Kristian Kersting.
Conference: ICML 2024
Summary: CB2M allows the reuse of information provided in previous interventions by keeping a memory of past interventions.
"Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models"
Authors: Hengyi Wang, Shiwei Tan, Hao Wang.
Conference: ICML 2024
Summary: This paper proposes five desiderata for explaining ViTs – faithfulness, stability, sparsity, multi-level structure, and parsimony – demonstrates the inadequacy of current methods in meeting these criteria comprehensively, and proposes a probabilistic concept explainer designed to satisfy them.
"TabCBM: Concept-based Interpretable Neural Networks for Tabular Data"
Authors: Mateo Espinosa Zarlenga, Zohreh Shams, Michael Edward Nelson, Been Kim, Mateja Jamnik
Conference: TMLR 2024
Summary: Propose Tabular Concept Bottleneck Models (TabCBMs), a family of interpretable self-explaining neural architectures capable of learning high-level concept explanations for tabular tasks.
"Understanding Video Transformers via Universal Concept Discovery"
Authors: Matthew Kowal, Achal Dave, Rares Ambrus, Adrien Gaidon, Konstantinos G. Derpanis, Pavel Tokmakov
Conference: CVPR 2024
Summary: Observes that concept-based interpretability has concentrated mainly on image-level tasks, and introduces a method to discover concepts in video transformers.
"Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models"
Authors: Matthew Kowal, Richard P. Wildes, Konstantinos G. Derpanis
Conference: CVPR 2024
"Learning Deep Features for Discriminative Localization"
Authors: Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba
Conference: CVPR 2015
"LRP: Layer-wise relevance propagation for neural networks with local renormalization layers"
Authors: Alexander Binder, Grégoire Montavon, Sebastian Bach, Klaus-Robert Müller, Wojciech Samek
Conference: ICANN 2016
"Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization"
Authors: Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra
Conference: ICCV 2017
"Shap: A Unified Approach to Interpreting Model Predictions"
Authors: Scott Lundberg, Su-In Lee
Conference: NeurIPS 2017
"RISE: Randomized Input Sampling for Explanation of Black-box Model"
Authors: Vitali Petsiuk, Abir Das, Kate Saenko
Conference: BMVC 2018
"LIME: "Why Should I Trust You?": Explaining the Predictions of Any Classifier"
Authors: Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
Conference: SIGKDD 2016
"Interpretable Explanations of Black Boxes by Meaningful Perturbation"
Authors: Ruth C. Fong, Andrea Vedaldi
Conference: ICCV 2017
"XAI-Bench"
Description: Synthetic datasets along with a library for benchmarking feature attribution algorithms.
Link
"CUB-200-2011 Birds Dataset"
Description: A popular dataset for fine-grained image classification with concept annotations used for explainability research.
Link
"XIMAGENET-12: An Explainable Visual Benchmark Dataset for Robustness Evaluation"
Description: Studies and provides information on class-dependent vs. class-independent factors across images (e.g., colors, blur, edges).
Link
"Gradient based Feature Attribution in Explainable AI: A Technical Review"
Authors: Yongjie Wang, Tong Zhang, Xu Guo, Zhiqi Shen
Conference: arXiv 2024
"A Novel Survey on Image Classification Models for Explainable Predictions using Computer Vision"
Authors: Lakkuru Venkata Koushik, Atla Vardhan Reddy, Karnati Sai Rithvik, Balguri Manish Rao, Hari Kiran Vege, Dinesh Kumar Anguraj
Conference: ICAAIC 2024
"Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence"
Authors: Vikas Hassija, Vinay Chamola et al.
Conference: arXiv 2024
"Survey on Explainable AI: Techniques, challenges and open issues"
Authors: Adel Abusitta et al.
Conference: arXiv 2024
Feel free to contribute! If you have found any important paper that you think belongs here, create a pull request or open an issue.