IsarNejad / TCAV-for-Text-Classifiers

TCAV for NLP, published at ACL2022
MIT License

TCAV for Explaining Text Classifiers

This repository provides the data and code related to the following ACL2022 publication:

Nejadgholi, I., Fraser, K. C., Kiritchenko, S. (2022). Improving Generalizability in Implicitly Abusive Language Detection with Concept Activation Vectors. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics.


As described in the paper, we annotated the Hostile class of the dev set of the East-Asian Prejudice (EA) dataset and the Anti-Asian Hate class of the COVID-HATE (CH) dataset for implicit/explicit abuse. Our annotations are available in the Data folder:

CH_Anti_Asian_hate_implicit_indexes.csv and CH_Anti_Asianhate_explicit_indexes.csv include indexes of implicitly and explicitly hateful samples in the Anti-Asian Hate class of the CH dataset, respectively. These indexes correspond to indexes of the annotations.csv file from the original dataset.

EA_dev_hostile_implicit_ids.csv and EA_dev_hostile_explicit_ids.csv include tweet ids of implicitly and explicitly hostile samples of the EA-dev set.
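As a rough illustration of how index files like these might be used to select the annotated samples from the original data, here is a minimal pandas sketch. It runs on toy in-memory stand-ins rather than the real files, and the column names (`text`, `label`, `index`) are hypothetical; the actual CSV layouts should be checked before adapting it.

```python
import io

import pandas as pd

# Toy stand-ins for annotations.csv and an index file.
# The column names used here are assumptions, not the real schema.
annotations_csv = io.StringIO(
    "text,label\n"
    "tweet a,Hate\n"
    "tweet b,Neutral\n"
    "tweet c,Hate\n"
    "tweet d,Hate\n"
)
implicit_csv = io.StringIO("index\n0\n3\n")

annotations = pd.read_csv(annotations_csv)
implicit_idx = pd.read_csv(implicit_csv)["index"].tolist()

# Select the rows of the original annotations whose index is listed
# in the implicit-abuse index file.
implicit_samples = annotations.loc[implicit_idx]
print(implicit_samples["text"].tolist())
```

The same pattern would apply to the EA-dev files, except that rows would be matched on a tweet id column instead of the row index.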


Python modules:

RoBERTa model and functions to compute the gradients and logits of a RoBERTa-based classifier

Functions to calculate the sensitivity of a trained classifier to a human-defined concept (TCAV scores, described in Section 4 of the paper)

Functions to calculate the Degree of Explicitness (DoE scores, described in Sections 5 and 6 of the paper)
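The sensitivity computation mentioned above can be sketched roughly as follows. This is a minimal NumPy illustration of the general TCAV idea, not the code in this repository: the function names `compute_cav` and `tcav_score` are hypothetical, the activations and gradients are synthetic, and the concept activation vector (CAV) is approximated by a difference of means between concept and random activations, a simplification used here in place of a trained linear classifier.

```python
import numpy as np


def compute_cav(concept_acts, random_acts):
    """Unit vector pointing from random activations towards concept activations.

    Simplification: difference of means instead of a trained linear classifier.
    """
    direction = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)


def tcav_score(gradients, cav):
    """Fraction of inputs whose directional derivative along the CAV is positive."""
    sensitivities = gradients @ cav  # one directional derivative per input
    return float((sensitivities > 0).mean())


rng = np.random.default_rng(0)
dim = 8  # size of the (synthetic) hidden representation

# Synthetic layer activations for concept examples and random examples.
concept_acts = rng.normal(loc=1.0, size=(50, dim))
random_acts = rng.normal(loc=0.0, size=(50, dim))
cav = compute_cav(concept_acts, random_acts)

# Synthetic gradients of the class logit with respect to the layer activations.
# In practice these would come from the trained classifier.
gradients = rng.normal(size=(200, dim)) + 0.5 * cav
print(f"TCAV score: {tcav_score(gradients, cav):.2f}")
```

A score near 1 would indicate that moving the representation towards the concept tends to increase the class logit, and a score near 0 the opposite.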

Example Notebooks:

These notebooks illustrate how to use the above functionalities. In all of the notebooks, the Toxicity classifier refers to a RoBERTa-based binary classifier trained with the Wiki dataset.

TCAV_Example.ipynb: This notebook shows how to calculate the sensitivity of a trained classifier to a human-defined concept (similar to the results in Table 5 of the paper).

DoE_example.ipynb: This Colab notebook calculates the Degree of Explicitness (DoE scores introduced in Section 5 of the paper).