DFKI-NLP / InterroLang

InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations [EMNLP 2023 Findings]
https://arxiv.org/abs/2310.05592

[Summary] Explanation Operations #13

Open nfelnlp opened 1 year ago

nfelnlp commented 1 year ago
| Operation | Terminals / Prompts | Action | Description | Tools | Status |
| --- | --- | --- | --- | --- | --- |
| nlpattribute | nlpattribute token \| phrase \| sentence {classes} | feature_importance | Provides feature importances at the token (default), phrase or sentence level. | Captum (Integrated Gradients) | |
| globaltopk | important {number} {classes} | global_topk | Returns the top k most attributed tokens across the entire dataset. | Captum (Integrated Gradients) | |
| nlpcfe | nlpcfe {number} | counterfactuals | Returns counterfactual explanations (the model predicts another label) for a single instance. | Polyjuice | |
| adversarial | adversarial {number} | | Returns adversarial examples (the model predicts a wrong label) for a single instance. | OpenAttack | |
| similar | similar {number} | similarity | Gets the {number} training data instances that are most similar to the current one. | Sentence Transformers | |
| rules | rules {number} | | Outputs the decision rules for the dataset. | Anchors | |
| interact | interact | | Gets feature interactions. | HEDGE | |
| rationalize | rationalize | rationalize | Explains the prediction for a specified instance in natural language. | Zero-shot prompting with GPT-Neo parser | |
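
The issue itself does not include implementation code, but as a rough illustration of the nlpattribute / globaltopk rows above (token-level feature importance via Captum's Integrated Gradients), a minimal sketch could look like the following. The model checkpoint, example text and target class are placeholders, not values taken from InterroLang.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from captum.attr import LayerIntegratedGradients

model_name = "textattack/bert-base-uncased-SST-2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def forward_func(input_ids, attention_mask):
    # Return class logits; Captum attributes w.r.t. the chosen target index.
    return model(input_ids=input_ids, attention_mask=attention_mask).logits

text = "The dialogue felt natural and engaging."
enc = tokenizer(text, return_tensors="pt")
# All-padding baseline for Integrated Gradients.
baseline_ids = torch.full_like(enc["input_ids"], tokenizer.pad_token_id)

lig = LayerIntegratedGradients(forward_func, model.bert.embeddings)
attributions = lig.attribute(
    inputs=enc["input_ids"],
    baselines=baseline_ids,
    additional_forward_args=(enc["attention_mask"],),
    target=1,  # class index to explain (e.g. the predicted class)
)

# One importance score per token; aggregating these per instance or across
# the dataset corresponds to feature_importance / global_topk respectively.
scores = attributions.sum(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"].squeeze(0))
for tok, score in zip(tokens, scores.tolist()):
    print(f"{tok}\t{score:.4f}")
```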
nfelnlp commented 1 year ago

adversarial (via OpenAttack) has more than twice the execution time of Polyjuice, which already takes quite a while. Since CFEs already cover a similar operation, OpenAttack is no longer part of the roadmap. The long-term plan is to train one multi-purpose model that can reasonably perturb text for generating adversarial attacks, counterfactuals and general data augmentation at once.
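
For context on the nlpcfe path that stays on the roadmap, a minimal sketch of generating counterfactual candidates with Polyjuice, roughly following the polyjuice-nlp README, could look like this. The example sentence is illustrative, and the exact keyword arguments for steering perturbations may differ between package versions.

```python
from polyjuice import Polyjuice

# Load the released Polyjuice generator; is_cuda toggles GPU use.
pj = Polyjuice(model_path="uw-hai/polyjuice", is_cuda=False)

text = "The service was slow but the food was great."
# Generate perturbed variants of the sentence; Polyjuice also supports
# control codes (e.g. negation, lexical) to steer the kind of edit.
perturbations = pj.perturb(text)
for candidate in perturbations:
    print(candidate)

# Candidates whose predicted label differs from the original prediction
# would then be kept as counterfactual explanations.
```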

interact (via HEDGE) is not feasible to implement, because hierarchical explanations don't have an obvious natural language representation. Visualizations are not on the agenda as of now.

rules (via Anchors) does not appear to return rules that are inherently meaningful (they are mostly single tokens) and takes very long to compute.
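
For reference, rule extraction for text with Anchors roughly follows the anchor-exp README. The dummy classifier and label names below are assumptions purely for illustration, and the constructor arguments may vary across package versions.

```python
import numpy as np
import spacy
from anchor import anchor_text

def predict_fn(texts):
    # Dummy classifier standing in for the real model: predict "positive"
    # unless a negative keyword occurs. Purely for illustration.
    return np.array(
        [0 if ("predictable" in t or "wooden" in t) else 1 for t in texts]
    )

nlp = spacy.load("en_core_web_sm")
explainer = anchor_text.AnchorText(
    nlp, ["negative", "positive"], use_unk_distribution=True
)
exp = explainer.explain_instance(
    "The plot was predictable and the acting wooden.",
    predict_fn,
    threshold=0.95,
)

# The anchor is the set of tokens that (almost) guarantees the prediction;
# in practice these often end up being single tokens, as noted above.
print(" AND ".join(exp.names()))
```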

rationalize (via the OpenAI API or a rationalizing LLM) will be implemented soon.

nfelnlp commented 1 year ago

For rationalize, we can do the following:

  1. Design one prompt for each dataset
  2. Insert the input texts into the prompts
  3. Use GPT-3.5 / GPT-4 to generate a few hundred rationales in a zero-shot setup
  4. Fine-tune a T5 on each dataset of rationales
  5. Run inference with the fine-tuned T5 to produce rationales for the rest of the datasets (using ChatGPT for the tens of thousands of examples in BoolQ, OLID & DD would be too expensive)
  6. Store the generated rationales as CSVs or JSONs (see the pre-computed feature attribution explanations in the cache folder for reference); a minimal sketch of steps 2, 3 and 6 follows below
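
A minimal sketch of steps 2, 3 and 6, assuming the legacy openai Python SDK's ChatCompletion endpoint and a hypothetical BoolQ prompt template plus output path (neither is taken from the repository):

```python
import json
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Hypothetical per-dataset prompt template (steps 1-2); wording is illustrative.
BOOLQ_PROMPT = (
    "Question: {question}\nPassage: {passage}\n"
    "In two or three sentences, explain why the answer is '{label}'."
)

def generate_rationale(question, passage, label, model="gpt-3.5-turbo"):
    # Step 3: zero-shot rationale generation with GPT-3.5 / GPT-4.
    prompt = BOOLQ_PROMPT.format(question=question, passage=passage, label=label)
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response["choices"][0]["message"]["content"].strip()

# Step 6: store the generated rationales as JSON for later T5 fine-tuning.
rationales = [
    {
        "idx": 0,
        "rationale": generate_rationale(
            "is the sky blue",
            "The sky appears blue because air scatters short wavelengths of light.",
            "true",
        ),
    },
]
with open("boolq_rationales.json", "w") as f:
    json.dump(rationales, f, indent=2)
```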