DFKI-NLP / InterroLang

InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations [EMNLP 2023 Findings]
https://arxiv.org/abs/2310.05592

[Summary] Explanation Operations #13

Open nfelnlp opened 1 year ago

nfelnlp commented 1 year ago
| Operation | Terminals / Prompts | Action | Description | Tools | Status |
| --- | --- | --- | --- | --- | --- |
| nlpattribute | nlpattribute token \| phrase \| sentence {classes} | feature_importance | Provides feature importances at the token (default), phrase or sentence level. | Captum (Integrated Gradients) | |
| globaltopk | important {number} {classes} | global_topk | Returns the top k most attributed tokens across the entire dataset. | Captum (Integrated Gradients) | |
| nlpcfe | nlpcfe {number} | counterfactuals | Returns counterfactual explanations (the model predicts another label) for a single instance. | Polyjuice | |
| adversarial | adversarial {number} | | Returns adversarial examples (the model predicts a wrong label) for a single instance. | OpenAttack | |
| similar | similar {number} | similarity | Gets the {number} training data instances that are most similar to the current one. | Sentence Transformers | |
| rules | rules {number} | | Outputs the decision rules for the dataset. | Anchors | |
| interact | interact | | Gets feature interactions. | HEDGE | |
| rationalize | rationalize | rationalize | Explains the prediction for a specified instance in natural language. | Zero-shot prompting with GPT-Neo parser | |
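
The issue itself does not include implementation code, but as a rough illustration of the nlpattribute / globaltopk rows above (token-level feature importance via Captum's Integrated Gradients), a minimal sketch could look like the following. The model checkpoint, example text and target class are placeholders, not values taken from InterroLang.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from captum.attr import LayerIntegratedGradients

model_name = "textattack/bert-base-uncased-SST-2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def forward_func(input_ids, attention_mask):
    # Return class logits; Captum attributes w.r.t. the chosen target index.
    return model(input_ids=input_ids, attention_mask=attention_mask).logits

text = "The dialogue felt natural and engaging."
enc = tokenizer(text, return_tensors="pt")
# All-padding baseline for Integrated Gradients.
baseline_ids = torch.full_like(enc["input_ids"], tokenizer.pad_token_id)

lig = LayerIntegratedGradients(forward_func, model.bert.embeddings)
attributions = lig.attribute(
    inputs=enc["input_ids"],
    baselines=baseline_ids,
    additional_forward_args=(enc["attention_mask"],),
    target=1,  # class index to explain (e.g. the predicted class)
)

# One importance score per token; aggregating these per instance or across
# the dataset corresponds to feature_importance / global_topk respectively.
scores = attributions.sum(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"].squeeze(0))
for tok, score in zip(tokens, scores.tolist()):
    print(f"{tok}\t{score:.4f}")
```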
nfelnlp commented 1 year ago

adversarial (via OpenAttack) has more than twice the execution time of Polyjuice, which already takes quite a while. Since CFEs already cover a similar operation, OpenAttack is no longer part of the roadmap. The long-term plan is to train one multi-purpose model that can reasonably perturb text for generating adversarial attacks, counterfactuals and general data augmentation at once.
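
For context on the nlpcfe path that stays on the roadmap, a minimal sketch of generating counterfactual candidates with Polyjuice, roughly following the polyjuice-nlp README, could look like this. The example sentence is illustrative, and the exact keyword arguments for steering perturbations may differ between package versions.

```python
from polyjuice import Polyjuice

# Load the released Polyjuice generator; is_cuda toggles GPU use.
pj = Polyjuice(model_path="uw-hai/polyjuice", is_cuda=False)

text = "The service was slow but the food was great."
# Generate perturbed variants of the sentence; Polyjuice also supports
# control codes (e.g. negation, lexical) to steer the kind of edit.
perturbations = pj.perturb(text)
for candidate in perturbations:
    print(candidate)

# Candidates whose predicted label differs from the original prediction
# would then be kept as counterfactual explanations.
```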

interact (via HEDGE) is not feasible to implement, because hierarchical explanations don't have an obvious natural language representation. Visualizations are not on the agenda as of now.

rules (via Anchors) does not appear to return rules that are inherently meaningful (they are mostly single tokens) and takes very long to compute.
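
For reference, rule extraction for text with Anchors roughly follows the anchor-exp README. The dummy classifier and label names below are assumptions purely for illustration, and the constructor arguments may vary across package versions.

```python
import numpy as np
import spacy
from anchor import anchor_text

def predict_fn(texts):
    # Dummy classifier standing in for the real model: predict "positive"
    # unless a negative keyword occurs. Purely for illustration.
    return np.array(
        [0 if ("predictable" in t or "wooden" in t) else 1 for t in texts]
    )

nlp = spacy.load("en_core_web_sm")
explainer = anchor_text.AnchorText(
    nlp, ["negative", "positive"], use_unk_distribution=True
)
exp = explainer.explain_instance(
    "The plot was predictable and the acting wooden.",
    predict_fn,
    threshold=0.95,
)

# The anchor is the set of tokens that (almost) guarantees the prediction;
# in practice these often end up being single tokens, as noted above.
print(" AND ".join(exp.names()))
```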

rationalize (via the OpenAI API or a rationalizing LLM) will be implemented soon.

nfelnlp commented 1 year ago

For rationalize, we can do the following:

  1. Design one prompt for each dataset
  2. Insert the input texts into the prompts
  3. Use GPT-3.5 / GPT-4 to generate a few hundred rationales in a zero-shot setup
  4. Fine-tune a T5 on each dataset of rationales
  5. Run inference with the fine-tuned T5 to produce rationales for the rest of the datasets (using ChatGPT for the tens of thousands of examples in BoolQ, OLID & DD would be too expensive)
  6. Store the generated rationales as CSVs or JSONs (see the pre-computed feature attribution explanations in the cache folder for reference); a minimal sketch of steps 2, 3 and 6 follows below
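
A minimal sketch of steps 2, 3 and 6, assuming the legacy openai Python SDK's ChatCompletion endpoint and a hypothetical BoolQ prompt template plus output path (neither is taken from the repository):

```python
import json
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Hypothetical per-dataset prompt template (steps 1-2); wording is illustrative.
BOOLQ_PROMPT = (
    "Question: {question}\nPassage: {passage}\n"
    "In two or three sentences, explain why the answer is '{label}'."
)

def generate_rationale(question, passage, label, model="gpt-3.5-turbo"):
    # Step 3: zero-shot rationale generation with GPT-3.5 / GPT-4.
    prompt = BOOLQ_PROMPT.format(question=question, passage=passage, label=label)
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response["choices"][0]["message"]["content"].strip()

# Step 6: store the generated rationales as JSON for later T5 fine-tuning.
rationales = [
    {
        "idx": 0,
        "rationale": generate_rationale(
            "is the sky blue",
            "The sky appears blue because air scatters short wavelengths of light.",
            "true",
        ),
    },
]
with open("boolq_rationales.json", "w") as f:
    json.dump(rationales, f, indent=2)
```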