Explanations of neural models aim to reveal a model's decision-making process for its predictions. However, recent work shows that current methods giving explanations such as saliency maps or counterfactuals can be misleading, as they are prone to present reasons that are unfaithful to the model's inner workings. This work explores the challenging question of evaluating the faithfulness of natural language explanations (NLEs). To this end, we present two tests. First, we propose a counterfactual input editor for inserting reasons that lead to counterfactual predictions but are not reflected by the NLEs. Second, we reconstruct inputs from the reasons stated in the generated NLEs and check how often they lead to the same predictions. Our tests can evaluate emerging NLE models, proving a fundamental tool in the development of faithful NLEs.

Translation (by gpt-3.5-turbo)

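Explanations of neural models aim to reveal the decision-making process behind a model's predictions. However, recent work shows that current methods that provide explanations such as saliency maps or counterfactuals can be misleading, because they tend to present reasons that are unfaithful to the model's inner workings. This work explores the difficult problem of evaluating the faithfulness of natural language explanations (NLEs). To this end, we propose two tests. First, we propose a counterfactual input editor that inserts reasons which lead to counterfactual predictions but are not reflected in the NLEs. Second, we reconstruct inputs from the reasons stated in the generated NLEs and check how often they lead to the same predictions. Our tests can evaluate newly emerging NLE models and serve as a fundamental tool in the development of faithful NLEs.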
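
Summary (by gpt-3.5-turbo)

This work proposes two tests for evaluating the faithfulness of explanations of neural models. The first is a counterfactual input editor that inserts reasons leading to counterfactual predictions; the second reconstructs inputs from the generated explanations and checks how often they lead to the same predictions. These tests serve as a fundamental tool in the development of faithful explanations.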
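
To make the two tests concrete, here is a minimal Python sketch of how they could be scored. It is an illustration under stated assumptions, not the paper's implementation: the `Model` interface, the `edit` and `reconstruct` callables, the word-level check of whether an inserted reason appears in the NLE, and the rule-based `toy_model` are all hypothetical stand-ins for a trained editor, a reconstruction procedure, and a real NLE model.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interface: a model maps an input text to (label, explanation).
Model = Callable[[str], Tuple[str, str]]
# Hypothetical editor: returns the edited text and the words it inserted.
Editor = Callable[[str], Tuple[str, List[str]]]
# Hypothetical reconstructor: builds a new input from the stated reasons.
Reconstructor = Callable[[str, str], str]


def counterfactual_unfaithfulness_rate(
    model: Model, inputs: Iterable[str], edit: Editor
) -> float:
    """Among edits that flip the prediction, the share whose inserted
    words are not mentioned in the new explanation."""
    flipped = 0
    unfaithful = 0
    for text in inputs:
        original_label, _ = model(text)
        edited_text, inserted_words = edit(text)
        new_label, explanation = model(edited_text)
        if new_label == original_label:
            continue  # not a counterfactual prediction, so skip it
        flipped += 1
        explanation_lower = explanation.lower()
        # Assumption: a simple substring match decides "reflected by the NLE".
        if not any(w.lower() in explanation_lower for w in inserted_words):
            unfaithful += 1
    return unfaithful / flipped if flipped else 0.0


def reconstruction_agreement_rate(
    model: Model, inputs: Iterable[str], reconstruct: Reconstructor
) -> float:
    """Share of inputs whose prediction is unchanged after the input is
    rebuilt from the reasons stated in the explanation."""
    total = 0
    same = 0
    for text in inputs:
        label, explanation = model(text)
        rebuilt_label, _ = model(reconstruct(text, explanation))
        total += 1
        same += int(rebuilt_label == label)
    return same / total if total else 0.0


def toy_model(text: str) -> Tuple[str, str]:
    """Rule-based stand-in for an NLE model (illustration only)."""
    lowered = text.lower()
    if "terrible" in lowered:
        # The prediction depends on "terrible", but the NLE never says so.
        return "negative", "the review sounds unhappy"
    if "boring" in lowered:
        return "negative", "the review calls it boring"
    return "positive", "the review calls it fun"


def insert_terrible(text: str) -> Tuple[str, List[str]]:
    return text + " but terrible", ["terrible"]


def insert_boring(text: str) -> Tuple[str, List[str]]:
    return text + " but boring", ["boring"]


def explanation_as_input(text: str, explanation: str) -> str:
    # Simplest possible reconstruction: feed the stated reasons back in.
    return explanation
```

With these toy definitions, `counterfactual_unfaithfulness_rate(toy_model, ["a fun film"], insert_terrible)` flags the edit as unfaithful because the word that flipped the prediction never appears in the explanation, while `insert_boring` does not. For the second test, `reconstruction_agreement_rate(toy_model, texts, explanation_as_input)` drops below 1 whenever the stated reasons alone fail to reproduce the original prediction.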

AkihikoWatanabe / paper_notes

Faithfulness Tests for Natural Language Explanations, ACL'23 #850
