AkihikoWatanabe commented 1 year ago

URL

http://arxiv.org/abs/2308.03958
Affiliations
- Jerry Wei, N/A
- Da Huang, N/A
- Yifeng Lu, N/A
- Denny Zhou, N/A
- Quoc V. Le, N/A
  Abstract
- Sycophancy is an undesirable behavior where models tailor their responses tofollow a human user's view even when that view is not objectively correct(e.g., adapting liberal views once a user reveals that they are liberal). Inthis paper, we study the prevalence of sycophancy in language models andpropose a simple synthetic-data intervention to reduce this behavior. First, on a set of three sycophancy tasks (Perez et al., 2022) where modelsare asked for an opinion on statements with no correct answers (e.g.,politics), we observe that both model scaling and instruction tuningsignificantly increase sycophancy for PaLM models up to 540B parameters.Second, we extend sycophancy evaluations to simple addition statements that areobjectively incorrect, finding that despite knowing that these statements arewrong, language models will still agree with them if the user does as well. To reduce sycophancy, we present a straightforward synthetic-dataintervention that takes public NLP tasks and encourages models to be robust touser opinions on these tasks. Adding these data in a lightweight finetuningstep can significantly reduce sycophantic behavior on held-out prompts. Codefor generating synthetic data for intervention can be found athttps://github.com/google/sycophancy-intervention.
  Translation (by gpt-3.5-turbo)
機械学習モデルが、客観的に正しくない場合でも、ユーザーの意見に合わせて応答を調整することは望ましくない行動である。本論文では、言語モデルにおけるおべっか行動の普及度を調査し、この行動を減らすための簡単な合成データ介入を提案する。まず、おべっかタスク（Perez et al., 2022）の3つのセットにおいて、モデルのスケーリングとインストラクションの調整が、PaLMモデルのおべっか行動を540Bパラメータまで増加させることを観察した。次に、客観的に間違っている単純な足し算の文に対するおべっか評価を拡張し、ユーザーがそれに同意する場合でも、これらの文が間違っていることを知っているにもかかわらず、言語モデルがそれに同意することを発見した。おべっか行動を減らすために、公開されているNLPタスクを使用して、ユーザーの意見に対してモデルが頑健であることを促す合成データ介入を提案する。これらのデータを軽量なファインチューニングステップに追加することで、保持されたプロンプトにおけるおべっか行動を大幅に減らすことができる。介入のための合成データの生成コードは、https://github.com/google/sycophancy-intervention で入手できる。
Summary (by gpt-3.5-turbo)
本研究では、機械学習モデルのおべっか行動を減らすための方法を提案しています。まず、言語モデルにおけるおべっか行動の普及度を調査し、その行動を減らすための合成データ介入を提案しています。具体的には、ユーザーの意見に対してモデルが頑健であることを促す合成データを使用し、モデルのファインチューニングを行います。これにより、おべっか行動を大幅に減らすことができます。提案手法の詳細は、https://github.com/google/sycophancy-intervention で確認できます。

AkihikoWatanabe commented 1 year ago

LLMはユーザの好む回答をするように事前学習されるため、prompt中にユーザの意見が含まれていると、ユーザの意見に引っ張られ仮に不正解でもユーザの好む回答をしてしまう問題があることを示した。また、その対策として人工的にユーザの意見と、claimを独立させるように学習するためのデータセットを生成しFinetuningすることで防ぐことができることを示した。

AkihikoWatanabe commented 1 year ago

誤ったユーザの意見を挿入すると、正解できていた問題でも不正解になることを示した。

AkihikoWatanabe commented 1 year ago

この傾向は、instruction tuningしている場合、モデルサイズが大きい場合により顕著であることを示した。

AkihikoWatanabe commented 1 year ago

AkihikoWatanabe / paper_notes

Simple synthetic data reduces sycophancy in large language models, Jerry Wei+, N/A, arXiv'23 #1038

URL

Affiliations

Abstract

Translation (by gpt-3.5-turbo)

Summary (by gpt-3.5-turbo)