URL

https://arxiv.org/abs/2203.02155
Affiliations
- Long Ouyang, N/A
- Jeff Wu, N/A
- Xu Jiang, N/A
- Diogo Almeida, N/A
- Carroll L. Wainwright, N/A
- Pamela Mishkin, N/A
- Chong Zhang, N/A
- Sandhini Agarwal, N/A
- Katarina Slama, N/A
- Alex Ray, N/A
- John Schulman, N/A
- Jacob Hilton, N/A
- Fraser Kelton, N/A
- Luke Miller, N/A
- Maddie Simens, N/A
- Amanda Askell, N/A
- Peter Welinder, N/A
- Paul Christiano, N/A
- Jan Leike, N/A
- Ryan Lowe, N/A
Abstract
- Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
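Translation (by gpt-3.5-turbo)
Making language models larger does not, in itself, necessarily improve their ability to follow a user's intent. For example, large language models can generate outputs that are untruthful, harmful, or simply unhelpful to the user. In other words, these models are not aligned with their users. In this paper, we show how fine-tuning with human feedback can align language models with user intent across a wide range of tasks. Starting from prompts written by labelers and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior and use it to fine-tune GPT-3 with supervised learning. We then collect a dataset of rankings of model outputs and use it to further fine-tune this supervised model with reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B-parameter InstructGPT model were preferred over outputs from the 175B GPT-3, even though it has 100x fewer parameters. In addition, InstructGPT models show improved truthfulness and reduced harmful output, while performance regressions on public NLP datasets were minimal. InstructGPT still makes simple mistakes, but our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.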
Summary (by gpt-3.5-turbo)
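Large language models sometimes generate outputs that do not match the user's intent. This work fine-tunes GPT-3 with human feedback and proposes the resulting models, called InstructGPT. With this approach, outputs from the 1.3B-parameter InstructGPT model were preferred over outputs from the 175B GPT-3, and the models showed improved truthfulness and reduced harmful output. In addition, performance regressions on public NLP datasets were minimal. InstructGPT still has room for improvement, but the results show that fine-tuning with human feedback is a promising direction.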
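
Note: the abstract says a dataset of rankings of model outputs drives the second fine-tuning stage, but it does not spell out how a ranking becomes a training signal. A minimal sketch, assuming the common pairwise log-sigmoid objective for preference data: a ranking of K outputs is expanded into all K-choose-2 (preferred, rejected) pairs, and a scalar scorer is penalized whenever it scores the rejected output above the preferred one. The names `ranking_to_pairs`, `pairwise_loss`, `ranking_loss`, and the toy scorer are hypothetical and only for illustration; in practice the scorer would be a learned model.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() preserves input order, so the first element of every
    pair is ranked above the second.
    """
    return list(combinations(ranked_outputs, 2))


def pairwise_loss(score_preferred, score_rejected):
    """Return -log(sigmoid(score_preferred - score_rejected)).

    Uses the identity -log(sigmoid(d)) = log(1 + exp(-d)) and branches
    on the sign of d so that exp() never receives a large positive value.
    """
    diff = score_preferred - score_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def ranking_loss(ranked_outputs, score_fn):
    """Average the pairwise loss over every pair implied by one ranking."""
    pairs = ranking_to_pairs(ranked_outputs)
    total = sum(pairwise_loss(score_fn(w), score_fn(l)) for w, l in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Hypothetical labeler ranking of four outputs, best first.
    ranking = ["out_a", "out_b", "out_c", "out_d"]

    # Toy scorers standing in for a learned model (assumed values).
    agrees = {"out_a": 2.0, "out_b": 1.0, "out_c": 0.0, "out_d": -1.0}
    disagrees = {"out_a": -1.0, "out_b": 0.0, "out_c": 1.0, "out_d": 2.0}

    print(len(ranking_to_pairs(ranking)), "pairs from one ranking")
    print("loss when scorer agrees:   ", ranking_loss(ranking, agrees.get))
    print("loss when scorer disagrees:", ranking_loss(ranking, disagrees.get))
```

A scorer whose ordering matches the labeler's ranking gets a lower loss than one that inverts it, and two equal scores give a loss of log 2 for that pair. The trained scorer would then supply the reward for the reinforcement learning stage the abstract describes.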

Training language models to follow instructions with human feedback, Long Ouyang+, N/A, NeurIPS'22 #1296
