AkihikoWatanabe commented 1 year ago

URL

https://arxiv.org/abs/2308.06259
Affiliations
- Xian Li, N/A
- Ping Yu, N/A
- Chunting Zhou, N/A
- Timo Schick, N/A
- Luke Zettlemoyer, N/A
- Omer Levy, N/A
- Jason Weston, N/A
- Mike Lewis, N/A
Abstract
- We present a scalable method to build a high quality instruction following language model by automatically labelling human-written text with corresponding instructions. Our approach, named instruction backtranslation, starts with a language model finetuned on a small amount of seed data, and a given web corpus. The seed model is used to construct training examples by generating instruction prompts for web documents (self-augmentation), and then selecting high quality examples from among these candidates (self-curation). This data is then used to finetune a stronger model. Finetuning LLaMa on two iterations of our approach yields a model that outperforms all other LLaMa-based models on the Alpaca leaderboard not relying on distillation data, demonstrating highly effective self-alignment.
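Translation (by gpt-3.5-turbo)
We propose a scalable method for building a high-quality instruction-following language model by automatically labeling human-written text with corresponding instructions. Our method, "instruction backtranslation," starts from a language model finetuned on a small amount of seed data, together with a given web corpus. The seed model is used to construct training examples by generating instruction prompts for web documents (self-augmentation), and high-quality examples are then selected from among these candidates (self-curation). This data is used to finetune a stronger model. Finetuning LLaMa on two iterations of our method yields a model that outperforms the other LLaMa-based models on the Alpaca leaderboard that do not rely on distillation data, demonstrating highly effective self-alignment.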
Summary (by gpt-3.5-turbo)
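We propose a scalable method for building a high-quality instruction-following language model. The method starts from a language model finetuned on a small amount of seed data, uses it to generate instruction prompts for documents from a web corpus to construct training examples, and then selects the high-quality examples to finetune a stronger model. Applied to LLaMa, this outperforms the other LLaMa-based models on the Alpaca leaderboard that do not rely on distillation data, demonstrating the effectiveness of self-alignment.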
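
The loop described in the abstract (seed model, self-augmentation, self-curation, finetune again) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `min_score` threshold, the choice to re-include the seed data at every iteration, and the toy stand-ins for finetuning, instruction generation, and scoring are all assumptions made here for the example.

```python
from typing import Any, Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (instruction, response)


def self_augment(model: Any, documents: Sequence[str],
                 generate_instruction: Callable[[Any, str], str]) -> List[Example]:
    """Self-augmentation: predict an instruction for each web document."""
    return [(generate_instruction(model, doc), doc) for doc in documents]


def self_curate(model: Any, candidates: Sequence[Example],
                score_pair: Callable[[Any, str, str], float],
                min_score: float) -> List[Example]:
    """Self-curation: keep only candidates the model itself rates highly."""
    return [(ins, resp) for ins, resp in candidates
            if score_pair(model, ins, resp) >= min_score]


def instruction_backtranslation(seed_data: Sequence[Example],
                                web_corpus: Sequence[str],
                                finetune: Callable[[List[Example]], Any],
                                generate_instruction: Callable[[Any, str], str],
                                score_pair: Callable[[Any, str, str], float],
                                iterations: int = 2,
                                min_score: float = 0.5) -> Any:
    # Start from a model finetuned on the small seed set.
    model = finetune(list(seed_data))
    for _ in range(iterations):
        candidates = self_augment(model, web_corpus, generate_instruction)
        curated = self_curate(model, candidates, score_pair, min_score)
        # Assumption: each round trains on seed data plus the curated pairs.
        model = finetune(list(seed_data) + curated)
    return model


# --- Toy stand-ins (hypothetical; no real language model is involved) ---
def toy_finetune(examples: List[Example]) -> dict:
    return {"train": list(examples)}  # the "model" just remembers its data


def toy_generate(model: dict, doc: str) -> str:
    return "Write a passage beginning with: " + " ".join(doc.split()[:3])


def toy_score(model: dict, instruction: str, response: str) -> float:
    # Purely illustrative quality proxy: longer documents score higher.
    return min(len(response.split()) / 10.0, 1.0)


seed = [("Say hi", "Hi!")]
corpus = ["too short",
          "a much longer web document with at least ten words in it total"]
final_model = instruction_backtranslation(seed, corpus, toy_finetune,
                                          toy_generate, toy_score)
```

Passing the three model operations in as callables keeps the control flow visible; in a real setup each would wrap calls to an actual finetuning and inference stack.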