URL

https://arxiv.org/abs/2310.11716v1
Affiliations
- Ming Li, N/A
- Lichang Chen, N/A
- Jiuhai Chen, N/A
- Shwai He, N/A
- Heng Huang, N/A
- Jiuxiang Gu, N/A
- Tianyi Zhou, N/A
Abstract
- Recent advancements in Large Language Models (LLMs) have expanded the horizons of natural language understanding and generation. Notably, the output control and alignment with the input of LLMs can be refined through instruction tuning. However, as highlighted in several studies, low-quality data in the training set are usually detrimental to instruction tuning, resulting in inconsistent or even misleading LLM outputs. We propose a novel method, termed "reflection-tuning," which addresses the problem by self-improvement and judging capabilities of LLMs. This approach utilizes an oracle LLM to recycle the original training data by introspecting and enhancing the quality of instructions and responses in the data. Extensive experiments on widely used evaluation benchmarks show that LLMs trained with our recycled data outperform those trained with existing datasets in various benchmarks.
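Translation (by gpt-4o-mini)
Recent advances in large language models (LLMs) have broadened the possibilities of natural language understanding and generation. In particular, the output control of LLMs and their alignment with the input can be refined through instruction tuning. However, as several studies have pointed out, low-quality data in the training set usually harms instruction tuning, causing inconsistent or misleading LLM outputs. This work therefore proposes a new method called "reflection-tuning," which addresses the problem by exploiting the self-improvement and judging capabilities of LLMs. The approach uses an oracle LLM to recycle the original training data, introspecting on and improving the quality of the instructions and responses in the data. Extensive experiments on widely used evaluation benchmarks show that LLMs trained on the recycled data outperform models trained on existing datasets across various benchmarks.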
Summary (by gpt-4o-mini)
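Proposes a new method, reflection-tuning, that tackles the problem of low-quality training data through the self-improvement capabilities of LLMs. An oracle LLM is used to improve the quality of the data, and experiments show that LLMs trained on the recycled data outperform models trained on existing datasets.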
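
The recycling step described in the abstract (an oracle LLM improving first the instruction and then the response of each training pair) might be sketched as below. This is only an illustration under stated assumptions, not the paper's implementation: the `Oracle` callable, the prompt wording, the `recycle_sample` and `recycle_dataset` helpers, and the `stub_oracle` stand-in are all hypothetical names introduced here, and the prompts and criteria actually used by the authors are not given in this note.

```python
from typing import Callable, Dict, List

# Hypothetical oracle interface (an assumption, not from the paper):
# takes a prompt string and returns the generated text.
Oracle = Callable[[str], str]


def recycle_sample(sample: Dict[str, str], oracle: Oracle) -> Dict[str, str]:
    """Recycle one instruction/response pair with two oracle calls."""
    # Step 1: ask the oracle to reflect on the instruction and rewrite it.
    # The prompt text is illustrative only.
    instruction = oracle(
        "Reflect on the instruction below and rewrite it to be clearer and "
        "more specific. Return only the new instruction.\n\n"
        f"Response: {sample['response']}\n"
        f"Instruction: {sample['instruction']}"
    ).strip()

    # Step 2: ask the oracle to reflect on the response and improve it
    # with respect to the rewritten instruction.
    response = oracle(
        "Reflect on the response below and rewrite it to be more accurate "
        "and helpful for the instruction. Return only the new response.\n\n"
        f"Instruction: {instruction}\n"
        f"Response: {sample['response']}"
    ).strip()

    # Fall back to the original field whenever the oracle returns nothing.
    return {
        "instruction": instruction or sample["instruction"],
        "response": response or sample["response"],
    }


def recycle_dataset(data: List[Dict[str, str]], oracle: Oracle) -> List[Dict[str, str]]:
    """Apply the two-step recycling to every pair in the dataset."""
    return [recycle_sample(sample, oracle) for sample in data]


def stub_oracle(prompt: str) -> str:
    """Stand-in for a real LLM call: tags the text on the prompt's last line."""
    last_line = prompt.splitlines()[-1]
    return "[improved] " + last_line.split(": ", 1)[-1]
```

In practice `stub_oracle` would be replaced by a call to whichever LLM plays the oracle role. The recycled pairs returned by `recycle_dataset` would then serve as the instruction-tuning set in place of the original data.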

Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning, Ming Li+, N/A, arXiv'23 #1380