URL

https://arxiv.org/pdf/2408.16737
Affiliations
- Hritik Bansal, N/A
- Arian Hosseini, N/A
- Rishabh Agarwal, N/A
- Vinh Q. Tran, N/A
- Mehran Kazemi, N/A
Abstract
- Training on high-quality synthetic data from strong language models (LMs) is a common strategy to improve the reasoning performance of LMs. In this work, we revisit whether this strategy is compute-optimal under a fixed inference budget (e.g., FLOPs). To do so, we investigate the trade-offs between generating synthetic data using a stronger but more expensive (SE) model versus a weaker but cheaper (WC) model. We evaluate the generated data across three key metrics: coverage, diversity, and false positive rate, and show that the data from WC models may have higher coverage and diversity, but also exhibit higher false positive rates. We then finetune LMs on data from SE and WC models in different settings: knowledge distillation, self-improvement, and a novel weak-to-strong improvement setup where a weaker LM teaches reasoning to a stronger LM. Our findings reveal that models finetuned on WC-generated data consistently outperform those trained on SE-generated data across multiple benchmarks and multiple choices of WC and SE models. These results challenge the prevailing practice of relying on SE models for synthetic data generation, suggesting that WC may be the compute-optimal approach for training advanced LM reasoners.
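The fixed-budget comparison described in the abstract can be made concrete with a small sketch. It assumes the common approximation that generating one token costs about 2 × (parameter count) FLOPs, and that both models produce solutions of similar length; neither assumption is stated in these notes. The function names and the 27B/9B model sizes are hypothetical and used only for illustration.

```python
def flops_per_token(num_params: float) -> float:
    """Approximate inference FLOPs per generated token (assumed ~2 * parameters)."""
    return 2.0 * num_params


def compute_matched_samples(se_params: float, wc_params: float,
                            se_samples_per_problem: int) -> int:
    """Samples per problem the WC model can draw for the same FLOPs budget.

    Assumes both models emit roughly the same number of tokens per sample,
    so the token count cancels out of the ratio.
    """
    budget = se_samples_per_problem * flops_per_token(se_params)
    return int(budget // flops_per_token(wc_params))


# Hypothetical sizes: a 27B-parameter SE model and a 9B-parameter WC model.
print(compute_matched_samples(27e9, 9e9, 10))  # 30
```

Under these assumptions the cheaper model yields three times as many samples per problem for the same budget, which is one plausible reason its data could show higher coverage and diversity.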
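Translation (by gpt-4o-mini)
Training on high-quality synthetic data generated by strong language models (LMs) is a common strategy for improving the reasoning performance of LMs. This work re-examines whether that strategy is compute-optimal under a fixed inference budget (e.g., FLOPs). To do so, it investigates the trade-off between generating synthetic data with a stronger but more expensive (SE) model and with a weaker but cheaper (WC) model. The generated data is evaluated on three key metrics (coverage, diversity, and false positive rate), showing that data from WC models may have higher coverage and diversity but also a higher false positive rate. LMs are then finetuned on data from SE and WC models in three settings: knowledge distillation, self-improvement, and a new weak-to-strong improvement setup in which a weaker LM teaches reasoning to a stronger LM. The findings show that models finetuned on WC-generated data consistently outperform those trained on SE-generated data across multiple benchmarks and multiple choices of WC and SE models. These results challenge the established practice of relying on SE models for synthetic data generation and suggest that WC may be the compute-optimal approach for training advanced LM reasoners.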
Summary (by gpt-4o-mini)
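The paper re-examines the trade-off between a strong but expensive (SE) model and a weak but cheap (WC) model for generating high-quality synthetic data. Data from WC models has higher coverage and diversity but also a higher false positive rate. After finetuning, models trained on WC-generated data outperform models trained on SE-generated data, suggesting that WC may be the compute-optimal approach.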
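The three data-quality metrics named above can be illustrated with a minimal sketch. The definitions here are assumptions made for illustration, since these notes do not define them: coverage is taken as the fraction of problems with at least one sample whose final answer is correct, diversity as the average number of unique correct solutions per problem, and false positive rate as the share of correct-answer samples whose reasoning is judged invalid. The `Sample` structure and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Sample:
    solution: str       # full reasoning text
    answer: str         # extracted final answer
    reasoning_ok: bool  # assumed to come from a human or LM judge


def correct(samples: List[Sample], gold: str) -> List[Sample]:
    """Samples whose final answer matches the gold answer."""
    return [s for s in samples if s.answer == gold]


def coverage(data: Dict[str, List[Sample]], gold: Dict[str, str]) -> float:
    """Fraction of problems with at least one correct-answer sample."""
    return sum(1 for q in data if correct(data[q], gold[q])) / len(data)


def diversity(data: Dict[str, List[Sample]], gold: Dict[str, str]) -> float:
    """Average number of unique correct solutions per problem."""
    unique = (len({s.solution for s in correct(data[q], gold[q])}) for q in data)
    return sum(unique) / len(data)


def false_positive_rate(data: Dict[str, List[Sample]], gold: Dict[str, str]) -> float:
    """Share of correct-answer samples whose reasoning is judged invalid."""
    hits = [s for q in data for s in correct(data[q], gold[q])]
    return sum(not s.reasoning_ok for s in hits) / len(hits) if hits else 0.0


# Toy data: q1 has three correct-answer samples (one with bad reasoning),
# q2 has a single sample with the wrong answer.
data = {
    "q1": [Sample("2+2=4", "4", True),
           Sample("1+3=4", "4", True),
           Sample("guess: 4", "4", False)],
    "q2": [Sample("3*3=6", "6", True)],
}
gold = {"q1": "4", "q2": "9"}

print(coverage(data, gold))             # 0.5
print(diversity(data, gold))            # 1.5
print(false_positive_rate(data, gold))  # one of three correct-answer samples
```

Filtering by final answer alone would keep the "guess: 4" sample, which is why a separate false positive rate is worth tracking alongside coverage and diversity.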


Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling, Hritik Bansal+, N/A, arXiv'24 #1427
