URL

https://arxiv.org/abs/2203.14465
Affiliations
- Eric Zelikman, N/A
- Yuhuai Wu, N/A
- Jesse Mu, N/A
- Noah D. Goodman, N/A
Abstract
- Generating step-by-step "chain-of-thought" rationales improves language model performance on complex reasoning tasks like mathematics or commonsense question-answering. However, inducing language model rationale generation currently requires either constructing massive rationale datasets or sacrificing accuracy by using only few-shot inference. We propose a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales, to bootstrap the ability to perform successively more complex reasoning. This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to fine-tuning a 30$\times$ larger state-of-the-art language model on CommonsenseQA. Thus, STaR lets a model improve itself by learning from its own generated reasoning.
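Translation (by gpt-4o-mini)
Generating step-by-step "chain-of-thought" rationales improves the performance of language models on complex reasoning tasks such as mathematics and commonsense question answering. However, inducing a language model to generate rationales requires either building a massive rationale dataset or sacrificing accuracy by relying on few-shot inference. We propose a technique that iteratively leverages a small number of rationale examples and a large dataset without rationales to bootstrap the ability to perform more complex reasoning. This technique, called the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales for many questions using a few rationale examples; when a generated answer is wrong, regenerate the rationale given the correct answer; fine-tune on all rationales that ultimately led to correct answers; and repeat. Experiments show that STaR substantially improves performance on multiple datasets compared with a model fine-tuned to directly predict final answers, and on CommonsenseQA it performs comparably to a fine-tuned state-of-the-art language model that is 30 times larger. STaR thus enables a model to improve itself by learning from its own generated reasoning.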
Summary (by gpt-4o-mini)
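Proposes the "Self-Taught Reasoner" (STaR), which leverages a small number of rationale examples and a large dataset without rationales to perform complex reasoning. When a generated answer is wrong, STaR regenerates the rationale given the correct answer, and it improves performance by repeating rounds of fine-tuning. Experiments show that STaR substantially outperforms a model fine-tuned to predict final answers directly, with particularly notable results on CommonsenseQA.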
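
The loop described in the abstract (generate, retry with the correct answer, keep what was right, fine-tune, repeat) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `star_iteration`, `star`, `make_generator`, and `fine_tune` are hypothetical names, the model is treated as an opaque object, and continuing fine-tuning from the current model at each round is an assumption, since the abstract does not say which checkpoint each round starts from.

```python
from typing import Any, Callable, List, Optional, Tuple

Example = Tuple[str, str]        # (question, correct answer)
Kept = Tuple[str, str, str]      # (question, rationale, answer)
# Hypothetical generator signature: (question, few_shot, hint) -> (rationale, answer).
# `hint` is None on the first attempt and the correct answer on the retry.
Generator = Callable[[str, List[Kept], Optional[str]], Tuple[str, str]]


def star_iteration(generate: Generator,
                   dataset: List[Example],
                   few_shot: List[Kept]) -> List[Kept]:
    """One pass over the dataset: collect rationales that reach the correct answer."""
    kept: List[Kept] = []
    for question, gold in dataset:
        # First attempt: prompted only with the few rationale examples.
        rationale, answer = generate(question, few_shot, None)
        if answer != gold:
            # Retry: generate a rationale again, this time given the correct answer.
            rationale, answer = generate(question, few_shot, gold)
        if answer == gold:
            kept.append((question, rationale, answer))
    return kept


def star(model: Any,
         fine_tune: Callable[[Any, List[Kept]], Any],
         make_generator: Callable[[Any], Generator],
         dataset: List[Example],
         few_shot: List[Kept],
         n_iterations: int) -> Any:
    """Repeat: generate rationales, filter by correctness, fine-tune on what is kept."""
    for _ in range(n_iterations):
        kept = star_iteration(make_generator(model), dataset, few_shot)
        # Assumption: fine-tune the current model; the abstract leaves this open.
        model = fine_tune(model, kept)
    return model
```

In practice `make_generator` would wrap a language model's sampling call and `fine_tune` would run a training job; both are left as caller-supplied functions here so that the control flow of the loop stays visible.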

STaR: Bootstrapping Reasoning With Reasoning, Eric Zelikman+, N/A, NeurIPS'22 #1397
