Large language models (LLMs) have been increasingly used to interact with external environments (e.g., games, compilers, APIs) as goal-driven agents. However, it remains challenging for these language agents to quickly and efficiently learn from trial-and-error as traditional reinforcement learning methods require extensive training samples and expensive model fine-tuning. We propose Reflexion, a novel framework to reinforce language agents not by updating weights, but instead through linguistic feedback. Concretely, Reflexion agents verbally reflect on task feedback signals, then maintain their own reflective text in an episodic memory buffer to induce better decision-making in subsequent trials. Reflexion is flexible enough to incorporate various types (scalar values or free-form language) and sources (external or internally simulated) of feedback signals, and obtains significant improvements over a baseline agent across diverse tasks (sequential decision-making, coding, language reasoning). For example, Reflexion achieves a 91% pass@1 accuracy on the HumanEval coding benchmark, surpassing the previous state-of-the-art GPT-4 that achieves 80%. We also conduct ablation and analysis studies using different feedback signals, feedback incorporation methods, and agent types, and provide insights into how they affect performance.
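To make the trial-and-error loop described above concrete, here is a minimal sketch in Python. The `llm_act`, `llm_evaluate`, and `llm_reflect` callables are hypothetical stand-ins for the actor, evaluator, and self-reflection model calls implied by the abstract; their names and signatures are illustrative assumptions, not the authors' actual API.

```python
# Minimal sketch of a Reflexion-style trial loop, assuming three
# hypothetical LLM helpers (llm_act, llm_evaluate, llm_reflect) supplied
# by the caller; these are illustrative, not the authors' implementation.

from typing import Callable, List, Tuple


def reflexion_loop(
    task: str,
    llm_act: Callable[[str, List[str]], str],             # actor: task + reflections -> attempt
    llm_evaluate: Callable[[str, str], Tuple[bool, str]], # evaluator: -> (success, feedback)
    llm_reflect: Callable[[str, str, str], str],          # self-reflection: -> reflective text
    max_trials: int = 5,
) -> str:
    """Run up to max_trials attempts, reinforcing the agent with verbal
    self-reflections stored in an episodic memory buffer rather than
    with weight updates."""
    memory: List[str] = []  # episodic memory buffer of reflective text
    attempt = ""
    for _ in range(max_trials):
        # The actor conditions on the task plus all prior reflections.
        attempt = llm_act(task, memory)
        # Feedback may be scalar (e.g., a unit-test pass rate rendered as
        # text) or free-form language, external or internally simulated.
        success, feedback = llm_evaluate(task, attempt)
        if success:
            break
        # Convert the raw feedback signal into a verbal self-reflection
        # and store it so the next trial can avoid the same mistake.
        reflection = llm_reflect(task, attempt, feedback)
        memory.append(reflection)
    return attempt
```

The key design point this sketch illustrates is that learning happens entirely in context: the memory buffer of reflections grows across trials and is fed back to the actor, so the agent improves without any gradient updates or fine-tuning.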