Large language models (LLMs) have been increasingly used to interact with external environments (e.g., games, compilers, APIs) as goal-driven agents. However, it remains challenging for these language agents to quickly and efficiently learn from trial-and-error as traditional reinforcement learning methods require extensive training samples and expensive model fine-tuning. We propose Reflexion, a novel framework to reinforce language agents not by updating weights, but instead through linguistic feedback. Concretely, Reflexion agents verbally reflect on task feedback signals, then maintain their own reflective text in an episodic memory buffer to induce better decision-making in subsequent trials. Reflexion is flexible enough to incorporate various types (scalar values or free-form language) and sources (external or internally simulated) of feedback signals, and obtains significant improvements over a baseline agent across diverse tasks (sequential decision-making, coding, language reasoning). For example, Reflexion achieves a 91% pass@1 accuracy on the HumanEval coding benchmark, surpassing the previous state-of-the-art GPT-4 that achieves 80%. We also conduct ablation and analysis studies using different feedback signals, feedback incorporation methods, and agent types, and provide insights into how they affect performance.
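To make the trial-and-error loop described above concrete, here is a minimal sketch in Python. The `llm_act`, `llm_evaluate`, and `llm_reflect` callables are hypothetical stand-ins for the actor, evaluator, and self-reflection model calls implied by the abstract; their names and signatures are illustrative assumptions, not the authors' actual API.

```python
# Minimal sketch of a Reflexion-style trial loop, assuming three
# hypothetical LLM helpers (llm_act, llm_evaluate, llm_reflect) supplied
# by the caller; these are illustrative, not the authors' implementation.

from typing import Callable, List, Tuple


def reflexion_loop(
    task: str,
    llm_act: Callable[[str, List[str]], str],             # actor: task + reflections -> attempt
    llm_evaluate: Callable[[str, str], Tuple[bool, str]], # evaluator: -> (success, feedback)
    llm_reflect: Callable[[str, str, str], str],          # self-reflection: -> reflective text
    max_trials: int = 5,
) -> str:
    """Run up to max_trials attempts, reinforcing the agent with verbal
    self-reflections stored in an episodic memory buffer rather than
    with weight updates."""
    memory: List[str] = []  # episodic memory buffer of reflective text
    attempt = ""
    for _ in range(max_trials):
        # The actor conditions on the task plus all prior reflections.
        attempt = llm_act(task, memory)
        # Feedback may be scalar (e.g., a unit-test pass rate rendered as
        # text) or free-form language, external or internally simulated.
        success, feedback = llm_evaluate(task, attempt)
        if success:
            break
        # Convert the raw feedback signal into a verbal self-reflection
        # and store it so the next trial can avoid the same mistake.
        reflection = llm_reflect(task, attempt, feedback)
        memory.append(reflection)
    return attempt
```

The key design point this sketch illustrates is that learning happens entirely in context: the memory buffer of reflections grows across trials and is fed back to the actor, so the agent improves without any gradient updates or fine-tuning.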