ManifoldRG / Manifold-KB

This repository is a knowledge base that collects key insights, details from other research, and implementation references in one place, documenting the various possible paths to a given goal.

AF Survey - Reflexion: Language Agents with Verbal Reinforcement Learning #23

Closed: pranavguru closed this issue 9 months ago

pranavguru commented 9 months ago

Paper title: Reflexion: Language Agents with Verbal Reinforcement Learning (link to paper)

Estimated time to complete the review: by 09/22/23

If you are new to Manifold, here are some helpful links:

pranavguru commented 9 months ago

Reflexion: Language Agents with Verbal Reinforcement Learning

Review author: Pranav Guruprasad

Summary:

The authors of this paper propose Reflexion, a framework for language-based agents that converts the binary or scalar feedback an agent receives from its environment into verbal feedback in the form of a textual summary. The framework aims to reinforce language agents without updating model weights: agents augmented with Reflexion reflect on their task feedback signals, maintain a verbal version of those signals in an episodic memory buffer, and use that buffer to improve decision making in subsequent trials.

The authors develop Reflexion as a modular framework consisting of 3 distinct modules:

- an Actor, which generates text and actions conditioned on the current state and memory;
- an Evaluator, which scores the trajectory produced by the Actor;
- a Self-Reflection model, which converts the Evaluator's signal into verbal feedback.

Another core component of the Reflexion process is memory, which is split into short-term and long-term memory: the recent trajectory serves as short-term memory, while the outputs of the Self-Reflection model are stored in long-term memory. During inference, the Actor conditions its decisions on both components, which together provide context shaped by lessons learned over several trials.
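As a rough sketch of how the two memory components might be combined into the Actor's context (this is illustrative, not the authors' implementation; `build_actor_context` and its parameters are hypothetical names):

```python
# Sketch of Reflexion-style memory, assuming the Actor is prompted with the
# recent trajectory (short-term) plus stored self-reflections (long-term).
# All names here are illustrative, not the paper's API.
MAX_REFLECTIONS = 3  # long-term memory is assumed bounded to a few entries

def build_actor_context(task: str, trajectory: list[str], reflections: list[str]) -> str:
    """Combine short-term and long-term memory into one prompt for the Actor."""
    reflection_block = "\n".join(reflections[-MAX_REFLECTIONS:])
    trajectory_block = "\n".join(trajectory)
    return (
        f"Task: {task}\n"
        f"Lessons from past trials:\n{reflection_block}\n"
        f"Current trajectory:\n{trajectory_block}\n"
        "Next action:"
    )
```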

The Reflexion process consists of 3 steps:

1. The Actor produces a trajectory by interacting with the environment.
2. The Evaluator scores the trajectory with a binary or scalar reward.
3. The Self-Reflection model converts this sparse reward into a verbal summary, which is appended to the agent's long-term memory.

The above 3 steps are carried out iteratively until the Evaluator deems the trajectory generated by the Actor to be correct; a minimal sketch of this loop appears below.
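A minimal sketch of the loop, assuming `actor`, `evaluator`, and `self_reflect` stand in for LLM calls (these names and the trial budget are assumptions, not the paper's interfaces):

```python
# Hypothetical Reflexion loop: iterate until the evaluator accepts the
# trajectory or a trial budget is exhausted.
def reflexion_loop(task, actor, evaluator, self_reflect, max_trials=5):
    reflections = []  # long-term episodic memory of verbal feedback
    trajectory = []
    for _ in range(max_trials):
        trajectory = actor(task, reflections)            # step 1: generate a trajectory
        success, feedback = evaluator(task, trajectory)  # step 2: score it
        if success:
            break
        # step 3: convert sparse feedback into a verbal lesson and store it
        reflections.append(self_reflect(task, trajectory, feedback))
    return trajectory
```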

The authors show that Reflexion agents improve over the baseline by 22% on AlfWorld decision-making tasks, by 20% on HotPotQA reasoning questions, and by up to 11% on HumanEval Python programming tasks, demonstrating that Reflexion achieves improvements over strong baselines across diverse tasks and state-of-the-art results on several of them. They also introduce LeetcodeHardGym, a code-generation RL gym environment consisting of 40 Leetcode 'hard' questions in 19 programming languages.

Motivation:

Experiments and Results:

The authors evaluate natural language RL agents on decision-making, reasoning, and code-generation tasks: search-based question answering on HotPotQA, where Reflexion improves over the baseline by 20%; multi-step tasks in household environments in AlfWorld, where it improves by 22%; and code-writing tasks in competition-like environments with interpreters and compilers in HumanEval, where it improves by 11%.

Limitations:

Significance:

Future work:

Related work:

Paper link: Reflexion: Language Agents with Verbal Reinforcement Learning

PranayPasula commented 9 months ago

Closed with merging of Reflexion_review_PranavGuru