The code in this repository is based on the paper Reward Design with Language Models.
This repository contains the prompts that we used for each domain as well as code to train an RL agent with an LLM in the loop using those prompts. Each domain (Ultimatum Game, Matrix Games, DealOrNoDeal) has a separate directory and will need a separate conda/virtual environment.
Please check out the READMEs in each directory for more information on how to run things.
We implemented a framework that utilizes LLMs to assist Reinforcement Learning (RL) agents in better simulating human driving behavior. First, we define the desired driving style and output format and input them into the agent. Then, we describe the current scenario the ego car faces to the LLM. Except for the first round, in each subsequent round, we provide the LLM with the decision output from the previous round, enabling better context understanding. By comparing the LLM's output decision with the decision made by the RL agent, we generate a reward function to train the LLM-RL agent.
pip install -r requirements.txt
Our framework integrates a Language Model (LM) to process textual prompts as input and generate a reward signal. The input prompt consists of three primary components: the Task Description, the User Objective, and the Last Outcome. Specifically, the LM functions by accepting a concatenated version of these components and returns a textual string. This output is then analyzed by a parser, which translates the text into a binary reward signal. The binary reward signal is subsequently utilized to train a Reinforcement Learning (RL) agent. By employing the LM as a proxy for the traditional reward function, our framework is compatible with various RL training algorithms, enhancing adaptability and potential applications.
A prompt for one interaction consists of the last action, the predefined prompt, and the scene description. This description includes available actions, a general overview, and a safety evaluation.
An example of conservative model:
An example of aggressive model: