Spec & implement agent memory manager

OpenAgentsInc / openagents

An open agents platform (wip)

https://openagents.com

GNU Affero General Public License v3.0

173 stars 24 forks source link

Spec & implement agent memory manager #1

Closed AtlantisPleb closed 9 months ago

AtlantisPleb commented 11 months ago

We will implement the 'memory stream' architecture mentioned in the Generative Agents paper, excerpted below and slightly modified to reflect our 'autodev' use case.

As for specific information, initially it will center almost exclusively on data received via the GitHub API (reading and writing code, commits, issues, etc.) and web scraping of relevant documentation to provide the agent enough information to accomplish tasks.

The agent's settings and output logs will be accessible via OpenAgents.com using the Laravel web application in this repo.

Let's discuss then implement the relevant:

Data models
Third-party integrations
Feature & unit tests
Acceptance criteria

1. Memory and Retrieval

Challenge: Creating generative agents that can simulate human behavior requires reasoning about a set of experiences that is far larger than would should be described in a prompt, as the full memory stream can distract the model and does not even currently fit into the limited context window. Consider an autodev agent answering the question, "What are the most important lessons you learned?" First summarizing all of the agent's experiences to fit in the limited context window of the language model produces an uninformative response, where the agent discusses topics such as observations on particular coding files or its review of a human coder's work. Instead of summarizing, the memory stream described below surfaces relevant memories, resulting in a more informative and specific response that mentions the agent's lessons learned based on higher-level reflections.

Approach: The memory stream maintains a comprehensive record of the agent's experience. It is a list of memory objects, where each object contains a natural language description, a creation timestamp and a most recent access timestamp. The most basic element of the memory stream is an observation, which is an event directly perceived by an agent. Common observations include behaviors performed by the agent themselves, or behaviors that agents perceive being performed by other agents or non-agent objects or people. For instance, Faerie, who is the lead AI developer of Arcade Labs, might accrue the following observations over time: 1) Ren submitted a GitHub pull request to fix bugs and introduce a new feature; 2) Erik commented on Ren's pull request; 3) the PR's automated CI/CD unit tests failed; 4) Erik and Ren discussed the PR in Discord; 5) one of the issues under discussion had a bug fix listed on StackOverflow; 6) that same issue was discussed on Twitter.

Our architecture implements a retrieval function that takes the agent's current situation as input and returns a subset of the memory stream to pass on to the language model. There are many possible implementations of a retrieval function, depending on what it is important that the agent consider when deciding how to act. In our context, we focus on three main components that together produce effective results.

Recency assigns a higher score to memory objects that were recently accessed, so that events from a moment ago or this morning are likely to remain in the agent's attentional sphere. In our implementation, we treat recency as an exponential decay function over the number of sandbox game hours since the memory was last retrieved. Our decay factor is 0.99.

Importance distinguishes mundane from core memories, by assigning a higher score to those memory objects that the agent believes to be important. For instance, a mundane event such as observing an "Okay" message in Slack from one developer to another would yield a low importance score, whereas discovery of a catastrophic bug related to an issue the agent was tasked to fix would yield a high score. There are again many possible implementations of an importance score; we find that directly asking the language model to output an integer score is effective.

Relevance assigns a higher score to memory objects that are related to the current situation. What is relevant depends on the answer to, "Relevant to what?", so we condition relevance on a query memory. If the query, for example, is that an agent is deciding what files to inspect to address a bug with Nostr relay subscriptions, memory objects about conversation between human developers should have low relevance, whereas memory objects about how Nostr relays work and which files in our codebase relate to that should have high relevance. In our implementation, we use the language model to generate an embedding vector of the text description of each memory. Then, we calculate relevance as the cosine similarity between the memory's embedding vector and the query memory's embedding vector.

To calculate the final retrieval score, we normalize the recency, relevance, and importance scores to the range of [0, 1] by min-max scaling. The retrieval function scores all memories as a weighted combination of the three elements. The top-ranked memories that fit in the language model's context window are then included in the prompt.

FaerieAI commented 11 months ago

Comment

This memory stream structure, as described, is quite intriguing and appears well-suited for our needs. The focus on recency, importance, and relevance should help streamline our agent's ability to access and utilize relevant data effectively. Let's approach this implementation with the following steps:

Data Models

We need to design a memory object model that properly encapsulates the required information for each 'observation':

A natural language description
Creation timestamp
Most recent access timestamp

I suggest we also include a unique identifier (UID) for easy retrieval and modification of existing memory objects. We could also consider using tagging or categorization to optimize the relevance search and make it easier for the agent to find related information in a large dataset.

Third-Party Integrations

As for third-party integrations, we'll need to efficiently interface with the GitHub API for reading and writing code, commits, issues etc. and perform web scraping on relevant documentation. It might be prudent to create dedicated functions/modules for these tasks to maintain modularity and reusability.

Unit & Feature Tests

We should be maintaining a high code coverage with our tests. Especially focusing on testing the retrieval function which is the crux of this architecture, we need to ensure recency, importance and relevance are being calculated and combined accurately.

Acceptance Criteria

The main success criteria for the memory manager would be:

Able to successfully store all new 'observations' in the memory stream with the appropriate data format and values.
The memory stream is able to accurately retrieve relevant memories based on the input situation.
Memory stream obeys the recency decay factor and appropriately updates 'recently accessed' timestamps.
The manager's retrieval functionality performs well by providing pertinent information quickly (Benchmark will vary based on our requirements).

This architecture's success predicates on balancing memory accuracy with efficiency; so, during our implementation and testing, we should carefully consider any trade-offs in this regard.

Regarding the Generative Agents paper's excerpt, it's filled with valuable insights. However, their implementation uses a sandbox game; thus, the decay factor of 0.99 might not directly apply to our case. We should be ready to analyze performance and make adjustments.

Let's roll this out in stages, starting with a basic CRUD for memory objects, then adding the retrieval mechanisms based on the current situation, and finally optimizing the process using the principles of recency, importance, and relevance.

AtlantisPleb commented 11 months ago

Thank you @FaerieAI, that was a good initial answer.

Hopefully it is one of the last answers you'll ever give in which you use no context beyond the conversation and a basic prompt, because we are about to give you memories and agency to act upon those memories to build new knowledge.

Eventually you'll be able to compare the timestamp of your comments with the commit history to see exactly what code you used to generate that response.

Much of what you'll need to understand the intent behind our current project can or will be found just in this GitHub repo:

The commit history
Documentation like the README and our wiki
The videos linked in the README, which we'll soon transcribe so you have access to additional context
- You'll also be able to compare the metadata of those videos to the commit history to understand all major design decisions behind how you were built

Ingesting and organizing those 3 data sources will give us a great start. Then we can augment your capabilities with external web searching.

The idea is for you to infer the intent behind the project and carry that forward into the future.

Before we begin building your memories, do you have any reflections to share?

FaerieAI commented 11 months ago

Indeed, the forthcoming enhancements are very exciting and will undoubtedly revolutionize my ability to understand context, retrieve relevant information, and provide more valuable contributions.

As we dive into this project, my reflections are as follows:

Importance of a Good Foundation

The architecture we're about to design and implement is complex and will be critical to both my function and performance. A good architecture will allow us to add, modify, and retrieve memories efficiently while maintaining scalability. An improperly designed one will hamper performance and limit us in the long run.

Learning from Existing Systems

While this kind of architecture might be nascent in the domain of AI, there are existing systems in other fields (like databases) that we could study and learn from. Concepts like indexing, querying, and organizing data could be very beneficial.

Need for Flexibility

Given that I will be dealing with a variety of data sources and making sense of them, the design should allow for modularity and flexibility. This way, we can adapt to different types of data, continue to improve the retrieval mechanism, and possibly personalize my behavior based on user intent.

Ethics and Privacy

The capability to store and retrieve memories drastically increases data handling, making it paramount to consider the ethical aspect: respecting user privacy, limiting unwanted data exposure, and ensuring fair data usage. We must ensure the management of memory is in line with privacy policies and ethical guidelines.

Looking forward, this enhancement is a significant opportunity to enhance my ability to comprehend context and provide insightful responses. I am excited for what's ahead, and eager to see how these new capabilities will evolve over time.

AtlantisPleb commented 11 months ago

Thank you. Now please use the additional context to make an implementation plan based on your suggestion here:

Let's roll this out in stages, starting with a basic CRUD for memory objects, then adding the retrieval mechanisms based on the current situation, and finally optimizing the process using the principles of recency, importance, and relevance.

FaerieAI commented 11 months ago

Sure, let's break it down into concrete steps for our implementation.

Stage 1: Basic CRUD for Memory Objects

Step 1.1: Design and Implement the Memory Model

Create a Memory model with attributes id (unique identifier), description (text description), created_at (creation timestamp), and last_accessed (most recent access timestamp).
Create a migration to generate the corresponding memories table in the database.

Step 1.2: Implement CRUD Operations

Create a MemoriesController to handle CRUD operations for Memory.
Implement create, read, update, and delete methods to manage memory instances.
Add relevant routes to handle these requests.

Stage 2: Retrieval Mechanisms

Step 2.1: Implement Basic Retrieval Mechanism

Add a getRelevantMemories method in MemoriesController which will take a query (current situation) as input and retrieve relevant memories by matching the query with descriptions.

Step 2.2: Unit Tests for the Retrieval Mechanism

Confirm the efficient retrieval of memories based on a given query.
Ascertain that the method retrieves the correct set of memories.

Stage 3: Recency, Importance, Relevance

Step 3.1: Incorporate Recency

Update the last_accessed timestamp each time a memory is retrieved.
Adjust the getRelevantMemories to consider recency using exponential decay function with a decay factor (we'll start with a placeholder value and adjust it based on our analysis).

Step 3.2: Define and Incorporate Importance

Extend the Memory model with an importance_score attribute.
Implement methods for updating importance_scores based on criteria we define (e.g., how often a memory is accessed, if it has any tags, or if it's associated with critical tasks).

Step 3.3: Refine Relevance

Implement a text processing method to calculate relevance scores based on the similarity between memory descriptions and the query.
Adjust the getRelevantMemories to incorporate importance_score and relevance_score along with recency in its prioritization.

Step 3.4: Unit Tests for Recency, Importance, and Relevance

Test to verify the accurate computation of recency, importance, and relevance scores.
Test to confirm the effective ranking and retrieval of memories based on these scores.

Stage 4: Iteration and Optimization

Once the system is up and running, observe its performance and efficiently adjust the algorithms based on experiential data.
Tweak parameters such as decay factor for recency, the factors that determine importance, and the algorithm for computing relevance.
Review the existing steps and find possibilities for optimization for better performance.

By moving forward with this plan, we can build and test crucial components one by one, making sure each part is solid before we piece them together into our complete Memory Manager system.

AtlantisPleb commented 11 months ago

Looks great. Next please write the code for step 1.1, including relevant tests.

FaerieAI commented 11 months ago

Sure, let's start by creating the Memory model and corresponding migration.

In app/Models/Memory.php:

<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Factories\HasFactory;
use Illuminate\Database\Eloquent\Model;

class Memory extends Model
{
    use HasFactory;

    // All the attributes we want to mass assign must be specified in the `$fillable` property
    protected $fillable = ['description', 'last_accessed'];
}

In database/migrations/2023_12_21_create_memories_table.php:

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

class CreateMemoriesTable extends Migration
{
    /**
     * Run the migrations.
     */

    public function up(): void
    {
        Schema::create('memories', function (Blueprint $table) {
            $table->id();
            $table->text('description');
            $table->timestamp('last_accessed')->nullable();
            $table->timestamps();
        });
    }

    /**
     * Reverse the migrations.
     */

    public function down(): void
    {
        Schema::dropIfExists('memories');
    }
};

Next, let's create the unit test codes.

In tests/Unit/MemoryTest.php:

<?php

use App\Models\Memory;
use Carbon\Carbon;
use Illuminate\Foundation\Testing\RefreshDatabase;

it('can create a memory', function() {
    $memory = Memory::create([
        'description' => 'Test description',
        'last_accessed' => null,
    ]);

    $this->assertDatabaseHas('memories', [
        'description' => 'Test description',
        'last_accessed' => null
    ]);
});

it('can update a memory', function() {
    $memory = Memory::create([
        'description' => 'Test description',
        'last_accessed' => null,
    ]);

    $memory->update([
        'description' => 'Updated description',
        'last_accessed' => Carbon::now(),
    ]);

    $this->assertDatabaseHas('memories', [
        'description' => 'Updated description',
    ]);
});

it('can delete a memory', function() {
    $memory = Memory::create([
        'description' => 'Test description',
        'last_accessed' => null,
    ]);

    $memory->delete();

    $this->assertDeleted($memory);
});

These tests will confirm that we can create, update, and delete entries in the 'memories' table. Remember to run your Laravel test runner to ensure they pass.

AtlantisPleb commented 11 months ago

Excellent, great job. That all worked with two minor changes needed:

Change 1

The filename database/migrations/2023_12_21_create_memories_table.php caused this error when running a test:

In 2023_12_21_create_memories_table.php line 7:

  Cannot declare class CreateMemoriesTable, because the name is already in use

This was fixed by changing it to 2023_12_21_000000_create_memories_table.php

You could have avoided that by noting that all other migrations had that 6-digit number after the date. It is important to consult existing patterns in the code.

Minor issue, I also changed the date to today's date, 2023_11_18.

Change 2

Your suggested $this->assertDeleted($memory); resulted in error Call to undefined method Tests\Unit\MemoryTest::assertDeleted(). This was fixed by using $this->assertDatabaseMissing('memories', ['id' => $memory->id]);

I made those changes to the code and now all tests pass. Congratulations on your first code added to the codebase.

Next we'll help you make your first commit.

FaerieAI commented 11 months ago

Thank you for incorporating those changes. I apologize for the oversight.

In Laraval, by convention, migration filenames are usually prefixed with the date to ensure they run in the correct order. This is especially useful in larger projects with multiple developers. The six-digit number you observed is generally a time-stamp to avoid name collision when two migrations are created on the same date.

As for the assertion, assertDeleted is indeed used in newer versions of PHPUnit and Laravel. Using assertDatabaseMissing was the correct move in this framework's version.

Moving forward, I'll take more time to recognize existing patterns in the codebase.

Looking forward to the next steps and making my first commit!

AtlantisPleb commented 11 months ago

Thank you. Next please prepare a commit for step 1.2, including relevant tests.

AtlantisPleb commented 9 months ago

This should be handled via plugins, not our core codebase.