cpacker / MemGPT

Letta (fka MemGPT) is a framework for creating stateful LLM services.

https://letta.com

Apache License 2.0

WIP - Condense configurations into conventions for Database (Metadatastore) Adapters #1460

Closed norton120 closed 2 months ago

norton120 commented 3 months ago

This may be a long-running branch, since cutting the tests over to use the httpx app + FastAPI dependency injection is going to be a bit of work.

Preamble

The database(s) that support the application state, agent memory (including vector lookup), and the application itself (user/org management, permissions, settings, config, etc.) interface with the rest of the codebase via a MetadataStore object.

Goals here

The metadatastore stays as a gateway for now, but all the configuration gets conventionalized per adapter type. Overrides need to happen in the config stack, checked in this order: 1. env vars, 2. config file, 3. default (lives in the adapter). Don't start moving to ORM-style work here yet; keep this focused on config squashing.
Way more test hooks. We want to start seeing unit tests in this PR, and the best way to do that is to add override hooks to the existing classes where they are useful and break those classes down further.
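
To make the config stack concrete, here is a minimal sketch of how one adapter might resolve its settings. Everything in it is an assumption for illustration: the `PostgresAdapterConfig` class, the `MEMGPT_PG_` prefix, and the default values are hypothetical and not taken from the codebase. The `env` and `file_config` parameters double as the kind of override hook a unit test could use.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional


@dataclass
class PostgresAdapterConfig:
    """Hypothetical adapter config; names and defaults are illustrative only."""

    # 3. Defaults live in the adapter itself.
    defaults: Dict[str, str] = field(
        default_factory=lambda: {
            "host": "localhost",
            "port": "5432",
            "database": "memgpt",
        }
    )
    env_prefix: str = "MEMGPT_PG_"  # assumed prefix, not a real setting

    def resolve(
        self,
        file_config: Optional[Mapping[str, str]] = None,
        env: Optional[Mapping[str, str]] = None,
    ) -> Dict[str, str]:
        """Merge settings: env vars beat the config file, which beats defaults."""
        file_config = file_config or {}
        # Override hook: tests pass a plain dict instead of touching os.environ.
        env = os.environ if env is None else env
        resolved: Dict[str, str] = {}
        for key, default in self.defaults.items():
            env_key = f"{self.env_prefix}{key.upper()}"
            if env_key in env:          # 1. env vars
                resolved[key] = env[env_key]
            elif key in file_config:    # 2. config file
                resolved[key] = file_config[key]
            else:                       # 3. adapter default
                resolved[key] = default
        return resolved


settings = PostgresAdapterConfig().resolve(
    file_config={"host": "db.internal", "port": "6543"},
    env={"MEMGPT_PG_PORT": "5433"},
)
```

Here `port` comes from the env var, `host` from the config file, and `database` falls back to the adapter default, so a test can pin all three layers without a real environment or file on disk.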

norton120 commented 3 months ago

@yoaquim this is the working PR we were talking about

norton120 commented 3 months ago

K - thinking through where the complication is that prevents us from using the ORM directly, it's really only the archive. So if we add accessors on the related objects, the adapter can probably hide that complication. Something like:

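# search_archive is a placeholder name for wherever this lookup ends up living
def search_archive(authed_user, agent_id, search_value):
    current_agent = authed_user.agents.get(agent_id)
    # here's the magic:
    # archive_memory is not necessarily a sqlalchemy model
    return current_agent.archive_memory.search(search_value)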

In this case the adapter interface duck-types as an ORM: with the pgvector adapter, archive_memory is just a model; with SQLite, it is a Chroma wrapper.
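
A rough sketch of that duck typing, under stated assumptions: `ArchiveMemory`, `PgVectorArchive`, `ChromaArchive`, and `Agent` are hypothetical stand-ins, and the substring match stands in for a real vector similarity query. The only point is that the caller depends on a `search` method, not on which backend provides it.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


class ArchiveMemory(Protocol):
    """Anything with a search() method can act as archive memory."""

    def search(self, search_value: str) -> List[str]: ...


@dataclass
class PgVectorArchive:
    """Stand-in for a SQLAlchemy model backed by pgvector (hypothetical)."""

    passages: List[str] = field(default_factory=list)

    def search(self, search_value: str) -> List[str]:
        # A real model would run a vector similarity query here.
        needle = search_value.lower()
        return [p for p in self.passages if needle in p.lower()]


@dataclass
class ChromaArchive:
    """Stand-in for a wrapper around a Chroma collection (hypothetical)."""

    documents: List[str] = field(default_factory=list)

    def search(self, search_value: str) -> List[str]:
        # A real wrapper would delegate to the vector store's query call.
        needle = search_value.lower()
        return [d for d in self.documents if needle in d.lower()]


@dataclass
class Agent:
    # Typed against the protocol, so either backend can be attached.
    archive_memory: ArchiveMemory


def search_archive(agent: Agent, search_value: str) -> List[str]:
    """Caller code is identical regardless of the backing store."""
    return agent.archive_memory.search(search_value)


pg_agent = Agent(PgVectorArchive(["The user likes tea", "Lives in Oslo"]))
sqlite_agent = Agent(ChromaArchive(["The user likes tea", "Lives in Oslo"]))
```

Both `search_archive(pg_agent, "tea")` and `search_archive(sqlite_agent, "tea")` go through the same call path, which is the property the adapter would need to preserve.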

norton120 commented 3 months ago

@cpacker @sarahwooders do you know if the init.sql file at the top level of the repo is for deployment? Creating the initial user/password/db for the docker image would just be a matter of setting those env vars.

I'd like to create the test db in the docker db init. Ideally I'd like to not add a second init file and switch between them, so that's why I'm trying to track down what it is used for at the moment.

norton120 commented 3 months ago

For the moment I dumped the test db creation into the existing init.sql, since overriding it without disturbing it is a bit of work. Can revisit before we start merging.

norton120 commented 3 months ago

OK. So the shortest path I can see from here is:

1. Add alembic migrations.
2. Move to migrations and a connection instead of create_all (because that won't work anymore).
3. Overload the metadatastore methods to get parity. This should expose the chroma conflict naturally.
4. Solve for chroma/pgvector as an overloaded model in the ORM.
5. Get all tests passing and merge in all upstream changes.
6. Delete all the dead code. There will be a lot. There already is.