cpacker / MemGPT

Create LLM agents with long-term memory and custom tools 📚🦙
https://memgpt.readme.io
Apache License 2.0
11.38k stars 1.24k forks source link

Feature - server secure tokens with Auth headers and jwt #1260

Open norton120 opened 5 months ago

norton120 commented 5 months ago

Is your feature request related to a problem? Please describe. Currently the server auth endpoint has a TODO for the expected auth flow. I'd like to get that wrapped to use jwt auth flow. Also I noticed that the current api keys appear to be stored and looked up as plaintext, which is dangerous and really shouldn't be used in production.

Describe the solution you'd like

Describe alternatives you've considered Jwt makes the most sense in the context of the chat use case - lots of fast calls and polling, rather than do the full description every trip and expose the api key every time. Alternatively this could be 2 PRs, but I think at the speed of the project's evolution and how they are both really needed to make auth polished, one makes sense.

Additional context Add any other context or screenshots about the feature request here. @sarahwooders I want to go ahead and PR this but I couldn't figure out which issue it belongs to, so I opened a new one. Am I cool to dig into this next?

cpacker commented 4 months ago

@norton120 to summarize what would the general MemGPT API request flow look like for users?

The initial "TODO" scaffolded auth was meant to allow running a service that has auth similar to OpenAI, just without any real auth.

So basically, instead of being the OpenAI server admin, you're the MemGPT server admin.

On the server admin side:

  1. you spin up a server instance
  2. that server instance has some admin account which you can use to view/edit users

On the user side:

  1. you can create an account (missing on MemGPT server, currently must be done by admin)
  2. you can go to an API keys website (missing on the MemGPT dev portal atm, but API routes exist) and generate an API key
  3. now when you make your own requests to the server, you use that API key as a bearer token

^what part of this would change (if any) in the proposed PR?

Also, we're quite open to changing the general API key / auth / user-admin flow, we just thought it made sense to mimic OAI since ideally you can run MemGPT as an OAI replacement.

Really awesome contributions btw, thank you!

Also happy to chat at higher cadence on discord, feel free to DM us there too (github issues also works totally fine, up to you).

norton120 commented 4 months ago

@cpacker thanks!

My thought was to have 2 valid auth flows - one with the API key as the Auth header (so pretty much a drop-in for OpenAI the way you described) and one flow that uses a JWT handshake. The API key flow would stay the same as described above, except new keys (maybe prefixed with sks- for "secret key secure?") would be slightly longer than the current keys. Otherwise business as usual.

The JWT flow would get new routes, and the user experience would look something like this:

Why the JWT?

If this makes sense, I'll start to sketch out some example code so it's easier to discuss. I'll also catch up with you in Discord this week if you're around; I have been in the process of moving for the last week, so I've been away from the keyboard more than usual.

cpacker commented 4 months ago

@norton120 that makes sense to me, thanks for replying in such detail! Re: the pub-sub flow part (somewhat orthogonal to the whole jwt / bearer token thing) - how does this relate to POST SSE vs websockets vs ...? Would the polling be GET SSE requests (if streaming is enabled), and just regular GET otherwise? I guess in the context of a pub-sub polling setup maybe GET SSE doesn't even make that much sense to support (vs GET returning eg {'status': 'in_progress'})

norton120 commented 4 months ago

The more I think about it, the more I think websockets or SSE support is probably all that's needed for now; my original line of thought was that an application publishing/consuming messages to the MemGPT server with a JWT-scoped topic would make pre/post-processing on messages easier, but I don't think it will make a meaningful difference to the downstream implementation at this point. There's a little more overhead for developers not used to working with websockets, but if we've got decent example docs that shouldn't be a barrier to entry.

So maybe this PR would be to square away the auth (secure token and JWT), support those auth methods for ws/SSE, and document usage?

@cpacker thank you for being a sounding board here - I could be convinced either way, I liked the super low barrier to entry of standard REST polling but this feels more and more YAGNI