agnaistic / agnai

AI Agnostic (Multi-user and Multi-bot) Chat with Fictional Characters. Designed with scale in mind.
https://agnai.chat
GNU Affero General Public License v3.0

Proposition: Middlewares #114

Open acidbubbles opened 1 year ago

acidbubbles commented 1 year ago

I want to make a more structured proposition for middlewares. This is a follow-up to the Discord conversation.

The idea

I propose a middleware system, where prompts are sent down a chain of middlewares, passed to the AI backend, and the response is returned back up through the chain. This would allow middlewares to intercept, hijack, or transform prompts and responses, as well as store additional data about the conversation. Middlewares could also extend responses with information for the UI to process (for visual-novel-style characters, text emphasis, etc.).

Implementation


I have yet to look at the code in more detail, but the general idea is pretty simple. A middleware would simply be a class whose invoke method receives a "next" function that calls the next middleware, with the final one responsible for calling the AI backend.

```ts
interface Middleware {
  invoke(prompt: PromptContext, next: MiddlewareFunction): Promise<ResponseContext>
}
```

where PromptContext could look like

```ts
interface PromptContext {
  preset: GenPreset
  history: ChatHistory
  character: Character
  prompt: string
  extensions: Record<string, unknown>
}
```

and ResponseContext would be the same, but with message instead of prompt.
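For completeness, here is a minimal sketch of ResponseContext and the "next" function type, assuming the shapes above (the MiddlewareFunction name is an assumption on my part, as is returning a Promise to accommodate HTTP calls):

```ts
// Hypothetical: mirrors PromptContext, but carries the generated message
// back up the chain instead of the outgoing prompt.
interface ResponseContext {
  preset: GenPreset
  history: ChatHistory
  character: Character
  message: string
  // Arbitrary per-middleware data, e.g. mood tags for the UI to render.
  extensions: Record<string, unknown>
}

// The "next" link: either the next middleware's invoke, or the final
// function that calls the AI backend. Async, since middlewares may need
// to make their own HTTP calls.
type MiddlewareFunction = (prompt: PromptContext) => Promise<ResponseContext>
```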

(NOTE: I know this returns streams, not strings, but the idea is the same, except that some middlewares would need to buffer the stream while others could run on the stream directly.)
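To make the control flow concrete, here is one possible way the chain could be wired up. This is only a sketch; buildChain and sendToAdapter are hypothetical names, not anything from the codebase:

```ts
// Hypothetical sketch: fold the middleware list into a single function,
// so each middleware receives the rest of the chain as its `next` argument.
function buildChain(
  middlewares: Middleware[],
  callBackend: MiddlewareFunction
): MiddlewareFunction {
  return middlewares.reduceRight<MiddlewareFunction>(
    (next, mw) => (prompt) => mw.invoke(prompt, next),
    callBackend
  )
}

// Usage: the final link is the function that actually calls the AI adapter.
// const run = buildChain([summaryMiddleware, moodMiddleware], sendToAdapter)
// const response = await run(promptContext)
```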

Use cases

Some examples of what this would enable (discussed below): summarizing chat history before it reaches the backend, running a secondary model to detect the character's mood and drive the UI, and LuminAI-style stateless pre- and post-processors.

Why an issue for this

Because I want to try some crazy ideas like the summarize one, and I think this would make experimentation easier.

I'd also love to make Phoenix Wright-style text that is influenced by the mood of the response, which would rely on the ability to run additional models in the pipeline, without necessarily building a full-blown feature right away.
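As a concrete illustration, a mood middleware under this design might look something like the following sketch; classifyMood stands in for whatever secondary model would be run, and the extensions key is made up:

```ts
// Hypothetical stand-in for a call to a small classifier model.
declare function classifyMood(text: string): Promise<string>

// Sketch of a middleware that annotates the response with a detected mood,
// which the UI could then use to pick an avatar or a text style.
class MoodMiddleware implements Middleware {
  async invoke(prompt: PromptContext, next: MiddlewareFunction): Promise<ResponseContext> {
    // Let the rest of the chain (and ultimately the backend) run first.
    const response = await next(prompt)
    response.extensions['mood'] = await classifyMood(response.message)
    return response
  }
}
```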

This is all just theory, and I feel like this might be a much larger bite than I can chew. But I'd love to know what you think, if you had something similar in mind, and if not, whether experiments like this could be helpful to try.

Note that I also see that the adapters are "hardcoded"; this might be a good opportunity to "pluginify" middlewares and backends together, at least structurally.

The end goal would be for the LuminAI middlewares (at least as I understood them) to simply be pre- and post-processors implemented as stateless services, so this isn't only about experimentation, though that is my personal objective.

acidbubbles commented 1 year ago

Update: I just don't see how streaming could work, since most middlewares would need to buffer the response anyway if they need to call something over HTTP; and if the emotion/state of the character is going to drive things like the avatar or the text style, it should (I think) be known before the text shows up.
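Assuming responses arrive as an async iterable of tokens, buffering in a middleware would amount to something like this sketch, which is exactly what kills incremental display:

```ts
// Hypothetical: collect a token stream into a full string so a middleware
// can post-process the whole response, at the cost of incremental display.
async function bufferStream(tokens: AsyncIterable<string>): Promise<string> {
  let text = ''
  for await (const chunk of tokens) {
    text += chunk
  }
  return text
}
```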

So, long story short, will text streaming survive LuminAI improvements?