Quansight / ragna

RAG orchestration framework ⛵️
https://ragna.chat
BSD 3-Clause "New" or "Revised" License

Chat history #421

Open pmeier opened 1 month ago

pmeier commented 1 month ago

Feature description

Currently, each prompt passed to Chat.answer is independent of the chat history. Meaning, although it is presented as one chat, each prompt-answer pair is independent of the others. This can cause confusion: users who reference a previous prompt or answer will get a nonsensical response, because the assistant is not aware of them.

To overcome this, I propose we add a chat history feature that can be toggled on and off. I'm not sure at what level this should apply:

  1. On a global level, i.e. being passed to ragna.Rag() or set in the configuration
  2. On a chat level, i.e. being passed to Rag().chat()
  3. On a message level, i.e. being passed to Chat().answer()

I prefer 2. here, but I'm open to other opinions.

I'm not sure about the interface either. Naively, I would say that instead of passing the prompt to the assistant

https://github.com/Quansight/ragna/blob/84cf4f627ebf52b061681a7a8b106daef3e79a1d/ragna/core/_components.py#L216

we could pass a list of messages here instead. The current case would have only a single item in there, whereas with chat history we would have multiple entries. If we go for this implementation, we also need to decide whether we pass strs or ragna.core.Messages and whether we pass a flat list or a list of prompt-answer-pairs. Let's survey the ecosystem and not reinvent the wheel here.
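A minimal sketch of how that could look on the assistant side, assuming for the moment a flat list[str] (the protocol below is purely illustrative, not the actual ragna.core.Assistant base class):

```python
from typing import Iterator, Protocol

from ragna.core import Source


class Assistant(Protocol):
    # Today the assistant receives roughly answer(prompt: str, sources: list[Source]).
    # Sketch: pass the conversation instead; without chat history the list
    # contains exactly one item, the current prompt.
    def answer(self, messages: list[str], sources: list[Source]) -> Iterator[str]:
        ...
```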

Value and/or benefit

Less user confusion and actual conversations rather than independent Q/A.

Anything else?

To pass historical messages to the assistant, we already have access to the Chat._messages attribute

https://github.com/Quansight/ragna/blob/84cf4f627ebf52b061681a7a8b106daef3e79a1d/ragna/core/_rag.py#L168

However, on the API side we never bothered to fill it when loading a chat from the DB, since so far the chat never accesses the historical messages:

https://github.com/Quansight/ragna/blob/84cf4f627ebf52b061681a7a8b106daef3e79a1d/ragna/deploy/_api/core.py#L236-L240

blakerosenthal commented 1 month ago

I lean towards option 2 as well: history at the chat level, maybe defaulting to True for all chats via the persistent config or at startup via the CLI. Chat histories could be loaded in bulk from the DB and toggled on a per-chat basis via the UI.

blakerosenthal commented 1 month ago

It looks like Langchain creates a special input type that is a combination of prompt + message_history, then builds a history-aware retriever that further rephrases the prompt using the history before sending everything to the LLM.

https://python.langchain.com/v0.2/docs/tutorials/qa_chat_history/#adding-chat-history
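Condensed from the linked tutorial, the pattern looks roughly like this (a sketch only; the surrounding llm and retriever are assumed to already exist):

```python
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder


def build_history_aware_retriever(llm, retriever):
    # Prompt that asks the LLM to rewrite the latest question so that it can
    # be understood without the earlier turns of the conversation.
    contextualize_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "Rephrase the latest user question as a standalone question."),
            MessagesPlaceholder("chat_history"),
            ("human", "{input}"),
        ]
    )
    # Wraps the plain retriever; retrieval then happens on the rephrased question.
    return create_history_aware_retriever(llm, retriever, contextualize_prompt)
```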

blakerosenthal commented 1 month ago

The current case would have only a single item in there, whereas with chat history we would have multiple entries. If we go for this implementation, we also need to decide whether we pass strs or ragna.core.Messages and whether we pass a flat list or a list of prompt-answer-pairs. Let's survey the ecosystem and not reinvent the wheel here.

To model things similarly to Langchain, we would replace the prompt arg with a more generic input that would take a newly-defined class instance (i.e. LlmInput). This new class would then handle things like rephrasing (a later feature, perhaps), and formatting the chat history alongside the prompt.

pmeier commented 1 month ago

history at the chat level, maybe defaulting to True for all chats via the persistent config or at startup via the CLI.

Don't the start and end of the sentence contradict each other? If we have the history at the chat level, we cannot have it set globally, either through the config file or a CLI option.

When doing it at the chat level, we can have a boolean parameter that gets passed during chat creation, regardless of whether that happens through the Python API, the REST API, or the web UI. Defaulting to True is OK for me.
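For illustration, at the Python API that could look something like the following; the use_history keyword is made up for this example and not an existing parameter:

```python
from ragna import Rag
from ragna.assistants import RagnaDemoAssistant
from ragna.source_storages import RagnaDemoSourceStorage

rag = Rag()

# Hypothetical keyword: enable history for this chat; the default could come
# from the global config, as discussed above.
chat = rag.chat(
    documents=["ragna.txt"],
    source_storage=RagnaDemoSourceStorage,
    assistant=RagnaDemoAssistant,
    use_history=True,
)
```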

This new class would then handle things like rephrasing (a later feature, perhaps), and formatting the chat history alongside the prompt.

I dislike this approach somewhat. Right now we have components that process data objects, e.g. a source storage gets passed documents and returns sources. The data objects don't have any processing logic on them. IMO this makes it a lot easier to keep the mental model of a pipeline. Thus, I'd prefer that functionality like rephrasing becomes a component rather than a method on a data object.

Re formatting: I don't think this can even be on the LlmInput class unless we create a subclass of it for each LLM we have. The format that each LLM accepts varies wildly between vendors. Meaning, we should have an abstract object that the LLM translates into its own dialect. And by "abstract object" I don't necessarily mean a custom object. It might still just be a list[str] or the like.
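For example (the helper name and the alternating-roles assumption are mine), an assistant targeting an OpenAI-style API could translate a flat list[str] history into its own payload:

```python
def to_openai_messages(history: list[str], system_prompt: str) -> list[dict[str, str]]:
    """Translate a neutral, flat history into OpenAI-style chat messages.

    Assumes the history alternates user / assistant turns and ends with the
    current user prompt; other vendors would get their own translation.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for i, content in enumerate(history):
        messages.append(
            {"role": "user" if i % 2 == 0 else "assistant", "content": content}
        )
    return messages
```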

blakerosenthal commented 1 month ago

Don't the start and end of the sentence contradict each other? If we have the history at the chat level, we cannot have it set globally, either through the config file or a CLI option.

Maybe I have to look more closely at how the global config interacts with the system. I meant that the default history behavior is defined globally (i.e. the user can decide whether chat history is on or off for all new chats), but it can be toggled per chat.

I'd prefer that functionality like rephrasing becomes a component rather than a method on a data object.

This makes a lot of sense to me. Keeping the new Input class as a pure data type would still help abstract away the details from the rest of the pipeline, and a rephrasing component can then process these pieces elsewhere.

The format that each LLM accepts varies wildly between vendors. Meaning, we should have an abstract object that the LLM translates into its own dialect.

I'm pretty sure this is how Langchain handles it too.

nenb commented 1 month ago

To model things similarly to Langchain, we would replace the prompt arg with a more generic input that would take a newly-defined class instance (i.e. LlmInput).

I like this, excellent research @blakerosenthal.

Abstracting the prompt like this would also allow us to potentially tackle another (small, but important) problem that has come up - being able to use different system prompts with the same assistant. I am currently doing this by creating multiple assistants, which feels unnecessary.

@pmeier Do we have consensus on adding a new data model LlmInput? If so, what would be the fields that we add on this data model, e.g. user_prompt, system_prompt, chat_history, preprocessed_user_prompt, etc.?
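Purely as a starting point, a data-only version with those fields could look like this (everything beyond the field names is an assumption):

```python
import dataclasses
from typing import Optional

from ragna.core import Message


@dataclasses.dataclass
class LlmInput:
    user_prompt: str
    system_prompt: str
    chat_history: list[Message] = dataclasses.field(default_factory=list)
    # Filled in by a separate rephrasing component, if one is configured.
    preprocessed_user_prompt: Optional[str] = None
```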

I propose we add a chat history feature that can be toggled on and off

Perhaps we don't even need to add this - as long as we make the messages available to the Assistant, a user can decide whether or not to make use of them when interacting with the LLM.

@pmeier The rest of your proposal regarding the implementation seems sensible to me - make use of the _messages attribute (perhaps now including it as part of the more general LlmInput data model) and make sure that the API also supports this.

Excited about this!

dillonroach commented 1 month ago

I'll keep my comments at the higher LLM-interaction level, as I have less of an opinion about how ragna specifically should do it, but with that said: