This builds on the following issue and the corresponding method for providing the chat history:
The current Conversational RAG approach includes supporting documents with user queries, which can significantly increase token usage. I would like more granular control over what gets stored in the conversation history to optimize token efficiency. Specifically:
For past conversation turns, only store the user query without the supporting documents in the ChatMessageStore. This keeps the historical chat history concise and relevant.
For the most recent conversation turn, include the supporting documents from the RAG component to provide full context to the LLM, but do not store this entire concatenated message with the documents in the ChatMessageStore.
This approach would help optimize the use of the context window by reducing unnecessary token usage, while still maintaining a clear and concise conversation history for future responses.
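To make the request concrete, here is a minimal sketch of the behavior I have in mind. It is plain Python and does not use the library's actual API: `Message`, `InMemoryHistory`, `build_llm_messages`, and `run_turn` are hypothetical names I made up for illustration, and `llm` is assumed to be any callable that takes a list of messages and returns a reply string.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Message:
    role: str  # "user" or "assistant"
    content: str


@dataclass
class InMemoryHistory:
    """Hypothetical stand-in for a ChatMessageStore (not the real API)."""

    messages: list[Message] = field(default_factory=list)

    def write(self, new_messages: list[Message]) -> None:
        self.messages.extend(new_messages)


def build_llm_messages(
    history: InMemoryHistory, query: str, documents: list[str]
) -> list[Message]:
    """Return the stored history plus the current query augmented with documents.

    Only the newest user message carries the supporting documents; past
    turns are replayed exactly as stored, i.e. without documents.
    """
    context = "\n".join(f"- {doc}" for doc in documents)
    augmented = Message(
        "user", f"Supporting documents:\n{context}\n\nQuestion: {query}"
    )
    return [*history.messages, augmented]


def run_turn(
    history: InMemoryHistory,
    query: str,
    documents: list[str],
    llm: Callable[[list[Message]], str],
) -> str:
    """Run one conversation turn and persist only the bare query and the reply."""
    reply = llm(build_llm_messages(history, query, documents))
    # The augmented message is discarded here; the store never sees the documents.
    history.write([Message("user", query), Message("assistant", reply)])
    return reply
```

With this split, the documents for a turn are sent to the LLM once, and every later turn replays only the short query and answer from the store.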
Correct me if I'm wrong about any of these assumptions or if this is already possible. A workaround probably exists already, but I think this would still be an interesting feature.