danny-avila / LibreChat
https://librechat.ai/

Enhancement: Adopt ConversationSummaryBufferMemory #741

Closed: ywkim closed this issue 11 months ago

ywkim commented 1 year ago

Contact Details

youngwook.kim@gmail.com

What features would you like to see added?

I propose the adoption of ConversationSummaryBufferMemory in LibreChat. This memory class combines the advantages of BufferMemory and ConversationSummaryMemory for managing the conversation history. It maintains a buffer of the most recent interactions and a summary of the previous conversation, enabling efficient memory usage and improving context awareness, particularly during long conversations. For more details on how ConversationSummaryBufferMemory works, you can refer to this article.
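
For concreteness, here is a minimal sketch of how this memory class is used in LangChain JS (the model name and maxTokenLimit below are illustrative values, not proposed LibreChat settings):

```js
import { ChatOpenAI } from "langchain/chat_models/openai";
import { ConversationSummaryBufferMemory } from "langchain/memory";
import { ConversationChain } from "langchain/chains";

// Assumes OPENAI_API_KEY is set in the environment.
const model = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0 });

// Recent messages are kept verbatim in the buffer; once the buffer exceeds
// maxTokenLimit tokens, the oldest messages are folded into a rolling summary.
const memory = new ConversationSummaryBufferMemory({
  llm: model,          // model used to generate the rolling summary
  maxTokenLimit: 1000, // token budget for the verbatim buffer (illustrative)
  returnMessages: true,
});

const chain = new ConversationChain({ llm: model, memory });
await chain.call({ input: "Let's continue our discussion from earlier." });
```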

More details

I'd like to discuss the potential impact of this adoption on the existing system and how to best approach its implementation. Would be happy to contribute to this enhancement.

Which components are impacted by your request?

General, UI

Pictures

Not applicable for this request.

danny-avila commented 1 year ago

I've already made my own implementation of exactly that: it summarizes the earliest messages while keeping a buffer of the latest ones.

It's not perfect and I haven't gotten to test it much, but it can be found in api\app\clients\BaseClient.js; see the refineMessages method.

In api\app\clients\OpenAIClient.js, this strategy is used when options.contextStrategy is set to refine.

My initial thought was to make this kind of thing optional/configurable. I'm very open to this being refactored however necessary. Perhaps the current approach can be bypassed altogether for use with langchain, as long as parameters like promptPrefix are kept intact.

A lot of the work is already done for easy integration, though. I would start by looking at the source code to see how langchain summarizes and handles tokens, and implement that in the refineMessages method. If that proves to be too much work, a significant refactor of the sendCompletion, buildMessages, or even sendMessage methods of BaseClient/OpenAIClient may be in order.
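
Roughly, the strategy looks like this (an illustrative sketch only, not the actual refineMessages code; countTokens and summarize are placeholders for the real tokenizer and summarization call):

```js
// Illustrative only -- not the actual BaseClient.refineMessages implementation.
// countTokens and summarize stand in for the tokenizer and LLM summary call.
async function refineMessages(messages, { maxContextTokens, countTokens, summarize }) {
  const buffer = [];
  let used = 0;

  // Walk from the newest message backwards, keeping verbatim messages
  // until the token budget is exhausted.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = countTokens(messages[i].text);
    if (used + cost > maxContextTokens) {
      // Compress everything older (messages[0..i]) into a single summary.
      const summary = await summarize(messages.slice(0, i + 1));
      return [{ role: 'system', text: `Conversation summary: ${summary}` }, ...buffer];
    }
    used += cost;
    buffer.unshift(messages[i]);
  }
  return buffer; // everything fit within the budget; no summarization needed
}
```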

danny-avila commented 1 year ago

@ywkim lmk your thoughts on this, as it would be great to have. I've also thought of simplifying the token counting, for incoming messages at least, via express.js middleware; that could help offload the complexity from the client classes.
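
Something along these lines (a hypothetical sketch; the route path, request body shape, and tokenizer choice are placeholders, not merged code):

```js
const express = require('express');
const { encode } = require('gpt-3-encoder'); // any tokenizer would do here

// Count tokens once on the way in, so client classes can read
// req.tokenCount instead of each re-tokenizing the message text.
function countTokensMiddleware(req, res, next) {
  if (req.body && typeof req.body.text === 'string') {
    req.tokenCount = encode(req.body.text).length;
  }
  next();
}

const app = express();
app.use(express.json());

// '/api/ask' is a placeholder route for illustration.
app.post('/api/ask', countTokensMiddleware, (req, res) => {
  // downstream handlers see req.tokenCount already populated
  res.json({ tokenCount: req.tokenCount });
});
```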

danny-avila commented 1 year ago

will have to comb through the relevant langchain docs soon: https://github.com/hwchase17/langchainjs/blob/main/langchain/src/memory/summary_buffer.ts

I think my implementation could be much more flexible in allowing cached summaries for branched messages

danny-avila commented 1 year ago

@ywkim I was looking at the source code, and it seems fairly easy to implement using the same classes LangChain uses. I may work on this in the coming week.

ywkim commented 11 months ago

Hi @danny-avila,

I apologize for the late response; I was on vacation for a month. It seems a lot has happened during this time! I'm glad to see that you've started working on the ConversationSummaryBufferMemory integration I suggested.

Your implementation seems to have made significant progress. I appreciate your efforts to enhance the project, and I'm looking forward to seeing its effect on LibreChat's functionality.

Thank you once again for considering my suggestion and for your continuous work on this project. If I have any more ideas or suggestions in the future, I'll be sure to share them.

danny-avila commented 11 months ago

No worries, hope you had a great vacation!

Yes, I'm just about to wrap up the PR. LangChain makes it really easy to implement these pruning methods, but not with message branching, where the out-of-the-box approach would waste more tokens than it saves.

A homebrewed solution was needed for this project, and I was able to use LangChain tools to help.

Thankful for your contributions and this suggestion!