Closed: charles-marion closed this 2 months ago
With the older ConversationRetrievalChain, I limited how much history I would pass into a model. In my organization, we are seeing chat history messages grow across a variety of topics and it can cause inaccurate rephrasing of questions.
I subclassed ConversationBufferMemory to give a rolling window of conversation history that is a smaller subset of the entire history.
e.g.:
from langchain.memory import ConversationBufferMemory
from typing import Dict, Any
from pydantic import Field


class WindowedConversationBufferMemory(ConversationBufferMemory):
    k: int = Field(default=2, description="Number of recent conversations to keep")

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        # Save the full context to the underlying storage (DynamoDB)
        super().save_context(inputs, outputs)

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Load the full history from the underlying storage
        result = super().load_memory_variables(inputs)

        # If there's no history, return an empty list or string
        if self.memory_key not in result or not result[self.memory_key]:
            return {self.memory_key: [] if self.return_messages else ""}

        # Windowing: only return the last k conversations
        if self.return_messages:
            result[self.memory_key] = result[self.memory_key][-2 * self.k:]
        else:
            conversations = result[self.memory_key].split('\n\nHuman: ')
            recent_conversations = conversations[-min(self.k, len(conversations)):]
            result[self.memory_key] = '\n\nHuman: '.join(recent_conversations).strip()
        return result
I want to do something similar with RunnableWithMessageHistory, but I'm still getting up to speed on this new API. Do you think that limiting the message history to a smaller slice of data is an important feature?
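Something along these lines is what I have in mind with the new API (a rough sketch only; the model id, the get_session_history helper, and the use of langchain_core's trim_messages are my assumptions, not code from this repo):

from operator import itemgetter

from langchain_aws import ChatBedrockConverse
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import trim_messages
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough
from langchain_core.runnables.history import RunnableWithMessageHistory

# Placeholder model; the chatbot would plug in its own configured model here.
llm = ChatBedrockConverse(model="anthropic.claude-3-haiku-20240307-v1:0")

# Keep only the last 4 history messages (2 human/AI turns).
# token_counter=len counts messages instead of tokens.
trimmer = trim_messages(max_tokens=4, strategy="last", token_counter=len)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        MessagesPlaceholder("history"),
        ("human", "{input}"),
    ]
)

# Trim the injected history before it reaches the prompt.
chain = RunnablePassthrough.assign(history=itemgetter("history") | trimmer) | prompt | llm

_sessions = {}

def get_session_history(session_id: str):
    # The full history is still stored per session; only the trimmed slice is sent to the model.
    if session_id not in _sessions:
        _sessions[session_id] = InMemoryChatMessageHistory()
    return _sessions[session_id]

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

Invoking chain_with_history.invoke({"input": "..."}, config={"configurable": {"session_id": "abc"}}) would then persist the full exchange while only the trimmed window reaches the model.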
The memory used by RunnableWithMessageHistory in this change is this class: https://github.com/aws-samples/aws-genai-llm-chatbot/blob/9de3e559a4d744aab2091290e008cd620d9cb5a2/lib/shared/layers/python-sdk/python/genai_core/langchain/chat_message_history.py#L48
To implement it, I would just add a max-messages-returned parameter here: https://github.com/aws-samples/aws-genai-llm-chatbot/blob/9de3e559a4d744aab2091290e008cd620d9cb5a2/lib/model-interfaces/langchain/functions/request-handler/adapters/base/base.py#L111 (because you still want to store/return the full history to view the session).
This would make it independent of the chain.
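Roughly something like this (a minimal sketch; WindowedChatMessageHistory and max_messages_returned are placeholder names, not the actual code in chat_message_history.py):

from typing import List

from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage


class WindowedChatMessageHistory(BaseChatMessageHistory):
    """Wraps another history (e.g. the DynamoDB-backed one) and exposes only a window."""

    def __init__(self, inner: BaseChatMessageHistory, max_messages_returned: int = 6):
        self.inner = inner
        self.max_messages_returned = max_messages_returned

    @property
    def messages(self) -> List[BaseMessage]:
        # The chain only sees the most recent messages...
        return self.inner.messages[-self.max_messages_returned:]

    def add_message(self, message: BaseMessage) -> None:
        # ...but everything is still persisted, so the session view keeps the full history.
        self.inner.add_message(message)

    def clear(self) -> None:
        self.inner.clear()

The request handler would then wrap the existing DynamoDB-backed history with such a class, with the window size coming from configuration.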
"Do you think that limiting the message history to a smaller slice of data is an important feature?" I do agree, since it would reduce the number of tokens used, but it would need to be configurable somewhere.
Issue #, if available:
#502 #495 #230
Description of changes:
To add usage tracking to Bedrock models, I migrated the LangChain chain for the Claude model (ConversationChain is deprecated).
It now uses RunnableWithMessageHistory with ChatBedrockConverse, which relies on the Bedrock Converse API; that API is consistent across models and provides the usage in the response.
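For illustration, a minimal sketch of the usage data ChatBedrockConverse surfaces (the model id is a placeholder and the printed numbers are made up):

from langchain_aws import ChatBedrockConverse

# Placeholder model id; the PR wires the actual configured model through the adapter.
llm = ChatBedrockConverse(model="anthropic.claude-3-sonnet-20240229-v1:0")

response = llm.invoke("Hello!")

# The Converse API returns token counts with every response, surfaced on the message:
print(response.usage_metadata)
# e.g. {'input_tokens': 8, 'output_tokens': 12, 'total_tokens': 20}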
Changes
Run npm run vet-all to quickly verify formatting and tests.
Testing
Future Change
Note: This change modifies the prompts to match the new LangChain patterns. For example:
Before
After
Image: metadata usage dashboard
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.