CassioML / cassio

A framework-agnostic Python library to seamlessly integrate Cassandra with ML/LLM/genAI workloads
Apache License 2.0

[LangChain] Plans for good implementation of LangChain's "semantic chat memory" #27

Closed hemidactylus closed 1 year ago

hemidactylus commented 1 year ago

LangChain has no specific "semantic chat memory": that stems, instead, from a certain usage of the VectorStore.

(see here on cassio.org and here for a howto on LangChain site).

Changes needed, the rationale

In practice, once you have a vector store, a "retriever" is first created from it (a standard LangChain construct), and the retriever is then wrapped in a VectorStoreRetrieverMemory class (another LangChain standard). The relevant steps:

from langchain.memory import VectorStoreRetrieverMemory

# vectorstore = <your backend's vector store, initialized as usual>
retriever = vectorstore.as_retriever(search_kwargs=dict(k=1))
memory = VectorStoreRetrieverMemory(retriever=retriever)
# now "memory" can be used e.g. in a chat

So in realistic usage you don't want to pull relevant chat snippets from the whole store, but of course only from the conversation with that particular user. In Cassandra terms, this means clustering rows by user_id.

CassIO

Hence we need a parameter in CassIO's VectorTable init that controls whether the primary key is ((document_id)) or ((session_id), document_id), in Cassandra terminology. This is not implemented yet: at the moment we only have the first choice and no control.
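To make the two layouts concrete, here is a toy in-memory sketch (plain Python with illustrative names, not CassIO's actual API): with a ((session_id), document_id) layout, a lookup can be confined to one conversation's partition instead of scanning the whole table.

```python
# Toy model of the ((session_id), document_id) layout: rows are grouped
# by session_id, so reads can target one partition. Illustrative only.
from collections import defaultdict


class ToyClusteredTable:
    def __init__(self):
        # session_id -> {document_id: text}
        self.partitions = defaultdict(dict)

    def put(self, session_id, document_id, text):
        self.partitions[session_id][document_id] = text

    def get_partition(self, session_id):
        # A clustered layout lets us read one conversation only,
        # instead of scanning every row in the table.
        return list(self.partitions[session_id].values())


table = ToyClusteredTable()
table.put("alice", "d1", "hi there")
table.put("bob", "d2", "unrelated chat")
print(table.get_partition("alice"))  # only Alice's rows
```

With the current ((document_id))-only layout, the equivalent of `get_partition` does not exist: every search runs over the entire store.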

(Note: I assume we don't want to have a different table per user id !)

LangChain

Once the above is addressed, LangChain will also have to change slightly.

Option 1: new params in the vector store's similarity_search_with_score_id_by_vector

The search_kwargs parameter passed when spawning the "retriever" will be the place to specify the user_id (i.e. session_id, i.e. the partition to use for the subsequent lookup). These kwargs end up in the similarity_search_with_score_id_by_vector method of the Cassandra vector store, which will be able to pass this partition key on to the CassIO search.

Pro: less proliferation of vector store instances. Con: might involve more kwargs, as this parameter reaches the Cassandra vector store through several routes (for MMR, similarity search, etc., different functions are called; see the as_retriever method of the base VectorStore class).
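A minimal sketch of the Option 1 plumbing, with toy stand-in classes (none of these names are LangChain's internals): kwargs given at retriever-creation time flow unchanged into the store's search method, so a session_id slipped in there can select the partition.

```python
# Toy sketch of Option 1: kwargs passed to as_retriever() are forwarded
# verbatim into the store's search call. ToyVectorStore stands in for the
# Cassandra vector store; it is NOT real LangChain/CassIO code.
class ToyVectorStore:
    def __init__(self, rows):
        self.rows = rows  # list of (session_id, text) pairs

    def similarity_search(self, query, k=4, session_id=None):
        # With Option 1, session_id arrives here via the retriever's
        # kwargs and restricts the lookup to one partition.
        hits = [text for sid, text in self.rows
                if session_id is None or sid == session_id]
        return hits[:k]

    def as_retriever(self, search_kwargs=None):
        kwargs = dict(search_kwargs or {})

        def retrieve(query):
            return self.similarity_search(query, **kwargs)

        return retrieve


store = ToyVectorStore([("alice", "likes tea"), ("bob", "likes coffee")])
retriever = store.as_retriever(search_kwargs=dict(k=1, session_id="alice"))
print(retriever("beverage?"))  # only rows from Alice's partition
```

The con mentioned above shows up here: each search entry point (similarity, MMR, ...) would need to accept and forward the same extra kwarg.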

Option 2: one vector store instance per session

In this case one creates as many VectorStore instances as there are session_ids, each with the partition key as an instance property; this key is then injected into every search() call within that instance. Much less intrusive, though perhaps a bit heavier resource-wise.
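For contrast, a toy sketch of Option 2 (again, illustrative names only, not real LangChain/CassIO code): the partition key is fixed at construction time, so every search on that instance is implicitly scoped and no extra kwargs are needed.

```python
# Toy sketch of Option 2: one store instance per session_id; the partition
# key is an instance property injected into every search. Illustrative only.
class SessionScopedStore:
    def __init__(self, rows, session_id):
        self.rows = rows  # list of (session_id, text) pairs
        self.session_id = session_id  # fixed partition for this instance

    def similarity_search(self, query, k=4):
        # No extra kwargs needed: the instance already knows its partition.
        hits = [text for sid, text in self.rows if sid == self.session_id]
        return hits[:k]


rows = [("alice", "likes tea"), ("bob", "likes coffee")]
alice_store = SessionScopedStore(rows, session_id="alice")
bob_store = SessionScopedStore(rows, session_id="bob")
print(alice_store.similarity_search("beverage?", k=1))
```

The resource cost mentioned above is visible too: one store object per active session, even though they all wrap the same underlying table.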

karlunho-datastax commented 1 year ago

So, for the regular chat history (not based on Vector Search), does it have filtering by customer ID?

Since we are thinking about refactoring history: the LLM history does not necessarily need to be just chat history; it is basically anything that the LLM is outputting. Should we have additional keys in the partition key, like "conversation_id" or "day of week"?

hemidactylus commented 1 year ago

This is how I see it: on the CassIO side, there is an agnostic "partitioning" of some kind (in fact, the column in the actual table is called partition_id). Whether this is used to implement per-user separation or other kinds of clustering of rows is up to the code that uses the CassIO table abstraction. So the CassIO side is now closed with the addition of the ClusteredMixin to the base table, and on the LangChain integration side there will be a certain usage of this general facility for the semantic chat memory (to be addressed in a PR to LangChain).
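As a rough illustration of the mixin idea described above (a toy sketch only; CassIO's real ClusteredMixin differs), partitioning becomes an orthogonal concern layered onto a base table, and callers decide what the partition_id means:

```python
# Toy sketch: a mixin that adds a partition_id dimension to a base table.
# Purely illustrative; not CassIO's actual ClusteredMixin.
class BaseTable:
    def __init__(self):
        self.rows = {}  # row_id -> (partition_id, text)

    def put(self, row_id, text, partition_id=None):
        self.rows[row_id] = (partition_id, text)


class ClusteredMixin:
    def search(self, partition_id):
        # Restrict any lookup to the requested partition; whether the
        # partition means "user", "conversation", etc. is the caller's choice.
        return [text for pid, text in self.rows.values()
                if pid == partition_id]


class ClusteredTable(ClusteredMixin, BaseTable):
    pass


t = ClusteredTable()
t.put("r1", "hello", partition_id="user-1")
t.put("r2", "bye", partition_id="user-2")
print(t.search("user-1"))
```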

So this is now closed with #70.