acantarero opened 11 months ago
I agree that we should make it configurable which metric to use.
But I disagree that we should let users use a vector-search similarity function (such as cosine similarity) instead of BM25 to ensure "diversity". The documents have already been retrieved from the vector database as the closest according to that same function, so reusing it won't help reduce redundancy in the set of documents sent to the LLM in the prompt.
One of the main benefits of LangStream, thanks to its asynchronous nature, is that it makes it easy to preprocess the text before storing it in the vector database (we already have a few agents that provide a good configuration out of the box).
@acantarero, do you have any proposals for other metrics to use?
Background
Suggestion
We have already implemented BM25 and cosine similarity. Allow users to select which similarity method they want to use (with reasonable defaults).
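The suggestion boils down to putting a pluggable similarity function behind the re-ranking step. Below is a minimal Python sketch, assuming the "diversity" step is a greedy Maximal Marginal Relevance (MMR) re-rank; the function names, the `"bm25"`/`"cosine"` keys, the `DEFAULT_METHOD` value and the document shape are all hypothetical and are not LangStream's actual agent configuration. BM25 is computed over the candidate set itself and each row is scaled to [0, 1] so it can be traded off against relevance scores.

```python
import math
from collections import Counter
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical document shape: {"text": str, "embedding": list[float]}
Doc = Dict[str, Any]
Matrix = List[List[float]]


def _tokens(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer, for illustration only.
    return text.lower().split()


def cosine_matrix(docs: Sequence[Doc]) -> Matrix:
    """Pairwise cosine similarity between document embeddings."""
    def cos(a: Sequence[float], b: Sequence[float]) -> float:
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        if na == 0 or nb == 0:
            return 0.0
        return sum(x * y for x, y in zip(a, b)) / (na * nb)
    return [[cos(d["embedding"], e["embedding"]) for e in docs] for d in docs]


def bm25_matrix(docs: Sequence[Doc], k1: float = 1.2, b: float = 0.75) -> Matrix:
    """Row i scores every document with BM25, using document i's terms as
    the query and the candidate set as the corpus; rows are scaled to [0, 1]."""
    corpus = [_tokens(d["text"]) for d in docs]
    n = len(corpus)
    avgdl = sum(len(t) for t in corpus) / n if n else 0.0
    df = Counter(term for toks in corpus for term in set(toks))
    rows: Matrix = []
    for query in corpus:
        row = []
        for toks in corpus:
            tf = Counter(toks)
            score = 0.0
            for term in set(query):
                if term not in tf:
                    continue  # term absent: contributes nothing
                idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
                norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
                score += idf * tf[term] * (k1 + 1) / norm
            row.append(score)
        top = max(row, default=0.0)
        rows.append([s / top if top > 0 else 0.0 for s in row])
    return rows


# Hypothetical registry: a config value would name one of these keys.
SIMILARITY_METHODS: Dict[str, Callable[[Sequence[Doc]], Matrix]] = {
    "bm25": bm25_matrix,
    "cosine": cosine_matrix,
}
DEFAULT_METHOD = "bm25"  # assumed default, following the argument above


def mmr_rerank(docs: Sequence[Doc], relevance: Sequence[float], k: int,
               method: str = DEFAULT_METHOD, lam: float = 0.5) -> List[int]:
    """Greedy MMR: pick documents that are relevant but not redundant
    with the ones already selected. Returns indices into `docs`."""
    if method not in SIMILARITY_METHODS:
        raise ValueError(f"unknown similarity method: {method!r}")
    sim = SIMILARITY_METHODS[method](docs)
    selected: List[int] = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Usage: two near-duplicate candidates and one distinct candidate.
docs = [
    {"text": "vector database stores embeddings", "embedding": [1.0, 0.0]},
    {"text": "vector database stores embeddings", "embedding": [1.0, 0.0]},
    {"text": "bm25 ranks keywords", "embedding": [0.0, 1.0]},
]
relevance = [0.9, 0.89, 0.5]
print(mmr_rerank(docs, relevance, k=2))                   # [0, 2]
print(mmr_rerank(docs, relevance, k=2, method="cosine"))  # [0, 2]
```

In both cases the duplicate of the first pick is skipped in favour of the less relevant but distinct document. Keeping the similarity behind a name lookup lets a single config value choose the method, and an unknown name fails loudly instead of silently falling back to the default.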