Closed TanGentleman closed 7 months ago
What is the strategy for handling conflicts?
For instance:
One webpage has a List[Document] with one item, whose page_content string is 120000 characters long. I see no meaningful reason to use a MultiVectorRetriever here, except for a use case where this is one document of many and a summary for each proves useful. I was considering forcing it to be split arbitrarily, but that kind of split isn't useful, so I'd rather stick to using the chunk_size that the user provides.
In most cases, it seems like I should have checks before each LLM initialization to make sure that the context_size can reasonably handle what I'm throwing at it.
I added a reasonable check for contexts as part of the custom classes. Seems like it will be best to have a system where checks are performed at different stages, even if they are simply reporting a message to the user.
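A minimal sketch of a check that only reports a message, which could be called before each LLM initialization. The function name `context_warning` and the ratio of 4 characters per token are assumptions for illustration, not part of the custom classes mentioned above. It also assumes context_size is measured in tokens.

```python
from typing import Optional


def context_warning(text: str, context_size: int,
                    chars_per_token: float = 4.0) -> Optional[str]:
    """Return a warning message if text likely exceeds the context, else None.

    chars_per_token is a rough heuristic (assumption); a real tokenizer
    would give an exact count.
    """
    estimated_tokens = len(text) / chars_per_token
    if estimated_tokens > context_size:
        return (f"Input is about {estimated_tokens:.0f} tokens, which exceeds "
                f"the context size of {context_size}.")
    return None


# The 120000-character page from the example above, against a
# hypothetical 8192-token model.
message = context_warning("a" * 120000, context_size=8192)
if message is not None:
    print(message)
```

Returning a message instead of raising keeps the check usable at several stages, where the caller decides whether to print, log, or abort.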
I need to:
Then, during the step where the Config object is being set (yes, very early!), compare k_excerpts * chunk_size against the context_size for the model.
Seems like:
should work for the comparison.
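One way the config-time comparison could look, as a sketch only. The `Config` dataclass here is a hypothetical stand-in that borrows the three field names from this thread. It assumes chunk_size is in characters and context_size is in tokens, bridged by a rough 4 characters per token, which is also an assumption.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the real Config object; only the field
# names k_excerpts, chunk_size, and context_size come from the thread.
@dataclass
class Config:
    k_excerpts: int    # number of excerpts retrieved per query
    chunk_size: int    # size of each chunk, assumed to be in characters
    context_size: int  # model context window, assumed to be in tokens


def excerpts_fit_context(config: Config, chars_per_token: float = 4.0) -> bool:
    """Estimate whether k_excerpts chunks fit in the model's context window."""
    estimated_tokens = config.k_excerpts * config.chunk_size / chars_per_token
    return estimated_tokens <= config.context_size


config = Config(k_excerpts=15, chunk_size=2000, context_size=4096)
if not excerpts_fit_context(config):
    print("Warning: k_excerpts * chunk_size may exceed the model's context_size.")
```

This leaves no headroom for the prompt template or the model's response, so a stricter version might compare against only a fraction of context_size.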