[FEAT]: Add Optional Small-to-Big Retrieval

What would you like to see?

Apparently, smaller chunks improve retrieval quality, while larger chunks improve generation quality; see "Advanced RAG 01: Small-to-Big Retrieval".

If the current embedding process stores each chunk's relative ID within its document, then when chunk i is retrieved, we can prepend chunks [i-2, i-1], append chunks [i+1, i+2], and pass the combined text on to the generation step. This would give both benefits: smaller chunks for retrieval and larger chunks for generation. Naturally, we need to check that each neighboring chunk i±n actually exists before adding it, so that we never add a null.

My idea is to simplify the implementation by just adding two optional integer settings, prepend and append, that would default to 0 (i.e. today's behavior) but could be changed by the user in the settings.
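
To make the proposal concrete, here is a minimal sketch of the window expansion, assuming the chunks of a document are available as an ordered list and the retrieved chunk is identified by its position in that list. The function name `expand_chunk`, its parameters, and the list-based storage are hypothetical and only for illustration; they are not taken from the anything-llm codebase.

```python
from typing import Sequence


def expand_chunk(
    doc_chunks: Sequence[str],
    i: int,
    prepend: int = 0,
    append: int = 0,
) -> str:
    """Join chunk i with up to `prepend` chunks before it and `append` after it.

    Hypothetical helper: `doc_chunks` is assumed to hold the chunks of ONE
    document in their original order, so that index i is the relative chunk id.
    """
    if prepend < 0 or append < 0:
        raise ValueError("prepend and append must be non-negative")
    if not 0 <= i < len(doc_chunks):
        raise IndexError(f"chunk index {i} is out of range")

    # Clamp the window to the document bounds, so neighbours that do not
    # exist (before the first chunk or after the last one) are skipped.
    start = max(0, i - prepend)
    end = min(len(doc_chunks), i + append + 1)
    return "\n".join(doc_chunks[start:end])


chunks = ["c0", "c1", "c2", "c3", "c4"]

# With the defaults (0, 0) only the retrieved chunk is returned.
print(expand_chunk(chunks, 2))

# Near the start of the document the missing neighbours are simply dropped.
print(expand_chunk(chunks, 0, prepend=2, append=2))
```

With both settings left at 0 the function returns only the retrieved chunk, so the feature would be opt-in. One thing this sketch does not handle is the case where two retrieved chunks are close together and their windows overlap; a real implementation would probably want to merge those ranges to avoid sending duplicated text to the generation step.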

The alternative is to implement a full Parent Document Retriever, but that is a much bigger task IMHO.

Mintplex-Labs / anything-llm

[FEAT]: Add Optional Small-to-Big Retrieval #1387
