Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
https://anythingllm.com
MIT License

[FEAT]: Add Optional Small-to-Big Retrieval #1387

Open cope opened 4 months ago

cope commented 4 months ago

What would you like to see?

Apparently, smaller chunk sizes improve retrieval quality while larger chunk sizes improve generation quality; see Advanced RAG 01: Small-to-Big Retrieval.

If the current embedding process stores the relative chunk ids per document, then when chunk i is retrieved, we can prepend chunks [i-2, i-1] and append chunks [i+1, i+2] and pass that larger combined text to the generation step. This would give us both benefits: smaller chunks for retrieval and larger chunks for generation. Naturally, we need to check that each i±n chunk actually exists before adding it, rather than inserting null.

My idea is to simplify the implementation by just adding optional prepend/append integers that would default to 0, but could be changed by the user in the settings.
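A minimal sketch of what that could look like, assuming each stored chunk carries its document id and relative chunk index in metadata, and assuming a hypothetical `getChunk(docId, index)` helper that returns the chunk text or null if it does not exist (none of these names are from the AnythingLLM codebase):

```ts
interface RetrievedChunk {
  docId: string;
  chunkIndex: number;
  text: string;
}

// Hypothetical lookup into the vector store / document cache by (docId, index).
type GetChunk = (docId: string, index: number) => Promise<string | null>;

// Expand a retrieved chunk with up to `prepend` preceding and `append` following
// neighbor chunks. Both default to 0, i.e. the current single-chunk behavior.
async function expandChunk(
  hit: RetrievedChunk,
  getChunk: GetChunk,
  prepend = 0,
  append = 0
): Promise<string> {
  const parts: string[] = [];

  // Preceding neighbors i-prepend .. i-1, skipping indices that don't exist.
  for (let i = hit.chunkIndex - prepend; i < hit.chunkIndex; i++) {
    if (i < 0) continue;
    const text = await getChunk(hit.docId, i);
    if (text !== null) parts.push(text);
  }

  parts.push(hit.text);

  // Following neighbors i+1 .. i+append, skipping indices that don't exist.
  for (let i = hit.chunkIndex + 1; i <= hit.chunkIndex + append; i++) {
    const text = await getChunk(hit.docId, i);
    if (text !== null) parts.push(text);
  }

  return parts.join("\n");
}
```

With prepend/append exposed as settings, the retrieval step stays unchanged and only the text handed to the LLM grows.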

The alternative is to do full Parent Document Retriever, but this is a much bigger task IMHO.

RahSwe commented 4 months ago

Parent Document Retriever would be a nice per-document option (like the pinning option), since collections can mix document sizes and some documents are too big to be retrieved whole as a parent.