TanGentleman / Augmenta

Automate RAG-powered workflows
MIT License

MultiVectorRetriever does not obey self.chunk_size #10

Closed TanGentleman closed 2 months ago

TanGentleman commented 2 months ago

The process currently goes like this:

The parent docs are split so that they "page" correctly, like for a PDF, and that takes priority over self.chunk_size. Without better checks on the size of doc.page_content, a parent doc can blow past the context limit or cause other unexpected behavior. Since it's the parent document excerpts that get passed to the LLM at the very end, we need to calculate their character count and respect the token limit.
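A rough sketch of the kind of check this would need, not the actual Augmenta code. `Doc` is a hypothetical stand-in for whatever document class the retriever returns (anything with `page_content` and `metadata`), and `split_oversized` and `fit_to_budget` are made-up helper names. The split is a naive fixed-width cut by characters; a real fix would probably reuse the existing text splitter.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document with page_content and metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_oversized(docs, chunk_size):
    """Split any doc whose page_content is longer than chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    result = []
    for doc in docs:
        text = doc.page_content
        if len(text) <= chunk_size:
            # Already within the limit, keep the parent doc untouched.
            result.append(doc)
            continue
        # Naive fixed-width split; each piece keeps a copy of the parent's metadata.
        for start in range(0, len(text), chunk_size):
            piece = text[start:start + chunk_size]
            result.append(Doc(piece, dict(doc.metadata)))
    return result


def fit_to_budget(docs, max_chars):
    """Keep docs in order until adding the next one would exceed max_chars."""
    kept = []
    total = 0
    for doc in docs:
        if total + len(doc.page_content) > max_chars:
            break
        kept.append(doc)
        total += len(doc.page_content)
    return kept
```

Running `split_oversized` on the retrieved parent docs and then `fit_to_budget` right before building the prompt would cap both the size of each excerpt and the total context. `max_chars` would have to be derived from the model's token limit; the characters-per-token ratio depends on the tokenizer, so any fixed conversion is only an approximation.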