The vector database stores these shortened docs and aliases them to their parent documents.
These parent docs are prioritized to "page" correctly, as in a PDF. Without checks on the size of doc.page_content, this can overflow the context window or cause other unexpected behavior. Since it is the parent document excerpts that are ultimately passed to the LLM, we need to estimate the token count (e.g. from character count) and respect the model's token limits.
The process, in short: the shortened child docs are embedded and matched at query time, their parent documents are looked up via the alias, and those parent excerpts are what reach the LLM. That final step is where the size check has to happen.
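A minimal sketch of that size check, assuming the parent excerpts are plain strings and using a rough 4-characters-per-token heuristic (the exact budget and ratio below are hypothetical placeholders, not values from this setup):

```python
# Hypothetical guard: keep whole parent-document excerpts until a token
# budget runs out, then truncate the last one. The 4-chars-per-token
# ratio is a rough average for English text, not an exact tokenizer.

MAX_TOKENS = 3000       # assumed context budget for retrieved docs
CHARS_PER_TOKEN = 4     # rough heuristic, not a real tokenizer

def estimated_tokens(text: str) -> int:
    """Cheap token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN

def clip_docs(page_contents: list[str], max_tokens: int = MAX_TOKENS) -> list[str]:
    """Keep whole excerpts while they fit; truncate the one that overflows."""
    kept: list[str] = []
    used = 0
    for text in page_contents:
        remaining = max_tokens - used
        if remaining <= 0:
            break
        tokens = estimated_tokens(text)
        if tokens <= remaining:
            kept.append(text)
            used += tokens
        else:
            # Truncate to the remaining budget, converted back to characters.
            kept.append(text[: remaining * CHARS_PER_TOKEN])
            used = max_tokens
    return kept
```

In a real pipeline this would run over each retrieved doc's page_content just before prompt assembly; swapping the heuristic for an actual tokenizer (e.g. tiktoken) gives exact counts at a small speed cost.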