Splitting books or large PDFs into smaller pieces
We assume that the larger a document is, the harder it is for an LLM to extract every valuable piece of data from it. We therefore add a step that splits the document into smaller pieces. Question: what is a reasonable size for each piece, in pages or characters?
And how can we make sure the split points don't cut through important sentences?
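
To make the second question concrete, here is a minimal sketch of one possible approach: split the extracted text into sentences first, then pack whole sentences into chunks up to a character limit. The function name `split_into_chunks` and the `max_chars` default are hypothetical. The default of 2000 is only a placeholder, not a recommended size, and the regex sentence splitter is deliberately naive (it will mis-split on abbreviations such as "e.g.").

```python
import re


def split_into_chunks(text, max_chars=2000):
    """Greedily pack whole sentences into chunks of at most max_chars.

    max_chars is an assumed placeholder value, not a recommendation.
    A single sentence longer than max_chars is kept whole in its own chunk.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            # Still fits, or the chunk is empty and must take the sentence.
            current = candidate
        else:
            # Adding this sentence would overflow: close the chunk here.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "One two. Three four! Five six? Seven."
    for chunk in split_into_chunks(sample, max_chars=20):
        print(repr(chunk))
```

Because chunks are closed only at sentence boundaries, no sentence is cut in half. The trade-off is that chunk sizes vary and a very long sentence can exceed the limit.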