A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
When modifying the DEFAULT_OVERLAP_PERCENT from 10 percent to 80 percent, the overlap calculation does not seem to work correctly. We can observe that when we increase the overlap, we get fewer indexed documents. The culprit is an erroneous calculation of percentages:
This issue is for a: (mark with an
x
)Minimal steps to reproduce
When modifying the
DEFAULT_OVERLAP_PERCENT
from 10 percent to 80 percent, the overlap calculation does not seem to work correctly. We can observe that when we increase the overlap, we get fewer indexed documents. The culprit is an erroneous calculation of percentages:https://github.com/Azure-Samples/azure-search-openai-demo/blob/7ffcb3b95500dd741de2e6d1b70fa8073c9c3ce3/app/backend/prepdocslib/textsplitter.py#L96
Expected/desired behavior
A higher overlap percentage should show a higher number of chunked documents.
OS and Version?
WSL2 on Windows 11
Versions
Latest version of
main
.