Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License
5.59k stars 3.75k forks source link

Erroneous overlap calculation #1667

Closed bastbu closed 4 weeks ago

bastbu commented 4 weeks ago

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

When modifying the DEFAULT_OVERLAP_PERCENT from 10 percent to 80 percent, the overlap calculation does not seem to work correctly. We can observe that when we increase the overlap, we get fewer indexed documents. The culprit is an erroneous calculation of percentages:

https://github.com/Azure-Samples/azure-search-openai-demo/blob/7ffcb3b95500dd741de2e6d1b70fa8073c9c3ce3/app/backend/prepdocslib/textsplitter.py#L96

Expected/desired behavior

A higher overlap percentage should show a higher number of chunked documents.

OS and Version?

WSL2 on Windows 11

Versions

Latest version of main.