Azure-Samples / azure-search-openai-javascript

A TypeScript sample app for the Retrieval Augmented Generation pattern running on Azure, using Azure AI Search for retrieval and Azure OpenAI and LangChain large language models (LLMs) to power ChatGPT-style and Q&A experiences.
MIT License

Indexer will not index tiny documents #193

Closed. tonybaloney closed this issue 6 months ago

tonybaloney commented 6 months ago

I'm reporting this issue because I've seen the same bug in the .NET version of this sample, and the logic here is the same. You can reproduce it with a simple test that calls DocumentProcessor.createDocumentFromFile with a file containing fewer characters than MAX_SECTION_LENGTH:

https://github.com/Azure-Samples/azure-search-openai-javascript/blob/main/packages/indexer/src/lib/document-processor.ts#L16-L21

Split pages is missing an early conditional return that yields a single page when the text length is below MAX_SECTION_LENGTH, so a tiny document produces no sections and never gets indexed: https://github.com/Azure-Samples/azure-search-openai-javascript/blob/main/packages/indexer/src/lib/document-processor.ts#L72-L78
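
To illustrate the kind of guard I mean, here is a rough sketch. It is not the code from document-processor.ts: the function name splitIntoSections, the Section type, and the constant values are all made up for the example, and the windowing is a simplified stand-in for the real sentence-aware splitting. The only part that matters is the early return at the top.

```typescript
// Hypothetical values; the real constants live in the repository and may differ.
const MAX_SECTION_LENGTH = 1000;
const SECTION_OVERLAP = 100;

// Hypothetical shape for a chunk of text and its position in the source.
interface Section {
  content: string;
  offset: number;
}

// Simplified splitter: fixed-size windows with overlap, no sentence-boundary logic.
function* splitIntoSections(text: string): Generator<Section> {
  // Guard: text that fits in one section is yielded whole and we stop,
  // so a tiny document still produces exactly one section.
  // (Skipping empty text is a choice made for this sketch.)
  if (text.length <= MAX_SECTION_LENGTH) {
    if (text.length > 0) {
      yield { content: text, offset: 0 };
    }
    return;
  }

  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + MAX_SECTION_LENGTH, text.length);
    yield { content: text.slice(start, end), offset: start };
    if (end === text.length) {
      break;
    }
    // Step back so consecutive sections share SECTION_OVERLAP characters.
    start = end - SECTION_OVERLAP;
  }
}

// Usage: a document far shorter than MAX_SECTION_LENGTH gives one section.
const sections = Array.from(splitIntoSections("A very short document."));
console.log(sections.length); // 1
```

A regression test for the real indexer would make the same assertion: feed in a file shorter than MAX_SECTION_LENGTH and check that exactly one section comes back.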

See this change for the Python patch: https://github.com/Azure-Samples/azure-search-openai-demo/commit/e835da37aead8add52d210a7593663ce3c928229

sinedied commented 6 months ago

Thanks for the detailed report and explanation! I'll fix it ASAP

tonybaloney commented 6 months ago

Here's the patch for Java; it should be clear what to change: https://github.com/Azure-Samples/azure-search-openai-demo-java/pull/81