Closed: tonybaloney closed this 3 months ago
@tonybaloney I've tried with a pdf containing 754 characters and it worked. Can you share the pdf document you have used?
I submitted a failing test to repro this in #81, and I'll add a patch as well. I don't think the tests are being run as part of the GitHub Actions workflow for PRs?
@tonybaloney I've figured out the problem: it happens for documents with fewer than 100 characters (due to the overlap parameter). I've reviewed your PR and merged it.
Thanks! I couldn't figure out in the PR how to get the tests to run from the CI workflow. Could you please take a look? I can run them locally.
When an indexed document has fewer than 1000 characters, the text splitter does not yield any pages and nothing is sent to the search service.
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/166ffda8f86e67830292724f4bf2322e26d9cb8f/app/indexer/core/src/main/java/com/microsoft/openai/samples/indexer/parser/TextSplitter.java#L59
This is the fix for the Python sample on which this function was based: https://github.com/Azure-Samples/azure-search-openai-demo/commit/e835da37aead8add52d210a7593663ce3c928229
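
For anyone trying to picture the failure mode, here is a minimal sketch of how an overlap guard can swallow a short document. It is not the code from `TextSplitter.java` or from the linked commit: the class `SplitterSketch`, the methods `splitBuggy` and `splitFixed`, and the constants (1000 and 100, taken from the numbers mentioned in this thread) are all hypothetical names and assumed values used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified splitter; NOT the real TextSplitter from the repo.
final class SplitterSketch {
    // Assumed values, based on the 1000 and 100 character figures in this thread.
    static final int MAX_SECTION_LENGTH = 1000;
    static final int SECTION_OVERLAP = 100;

    // Mimics the reported behaviour: the loop guard requires the remaining
    // text to be longer than the overlap, so a short input never enters it.
    static List<String> splitBuggy(String text) {
        List<String> sections = new ArrayList<>();
        int start = 0;
        while (start + SECTION_OVERLAP < text.length()) {
            int end = Math.min(start + MAX_SECTION_LENGTH, text.length());
            sections.add(text.substring(start, end));
            if (end == text.length()) {
                break; // reached the end of the document
            }
            start = end - SECTION_OVERLAP; // step back so sections overlap
        }
        return sections;
    }

    // One possible guard (an assumption, not necessarily what the linked
    // commit does): emit the whole text when nothing else was produced.
    static List<String> splitFixed(String text) {
        List<String> sections = splitBuggy(text);
        if (sections.isEmpty() && !text.isEmpty()) {
            sections.add(text);
        }
        return sections;
    }

    public static void main(String[] args) {
        String shortDoc = "x".repeat(50); // shorter than the overlap
        System.out.println("buggy: " + splitBuggy(shortDoc).size());
        System.out.println("fixed: " + splitFixed(shortDoc).size());
    }
}
```

In this sketch a 754-character input still produces one section, which matches the observation above that a document of that size worked, while anything at or below the overlap length produces none unless the fallback is applied.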