Azure-Samples / azure-search-openai-demo-java

This repo is the Java version of Microsoft's sample app for ChatGPT + Enterprise data.
MIT License

Text splitting doesn't work with tiny documents #78

Closed: tonybaloney closed this issue 3 months ago

tonybaloney commented 3 months ago

When an indexed document has fewer than 1000 characters, the text splitter does not yield any pages, so nothing is sent to the search service.

This is the fix for the Python sample on which this function was based

dantelmomsft commented 3 months ago

@tonybaloney I've tried with a PDF containing 754 characters and it worked. Can you share the PDF document you used?

tonybaloney commented 3 months ago

Submitted a failing test to repro this in #81 and I'll add a patch as well. I don't think the tests are being run as part of the GitHub Actions workflow for PRs?

dantelmomsft commented 3 months ago

@tonybaloney I've figured out the problem: it happens for documents with fewer than 100 characters (due to the overlap parameter). I've reviewed your PR and merged it.
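
For readers following along, here is a minimal sketch of how an overlap-based loop condition can skip very short documents. It is not the repository's actual splitter: the class name, method names, and the simplified sliding window (no sentence-boundary or page handling) are all hypothetical, and the constants 1000 and 100 are assumptions taken from the numbers mentioned in this thread. The guard in `splitWithGuard` is one possible way to handle the case and is not necessarily what the merged PR does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and constants are assumptions.
public class TinyDocumentSplitSketch {
    static final int MAX_SECTION_LENGTH = 1000; // assumed section size
    static final int SECTION_OVERLAP = 100;     // assumed overlap between sections

    // Simplified sliding window whose loop condition depends on the overlap.
    static List<String> splitWithoutGuard(String text) {
        List<String> sections = new ArrayList<>();
        int length = text.length();
        int start = 0;
        // Never entered when length <= SECTION_OVERLAP, so tiny inputs yield nothing.
        while (start + SECTION_OVERLAP < length) {
            int end = Math.min(start + MAX_SECTION_LENGTH, length);
            sections.add(text.substring(start, end));
            if (end == length) {
                break; // reached the end of the text
            }
            start = end - SECTION_OVERLAP; // step back so consecutive sections overlap
        }
        return sections;
    }

    // Same splitter, but any non-empty text that fits in one section is emitted as-is.
    static List<String> splitWithGuard(String text) {
        List<String> sections = new ArrayList<>();
        if (text.isEmpty()) {
            return sections;
        }
        if (text.length() <= MAX_SECTION_LENGTH) {
            sections.add(text);
            return sections;
        }
        return splitWithoutGuard(text);
    }

    public static void main(String[] args) {
        String tiny = "x".repeat(50);   // String.repeat needs Java 11+
        String small = "x".repeat(754); // same size as the PDF tried above
        System.out.println(splitWithoutGuard(tiny).size());  // prints 0
        System.out.println(splitWithoutGuard(small).size()); // prints 1
        System.out.println(splitWithGuard(tiny).size());     // prints 1
    }
}
```

In this sketch the 754-character input still produces one section, which is consistent with the earlier observation that a PDF of that size worked, while the 50-character input produces none until the guard is added.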

tonybaloney commented 3 months ago

Thanks! I couldn't figure out in the PR how to get the tests to run from the CI workflow. Could you please take a look? I can run them locally.