Azure / gpt-rag-ingestion

MIT License
61 stars 53 forks

Insiders #95

gbecerra1982 closed 2 months ago

gbecerra1982 commented 2 months ago

This pull request fixes bugs introduced by the latest updates, which added optimized chunking for xlsx and transcription formats and introduced the ChunkerFactory approach.

The PR also improves the handling of different file formats, optimizes the chunking process, and enhances logging to make debugging easier. The most important changes are removing support for certain file formats, modifying the chunking logic, and adding new environment variables.

File Format Handling:

Chunking Logic:

Environment Variables:

Logging and Error Handling:

Retry Mechanism:

vladborys commented 2 months ago

Approving, assuming that the removed file formats are still supported by the document chunking function.