This pull request includes several changes aimed at fixing document chunking and improving logging.
Below is a summary of the most important changes:
Fixes for document chunking function errors:
Added new fields like category, length, and security_id to the chunk creation method in BaseChunker. (chunking/chunkers/base_chunker.py)
Added BlobStorageClient to BaseChunker and its subclasses for downloading blob data. (chunking/chunkers/base_chunker.py, chunking/chunkers/spreadsheet_chunker.py, chunking/chunkers/transcription_chunker.py) [1][2][3]
Changed the get_chunker method to be an instance method. (chunking/chunker_factory.py)
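To make the chunker changes above concrete, here is a minimal sketch of a factory whose get_chunker is called on an instance, together with a chunk creation method that carries the new category, length, and security_id fields. Only those three field names and the get_chunker, BaseChunker, and SpreadsheetChunker names come from this PR; the class bodies, the `_registry` attribute, the `create_chunk` signature, the extension keys, and the shape of the `data` dictionary are assumptions for illustration and may differ from the actual code.

```python
from dataclasses import dataclass


@dataclass
class BaseChunker:
    """Hypothetical stand-in for the real BaseChunker."""
    data: dict

    def create_chunk(self, chunk_id, content):
        # category, length, and security_id are the fields named in this PR;
        # how they are populated here is an assumption.
        return {
            "chunk_id": chunk_id,
            "content": content,
            "category": self.data.get("category", ""),
            "length": len(content),
            "security_id": self.data.get("security_id", []),
        }


class SpreadsheetChunker(BaseChunker):
    """Hypothetical subclass; the real one handles spreadsheet files."""


class ChunkerFactory:
    def __init__(self):
        # Per-instance registry (assumed): state lives on the factory object,
        # which is what an instance-level get_chunker makes possible.
        self._registry = {"xlsx": SpreadsheetChunker, "xls": SpreadsheetChunker}

    def get_chunker(self, extension, data):
        # Instance method: must be called on a ChunkerFactory object.
        chunker_cls = self._registry.get(extension.lower(), BaseChunker)
        return chunker_cls(data)
```

A caller would then construct the factory first, for example `ChunkerFactory().get_chunker("xlsx", data)`, rather than calling get_chunker on the class itself.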
Improved Logging and Error Handling:
Enhanced error messages to include the filename and improved logging for exceptions. (chunking/document_chunking.py)
Updated log messages to include the filename for better traceability. (chunking/chunkers/doc_analysis_chunker.py, chunking/chunkers/langchain_chunker.py, chunking/chunkers/spreadsheet_chunker.py, chunking/chunkers/transcription_chunker.py, chunking/document_chunking.py, function_app.py) [1][2][3][4][5][6][7][8]
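The logging pattern described above can be sketched roughly as follows. This is an illustration using Python's standard logging module, not the code from chunking/document_chunking.py: the `chunk_document` function, its `(chunks, errors)` return shape, the bracketed log prefix, and the `filename` key in `data` are all assumptions.

```python
import logging

logger = logging.getLogger("chunking")


def chunk_document(data, chunker):
    """Run a chunker callable and report failures with the filename attached.

    `chunker` is any callable taking `data` and returning a list of chunks
    (a hypothetical interface used only for this sketch).
    """
    filename = data.get("filename", "<unknown>")
    logger.info("[document_chunking][%s] Starting chunking.", filename)
    try:
        chunks = chunker(data)
    except Exception as exc:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("[document_chunking][%s] Chunking failed.", filename)
        return [], [f"Error processing '{filename}': {exc}"]
    logger.info("[document_chunking][%s] Created %d chunk(s).", filename, len(chunks))
    return chunks, []
```

Putting the filename in both the log line and the returned error message means a failure can be traced to a specific document even when many files are processed in the same run.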