gbecerra1982 closed this pull request 2 months ago.
The old chunkers have since been refactored into multiple chunks to be more modular and extensible, so the changes to chunk_documents_docint.py, chunk_documents_raw.py, chunk_metadata_helper.py, and text_embedder.py in this pull request would no longer take effect.
This pull request refactors `chunk_documents_docint.py`, `chunk_documents_raw.py`, `chunk_metadata_helper.py`, and `text_embedder.py` to use asynchronous operations. The most important changes include converting synchronous functions to asynchronous ones, updating Azure SDK imports to their asynchronous counterparts, and ensuring proper session and resource management.

Asynchronous Refactoring:
- `chunker/chunk_documents_docint.py`: Converted functions such as `get_secret`, `analyze_document_rest`, `get_chunk`, and `chunk_document` to asynchronous versions. Updated Azure SDK imports to use asynchronous modules and replaced synchronous HTTP requests with `aiohttp` requests.
- `chunker/chunk_documents_raw.py`: Updated the `chunk_document` function to be asynchronous and modified the `generate_chunks_with_embedding` call to use `await`.
- `chunker/chunk_metadata_helper.py`: Updated the `ChunkEmbeddingHelper` class to include asynchronous methods and added a class method for asynchronous initialization.
- `embedder/text_embedder.py`: Refactored the `TextEmbedder` class to support asynchronous operations, including asynchronous initialization and embedding methods.
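
For reviewers unfamiliar with the pattern, here is a minimal sketch of what the changes listed above amount to: an async class-method factory (since `__init__` cannot `await`), awaited embedding calls, and cleanup in a `finally` block. It is not the code in this pull request. `FakeEmbeddingClient`, `TextEmbedderSketch`, `create`, `aclose`, and the fixed-size splitting are all hypothetical stand-ins, and the fake client replaces whatever async Azure SDK client the real `TextEmbedder` uses.

```python
import asyncio


class FakeEmbeddingClient:
    """Hypothetical stand-in for an async SDK client; not a real Azure class."""

    def __init__(self) -> None:
        self.closed = False

    async def embed(self, text: str) -> list[float]:
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        return [float(len(text))]  # toy "embedding": just the text length

    async def close(self) -> None:
        self.closed = True


class TextEmbedderSketch:
    """Hypothetical async embedder; the real TextEmbedder may look different."""

    def __init__(self, client: FakeEmbeddingClient) -> None:
        # __init__ cannot await, so it only stores already-built dependencies.
        self._client = client

    @classmethod
    async def create(cls) -> "TextEmbedderSketch":
        # Async setup (for example, fetching a secret) belongs here.
        await asyncio.sleep(0)
        return cls(FakeEmbeddingClient())

    async def embed(self, text: str) -> list[float]:
        return await self._client.embed(text)

    async def aclose(self) -> None:
        await self._client.close()


async def chunk_document(text: str, size: int = 10) -> list[dict]:
    """Split text into fixed-size pieces and embed them concurrently."""
    embedder = await TextEmbedderSketch.create()
    try:
        pieces = [text[i:i + size] for i in range(0, len(text), size)]
        # gather runs the embedding coroutines concurrently and keeps input order.
        vectors = await asyncio.gather(*(embedder.embed(p) for p in pieces))
        return [{"content": p, "vector": v} for p, v in zip(pieces, vectors)]
    finally:
        await embedder.aclose()  # release the client even if embedding fails


if __name__ == "__main__":
    print(asyncio.run(chunk_document("asynchronous chunking example")))
```

The factory keeps construction synchronous and cheap while still allowing awaited setup, and the `try`/`finally` is one way to get the "proper session and resource management" mentioned in the summary. An `async with` context manager would be a reasonable alternative.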