This pull request focuses on refactoring the codebase to use asynchronous operations, which should improve the performance and responsiveness of the application. The most important changes include converting synchronous Azure SDK calls to their asynchronous counterparts, updating the document analysis and chunking functions to be asynchronous, and refactoring the `TextEmbedder` and `ChunkEmbeddingHelper` classes to support asynchronous operations.
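Any speed-up from this change comes from overlapping the waits on network-bound calls, not from making each call faster. Awaiting async calls one after another is no quicker than the synchronous version. The real async clients live in the SDKs' `.aio` modules (for example `azure.identity.aio` and `azure.keyvault.secrets.aio`). The sketch below avoids Azure entirely. It assumes a hypothetical `fake_request` helper, with `asyncio.sleep` standing in for one SDK request, and contrasts sequential and concurrent awaiting:

```python
import asyncio
import time


async def fake_request(name: str, latency: float = 0.05) -> str:
    """Hypothetical stand-in for one network-bound SDK call."""
    await asyncio.sleep(latency)  # yields to the event loop while "waiting"
    return f"{name}: done"


async def run_sequential(names: list[str]) -> list[str]:
    # Each call starts only after the previous one has finished.
    return [await fake_request(n) for n in names]


async def run_concurrent(names: list[str]) -> list[str]:
    # All calls are in flight at once; gather keeps results in input order.
    return list(await asyncio.gather(*(fake_request(n) for n in names)))


def timed(coro_fn, names: list[str]) -> tuple[list[str], float]:
    """Run one of the coroutines above and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(names))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    names = ["get_secret", "download_blob", "analyze_document"]
    _, seq_s = timed(run_sequential, names)
    _, con_s = timed(run_concurrent, names)
    print(f"sequential: {seq_s:.2f}s, concurrent: {con_s:.2f}s")
```

With three simulated 50 ms calls, the sequential run takes roughly the sum of the latencies. The concurrent run takes roughly the longest single latency.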

Asynchronous Azure SDK Calls:

- `chunker/chunk_documents_docint.py`: Replaced synchronous imports with asynchronous ones (`DefaultAzureCredential`, `BlobServiceClient`, `SecretClient`) and added `aiohttp` and `asyncio` imports.
- `embedder/text_embedder.py`: Replaced the synchronous `SecretClient` and `DefaultAzureCredential` with their asynchronous versions and updated the `TextEmbedder` class to use `AsyncAzureOpenAI`.

Asynchronous Document Analysis:

- `chunker/chunk_documents_docint.py`: Converted the `get_secret`, `analyze_document_rest`, `get_chunk`, and `chunk_document` functions to be asynchronous. [1] [2] [3] [4]

Asynchronous Chunking:

- `chunker/chunk_documents_raw.py`: Updated the `chunk_document` function to be asynchronous and to use the asynchronous `ChunkEmbeddingHelper`. [1] [2]

Refactoring Helper Classes:

- `chunker/chunk_metadata_helper.py`: Refactored `ChunkEmbeddingHelper` to support asynchronous operations, including an asynchronous `create` method and a `generate_chunks_with_embedding` method.
- `embedder/text_embedder.py`: Refactored `TextEmbedder` to include asynchronous methods for embedding content and retrieving secrets. [1] [2]

These changes aim to improve the efficiency of document processing by leveraging asynchronous I/O, so that network-bound calls (secret retrieval, blob access, document analysis, and embedding requests) no longer block while waiting on a response.