This pull request updates the SpreadsheetChunker and AzureOpenAIClient classes to improve performance and readability. The key changes are new timing logs, a refactor of the chunking logic, and an updated Azure OpenAI client configuration. It also fixes an error in the get_embedding method.
Enhancements to SpreadsheetChunker:
Added timing logs to track the duration of chunk processing and overall execution time in get_chunks and _spreadsheet_process methods. [1][2]
Refactored the chunking logic to streamline the process of converting sheets to HTML or Markdown and generating summaries. [1][2]
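For reviewers who want a picture of the timing-log pattern described above, here is a minimal sketch. It is not the code from this PR: the function name process_sheets, the logger name, and the stand-in "conversion" step are all hypothetical, and the PR may measure time with a different call than time.perf_counter.

```python
import logging
import time

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("spreadsheet_chunker_sketch")


def process_sheets(sheets):
    """Build one chunk per sheet, logging per-sheet and overall durations."""
    overall_start = time.perf_counter()
    chunks = []
    for name, rows in sheets.items():
        sheet_start = time.perf_counter()
        # Stand-in for the real work (table conversion and summary generation).
        chunks.append({"sheet": name, "row_count": len(rows)})
        logger.info(
            "Processed sheet '%s' in %.2f seconds",
            name,
            time.perf_counter() - sheet_start,
        )
    logger.info(
        "Processed %d sheets in %.2f seconds",
        len(chunks),
        time.perf_counter() - overall_start,
    )
    return chunks
```

Logging both a per-sheet and an overall duration makes it easier to tell whether a slow run is caused by one large sheet or by the total number of sheets.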
Configuration updates in AzureOpenAIClient:
Moved the maximum retries and token limits into instance variables within the __init__ method.
Updated the get_completion and get_embeddings methods to use instance variables for model deployment names and token limits. [1][2][3]
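The configuration change above can be sketched roughly as follows. This is an illustration under stated assumptions, not the AzureOpenAIClient code: the class name ClientSettingsSketch, the attribute names, the default values, and the call_with_retries helper are all hypothetical.

```python
class ClientSettingsSketch:
    """Holds retry and token limits on the instance instead of in module-level constants."""

    def __init__(self, max_retries=10, max_completion_tokens=4096, max_embedding_tokens=8192):
        # Hypothetical defaults; the values used in the PR are not shown in its description.
        self.max_retries = max_retries
        self.max_completion_tokens = max_completion_tokens
        self.max_embedding_tokens = max_embedding_tokens

    def call_with_retries(self, operation):
        """Run operation() up to max_retries times, re-raising the last error."""
        attempts = max(1, self.max_retries)
        last_error = None
        for _ in range(attempts):
            try:
                return operation()
            except RuntimeError as error:
                # A real client would catch the SDK's own error types and back off here.
                last_error = error
        raise last_error
```

Keeping these limits on the instance means two clients in the same process can run with different settings, and tests can override a limit without patching module globals.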
Bug fix:
Fixed the get_embedding method, which sent the embeddings request to gpt-4o, a model that does not support the embeddings operation. The API returned: Error code: 400 - {'error': {'code': 'OperationNotSupported', 'message': 'The embeddings operation does not work with the specified model, gpt-4o. Please choose different model and try again. You can learn more about which models can be used with each operation here: https://go.microsoft.com/fwlink/?linkid=2197993.'}}. The method now uses a compatible model for the embedding operation.
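The idea behind the fix can be sketched as keeping separate deployment names for chat and embeddings, so the embeddings call never receives the chat model. The sketch below is an assumption about the shape of the fix, not the PR's code: EmbeddingClientSketch, StubEmbeddings, and the deployment name text-embedding-ada-002 are illustrative. The stub imitates the embeddings.create(model=..., input=...) call shape of the OpenAI Python SDK so the example runs without network access.

```python
from types import SimpleNamespace


class EmbeddingClientSketch:
    """Keeps chat and embedding deployments separate (hypothetical class)."""

    def __init__(self, client, chat_deployment, embedding_deployment):
        self.client = client
        self.chat_deployment = chat_deployment
        self.embedding_deployment = embedding_deployment

    def get_embedding(self, text):
        # Always pass the embedding deployment; passing the chat deployment
        # is what produces the OperationNotSupported error quoted above.
        response = self.client.embeddings.create(
            model=self.embedding_deployment, input=text
        )
        return response.data[0].embedding


class StubEmbeddings:
    """Offline stand-in for the SDK's embeddings resource; records the model used."""

    def __init__(self):
        self.last_model = None

    def create(self, model, input):
        self.last_model = model
        return SimpleNamespace(data=[SimpleNamespace(embedding=[0.0, 0.0, 0.0])])


stub_client = SimpleNamespace(embeddings=StubEmbeddings())
sketch = EmbeddingClientSketch(
    stub_client,
    chat_deployment="gpt-4o",
    embedding_deployment="text-embedding-ada-002",  # assumed name, not from the PR
)
vector = sketch.get_embedding("hello")
```

With a real client in place of the stub, the same get_embedding method would send the request to whichever embedding deployment the instance was configured with.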
Minor changes:
Imported the time module in spreadsheet_chunker.py to support the new timing logs.
Cleaned up the SpreadsheetChunker class docstring for better readability.