Describe the bug
Despite setting tokens_per_minute and max_retries in pipeline-settings.yaml, the system continues to make API calls every second, even though the LLM is rate-limited.
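For context, the throttling options live in the llm block of pipeline-settings.yaml. The snippet below is only a sketch of that block, assuming the accelerator follows the standard GraphRAG indexing config schema; the surrounding keys and all values are illustrative, not my actual settings.

llm:
  type: azure_openai_chat                    # Azure OpenAI LLM endpoint
  model: gpt-4
  tokens_per_minute: 80000                   # throttle tokens sent per minute (key confirmed in this issue)
  requests_per_minute: 60                    # assumed companion setting in the same block
  max_retries: 10                            # retries after 429 / transient failures (key confirmed in this issue)
  max_retry_wait: 10.0                       # assumed: maximum backoff, in seconds, between retries
  sleep_on_rate_limit_recommendation: true   # assumed: honour the Retry-After hint returned with 429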
Output
The indexing logs in the Azure Blob Container show 429 error codes, repeated every second:
'type': 'error',
'data': 'Error Invoking LLM',
'cause': "Error code: 429 - {'error': {'code': '429', 'message': 'Rate limit is exceeded. Try again in 50 seconds.'}}",
Expected behavior
When a 429 status code is returned from the Azure OpenAI LLM endpoint, the system should abide by the settings in pipeline-settings.yaml, backing off and minimising the number of API calls and 429 responses.
Resolved by redeploying the solution-accelerator to the resource group. I was unaware that redeployment was necessary for changes to pipeline-settings.yaml to take effect.