Describe the bug
Despite setting tokens_per_minute and max_retries in pipeline-settings.yaml, the system continues to make API calls every second, even though the LLM is rate-limited.
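For context, the throttling options live in the llm block of pipeline-settings.yaml. The snippet below is only a sketch of that block, assuming the accelerator follows the standard GraphRAG indexing config schema; the surrounding keys and all values are illustrative, not my actual settings.

llm:
  type: azure_openai_chat                    # Azure OpenAI LLM endpoint
  model: gpt-4
  tokens_per_minute: 80000                   # throttle tokens sent per minute (key confirmed in this issue)
  requests_per_minute: 60                    # assumed companion setting in the same block
  max_retries: 10                            # retries after 429 / transient failures (key confirmed in this issue)
  max_retry_wait: 10.0                       # assumed: maximum backoff, in seconds, between retries
  sleep_on_rate_limit_recommendation: true   # assumed: honour the Retry-After hint returned with 429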
Output
The indexing logs in the Azure Blob Container show 429 error codes, repeated every second:
'type': 'error',
'data': 'Error Invoking LLM',
'cause': "Error code: 429 - {'error': {'code': '429', 'message': 'Rate limit is exceeded. Try again in 50 seconds.'}}",
Expected behavior
When a 429 status code is returned from the Azure OpenAI LLM endpoint, the system should abide by the settings in pipeline-settings.yaml, backing off and minimising the number of API calls and 429 responses.
Resolved by redeploying the solution-accelerator to the resource group. I was unaware that redeployment was necessary for changes to pipeline-settings.yaml to take effect.