
[BUG] Azure.AI.OpenAI: Exceeding rate limits results in a retry which ends with 401 Unauthorized #44804

Open molinch opened 3 weeks ago

molinch commented 3 weeks ago

Library name and version

Azure.AI.OpenAI 2.0.0-beta.2

Describe the bug

We have a rather large prompt and a small rate limit of 1,000 tokens/minute. With that combination we can invoke the Azure OpenAI endpoint only once per minute.

If we exceed that limit, we get back an exception like this:

"message": "Service request failed.\nStatus: 401 (Unauthorized)\n",
"stackTrace": "   at Azure.AI.OpenAI.ClientPipelineExtensions.ProcessMessageAsync(ClientPipeline pipeline, PipelineMessage message, RequestOptions options)\n   at Azure.AI.OpenAI.Chat.AzureChatClient.CompleteChatAsync(BinaryContent content, RequestOptions options)\n   at OpenAI.Chat.ChatClient.<>c__DisplayClass8_0.<<CompleteChatStreamingAsync>g__getResultAsync|0>d.MoveNext()\n--- End of stack trace from previous location ---\n   at OpenAI.Chat.AsyncStreamingChatCompletionUpdateCollection.AsyncStreamingChatUpdateEnumerator.CreateEventEnumeratorAsync()\n   at OpenAI.Chat.AsyncStreamingChatCompletionUpdateCollection.AsyncStreamingChatUpdateEnumerator.System.Collections.Generic.IAsyncEnumerator<OpenAI.Chat.StreamingChatCompletionUpdate>.MoveNextAsync()\n   at Sofia.Common.DigitalAssistantModule.Clients.OpenAi.OpenAiClient.GetChatCompletionsStream(String text, Language language, OpenApiUseCaseOption options, CancellationToken cancellationToken, String callerMemberName)+MoveNext() in /src/Sofia.Common.DigitalAssistantModule/Clients/OpenAi/OpenAiClient.cs:line 125\n   at Sofia.Common.DigitalAssistantModule.Clients.OpenAi.OpenAiClient.GetChatCompletionsStream(String text, Language language, OpenApiUseCaseOption options, CancellationToken cancellationToken, String callerMemberName)+MoveNext() in /src/Sofia.Common.DigitalAssistantModule/Clients/OpenAi/OpenAiClient.cs:line 125\n   at Sofia.Common.DigitalAssistantModule.Clients.OpenAi.OpenAiClient.GetChatCompletionsStream(String text, Language language, OpenApiUseCaseOption options, CancellationToken cancellationToken, String callerMemberName)


We investigated further: the first response is actually a 429 (rate limit exceeded); the client then retries, the retry comes back as a 401, and that is the exception we end up with.

This sends investigations in the wrong direction, because you think you have an authentication problem when it's actually a rate-limiting issue. The big question is why the request becomes unauthorized upon retry.

Expected behavior

It should fail with a message indicating that the rate limit was exceeded (the 429). Surrounding code could then decide to retry after a delay.
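
For illustration, here is a minimal sketch of what that surrounding retry logic could look like. It assumes the client surfaces the original 429 as a System.ClientModel `ClientResultException` whose `Status` property carries the HTTP status code; `RateLimitRetry` and `CompleteWithBackoffAsync` are hypothetical names, not part of the SDK.

```csharp
using System;
using System.ClientModel;
using System.Threading.Tasks;

public static class RateLimitRetry
{
    // Hypothetical helper (not part of the SDK): retries the supplied call
    // when the service reports 429 Too Many Requests.
    public static async Task<T> CompleteWithBackoffAsync<T>(
        Func<Task<T>> invoke, int maxAttempts = 3)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await invoke();
            }
            catch (ClientResultException ex) when (ex.Status == 429 && attempt < maxAttempts)
            {
                // Rate limited: wait before retrying. Production code would
                // honor the service's Retry-After header rather than a fixed curve.
                await Task.Delay(TimeSpan.FromSeconds(15 * attempt));
            }
        }
    }
}
```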

Actual behavior

It fails with a 401 Unauthorized, which is misleading and cannot be handled properly by surrounding code; typically you wouldn't retry on a 401.
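
Until the retry behavior is fixed, one way to avoid the misleading 401 might be to disable the pipeline's automatic retries so the original 429 reaches calling code. A sketch, assuming `AzureOpenAIClientOptions` inherits the System.ClientModel `RetryPolicy` property from `ClientPipelineOptions` (as the 2.0 betas appear to):

```csharp
using System;
using System.ClientModel.Primitives;
using Azure.AI.OpenAI;
using Azure.Identity;

// Disable automatic retries so the first failure (the 429) surfaces directly
// instead of being masked by the retry's 401.
var options = new AzureOpenAIClientOptions
{
    RetryPolicy = new ClientRetryPolicy(maxRetries: 0)
};

var client = new AzureOpenAIClient(
    new Uri("https://<your-resource>.openai.azure.com/"), // placeholder endpoint
    new AzureCliCredential(),
    options);
```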

Reproduction Steps

Configure a rate limit of 1,000 tokens per minute, use a fairly big prompt that consumes most of it, and call the streaming chat function. Calling it a second time within the same minute triggers the issue.
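
A minimal repro sketch, assuming the 2.0.0-beta client surface (`AzureOpenAIClient`, `GetChatClient`, `CompleteChatStreamingAsync`); the endpoint, deployment name, and prompt contents are placeholders:

```csharp
using System;
using System.Linq;
using Azure.AI.OpenAI;
using Azure.Identity;
using OpenAI.Chat;

var client = new AzureOpenAIClient(
    new Uri("https://<your-resource>.openai.azure.com/"), // placeholder endpoint
    new AzureCliCredential());

ChatClient chat = client.GetChatClient("<your-deployment>"); // placeholder deployment

// A prompt large enough to consume most of a 1,000 tokens-per-minute budget.
string bigPrompt = string.Join(" ", Enumerable.Repeat("lorem ipsum dolor sit amet", 150));

for (int i = 0; i < 2; i++) // the second call, within the same minute, hits the bug
{
    await foreach (StreamingChatCompletionUpdate update in
        chat.CompleteChatStreamingAsync(new UserChatMessage(bigPrompt)))
    {
        // Drain the stream; the failure surfaces while enumerating.
    }
}
```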

We reproduce this issue both with WorkloadIdentityCredential when running in Kubernetes and with AzureCliCredential when running locally, so it doesn't appear to be related to any particular token credential.

Environment

No response

github-actions[bot] commented 3 weeks ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jpalvarezl @ralph-msft @trrwilson.

molinch commented 3 weeks ago

Some additional context from our request logs, where you can see:

  1. The initial call works.
  2. The second call fails due to rate limits, is instantly retried, and we then get back a 401.
  3. After waiting a bit, it is OK again.
  4. Then the same issue as in step 2.

The payload is identical between all these calls.

trrwilson commented 2 weeks ago

Thanks, @molinch -- great observations and data. I believe this is a combination of poor default retry interval handling and likely misconfiguration of authentication information on retries; I'll follow up.
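
To probe the "authentication information on retries" half of that hypothesis, one could log each attempt from inside the pipeline. A diagnostic sketch, assuming the System.ClientModel extension points (`PipelinePolicy`, `AddPolicy` with `PipelinePosition.PerTry`) apply to `AzureOpenAIClientOptions`; `AuthProbePolicy` is a made-up name:

```csharp
using System;
using System.Collections.Generic;
using System.ClientModel.Primitives;
using System.Threading.Tasks;

// Hypothetical diagnostic policy: logs, for every attempt, whether the
// Authorization header is still present and what status the service returned.
public class AuthProbePolicy : PipelinePolicy
{
    public override void Process(PipelineMessage message,
        IReadOnlyList<PipelinePolicy> pipeline, int currentIndex)
    {
        LogRequest(message);
        ProcessNext(message, pipeline, currentIndex);
        Console.WriteLine($"<- status {message.Response?.Status}");
    }

    public override async ValueTask ProcessAsync(PipelineMessage message,
        IReadOnlyList<PipelinePolicy> pipeline, int currentIndex)
    {
        LogRequest(message);
        await ProcessNextAsync(message, pipeline, currentIndex);
        Console.WriteLine($"<- status {message.Response?.Status}");
    }

    private static void LogRequest(PipelineMessage message)
    {
        bool hasAuth = message.Request.Headers.TryGetValue("Authorization", out _);
        Console.WriteLine($"-> {message.Request.Uri} (Authorization present: {hasAuth})");
    }
}

// Registered per try so it runs again on every retry:
// options.AddPolicy(new AuthProbePolicy(), PipelinePosition.PerTry);
```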

johnkord commented 5 days ago

I've noticed this too! Thanks for creating this bug report, it makes sense. What did you look at to verify the underlying 429 status code in the responses?

molinch commented 5 days ago

@johnkord All HTTP requests were logged, so they were available in our AppInsights logs.
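
For anyone reproducing this without Application Insights, a hedged local alternative: route the client through an `HttpClient` whose `DelegatingHandler` prints each attempt's status code, which makes the underlying 429-then-401 sequence visible on the console. This assumes the client options accept a System.ClientModel `HttpClientPipelineTransport`; `StatusLoggingHandler` is a made-up name:

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler that prints the status code of every HTTP attempt,
// including the pipeline's automatic retries.
public class StatusLoggingHandler : DelegatingHandler
{
    public StatusLoggingHandler() : base(new HttpClientHandler()) { }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        HttpResponseMessage response = await base.SendAsync(request, cancellationToken);
        Console.WriteLine($"{request.Method} {request.RequestUri} -> {(int)response.StatusCode}");
        return response;
    }
}

// Plugged in via the client options (assumes the System.ClientModel transport):
// var options = new AzureOpenAIClientOptions
// {
//     Transport = new HttpClientPipelineTransport(new HttpClient(new StatusLoggingHandler()))
// };
```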