Open annelo-msft opened 4 months ago
Thank you for your feedback. Tagging and routing to the team member best able to assist.
Sorry to bump this old issue - I've been using Azure OpenAI, and I notice that it uses SCM's ClientPiplelinePolicy
, but OpenAI's rate limiting does specify the retry after header. Any chance you know of an implementation of an SCM retry policy that does take this into account?
Hi @Pilchie - a fix for this issue was released in SCM 1.1.0-beta.6. If you reference this version, do you still see the issue?
Hi @annelo-msft - thanks for looking at this, and sorry for the delay. Today I tried out pinning SCM 1.1.0-beta7 and seeing what happened. Unfortunately, with the default retry policy, this doesn't actually help. Looking at the code for GetNextDelay
, I think the issue is that tryCount
is 0, so nextDelayMilliseconds
ends up being 1 day, and the retryAfter.TotalMilliseconds
is smaller than nextDelayMilliseconds
.
Nevermind - I am getting an x-ms-retry-after
of 1 day, so that seems ok.
I think there might still be an issue with message.RetryCount
being zero leading to a negative time to wait, which means there will be an immediate retry the first time, but that's not as big an issue to me.
Hi @Pilchie -- x-ms-retry-after
is an Azure-specific header, so not supported by default by SCM, because SCM provides general HTTP support, but not Azure-specific features. For Azure.AI.OpenAI, @trrwilson was adding capabilities into that client to support the Azure-specific features -- I don't know what the status of that is currently -- if that PR has been merged and what client version has it. @trrwilson could you share that info?
I think there might still be an issue with
message.RetryCount
being zero leading to a negative time to wait, which means there will be an immediate retry the first time, but that's not as big an issue to me.
This is interesting, and if there's a bug in SCM, I'd love to know about it so we can get it fixed! Would you be willing to share a quick repro case for this?
Interesting - I'm not sure how the delay got set to 1 day. Maybe both headers were sent. As for the repro, I was using the code below, with an Azure OpenAI endpoint set to the minimum rate limit:
using Azure.AI.OpenAI;
using Azure.Identity;
using OpenAI.Embeddings;
using System.ClientModel.Primitives;
const string endpointEnvironmentVariableName = "AZURE_OPENAI_ENDPOINT";
const string apiKeyEnvironmentVariableName = "AZURE_OPENAI_APIKEY";
var endpoint = Environment.GetEnvironmentVariable(endpointEnvironmentVariableName) ?? throw new Exception($"Set '{endpointEnvironmentVariableName}'.");
var apiKey = Environment.GetEnvironmentVariable(apiKeyEnvironmentVariableName) ?? throw new Exception($"Set '{apiKeyEnvironmentVariableName}'.");
var client = new AzureOpenAIClient(
new Uri(endpoint),
apiKey,
new AzureOpenAIClientOptions
{
RetryPolicy = new ClientRetryPolicy(),
});
var embeddingClient = client.GetEmbeddingClient("text-embedding-3-large");
const int count = 100;
var inputs = new List<string>(count);
for (int i = 0; i < count; i++)
{
inputs.Add($"This is a string to embed. It is number {i}.");
}
var result = embeddingClient.GenerateEmbeddings(inputs);
Console.WriteLine(result.GetRawResponse().Status);
Note that I was unable to get a breakpoint to hit inside the retry method, or to step into it, so it's possible something else was happening - I could only inspect the state once it was in the 1-day Wait
call.
Confirmed that RetryAfter
was the header that was sent (in addition to x-ratelimit-reset-tokens
and x-ratelimit-remaining-requests
)
Azure.Core's retry policy uses the default
DelayStrategy
implementation to check and wait to retry based on the response RetryAfter header. Since we don't haveDelayStrategy
in SCM, we did not implement this logic inClientRetryPolicy
. We should implement this, though, and may be able to address this issue as part of the addition of LRO abstractions to SCM in https://github.com/Azure/azure-sdk-for-net/issues/42114. We could have an internal type that implements exponential backoff and respects the RetryAfter header by default, and make it overridable if someone specifies a fixed polling interval. Subtypes of the retry policy or LRO implementations from TypeSpec could then override this behavior if needed for their service-specific implementations.