Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
MIT License

[BUG] Chat streaming implementation hangs on a 429 response, no async way to access response body #45618

Open KeithHenry opened 3 weeks ago

KeithHenry commented 3 weeks ago

Library name and version

Azure.AI.OpenAI 2.0.0-beta.2 and 1.0.0-beta.16

Describe the bug

In 2.0.0, OpenAI.Chat.ChatClient.CompleteChatStreamingAsync hangs forever when the service returns 429 Too Many Requests: if you await it, the code never resumes.

In 1.0.0, Azure.AI.OpenAI.OpenAIClient.GetChatCompletionsStreamingAsync hangs in the same way, but I'll stick to 2.0.0 for the examples below.

This is a very common occurrence. Any system using this API needs to handle rate limits.

The workaround for it is to inject a custom pipeline transport:

// uri, credential, and deployment come from app config

// this allows ChatClient to be injected directly or used with [FromServices]
return services.AddScoped<ChatClient>(s =>
{
    var httpClient = s.GetRequiredService<IHttpClientFactory>().CreateClient("OpenAI");
    var errorTransport = new PromoteHttpStatusErrorsPipelineTransport(httpClient);
    var aiClient = new AzureOpenAIClient(uri, credential, new() { Transport = errorTransport });
    return aiClient.GetChatClient(deployment);
});

And in that custom pipeline override OnReceivedResponse to handle the 429:

class PromoteHttpStatusErrorsPipelineTransport(HttpClient client) :
    HttpClientPipelineTransport(client)
{
    protected override void OnReceivedResponse(PipelineMessage message, HttpResponseMessage httpResponse)
    {
        if (!httpResponse.IsSuccessStatusCode)
        {
            // handle 429 and other errors here
        }

        base.OnReceivedResponse(message, httpResponse);
    }
}

When !httpResponse.IsSuccessStatusCode the detail is in httpResponse.Content, but every obvious way of reading it is async (and OnReceivedResponse is not). That would mean using a blocking .Result, which cascades the failure: a busy server returning lots of 429s means lots of blocked threads, which means the whole service hangs.
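One possible escape hatch, sketched below under the assumption of .NET 5 or later (the report above is on .NET 8): HttpContent.ReadAsStream() exposes a synchronous read, so the error body can be captured inside the sync override without parking a thread on .Result. This is an illustrative variant of the transport above, not a confirmed fix:

```csharp
using System;
using System.ClientModel.Primitives;
using System.IO;
using System.Net.Http;

class PromoteHttpStatusErrorsPipelineTransport(HttpClient client) :
    HttpClientPipelineTransport(client)
{
    protected override void OnReceivedResponse(PipelineMessage message, HttpResponseMessage httpResponse)
    {
        if (!httpResponse.IsSuccessStatusCode)
        {
            // ReadAsStream() (.NET 5+) reads synchronously, so no Task is
            // blocked with .Result while the error body is drained
            using var reader = new StreamReader(httpResponse.Content.ReadAsStream());
            var detail = reader.ReadToEnd();
            throw new HttpRequestException(
                $"Request failed with status {(int)httpResponse.StatusCode}: {detail}");
        }

        base.OnReceivedResponse(message, httpResponse);
    }
}
```

Throwing from the override surfaces the failure to the awaiting caller instead of leaving it stalled on a stream that never starts.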

Expected behavior

CompleteChatStreamingAsync should not hang. When IsSuccessStatusCode == false, the response should be read and surfaced as error context; when nothing is being streamed, the result should not stall an await.

This is how the implementation in the chat playground in Azure AI Studio behaves:

[Screenshot: error message including context from the response]

This could be an exception thrown, or return a combined result of the streaming collection and the context when not streaming.
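If the library threw, calling code could handle rate limiting in the usual way. A hypothetical sketch, assuming a System.ClientModel ClientResultException carrying the HTTP status (prompts and the retry strategy are illustrative):

```csharp
try
{
    await foreach (var update in chatClient.CompleteChatStreamingAsync(prompts))
    {
        // consume streaming updates
    }
}
catch (ClientResultException ex) when (ex.Status == 429)
{
    // back off and retry, honoring any Retry-After hint from the service
}
```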

If this can't be fixed in this library (say it's a bug in the underlying implementation), then OnReceivedResponse needs to be async and called with await so that the response body can be read without blocking.

Actual behavior

await chatClient.CompleteChatStreamingAsync never resolves when the endpoint returns an error HTTP status code.

Reproduction Steps

var aiClient = new AzureOpenAIClient(uri, credential);
var chatClient = aiClient.GetChatClient(deployment);
var streamResponse = chatClient.CompleteChatStreamingAsync(promptsWithTooManyTokens);

await foreach (var res in streamResponse) {
    // This line will never be reached
}

// This line will never be reached, either
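Until this is fixed, a defensive caller can at least bound the hang with a cancellation timeout. A sketch, assuming the 2.0.0-beta overload that accepts a CancellationToken (cts and the 30-second budget are illustrative):

```csharp
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
var streamResponse = chatClient.CompleteChatStreamingAsync(
    promptsWithTooManyTokens, cancellationToken: cts.Token);

try
{
    await foreach (var res in streamResponse.WithCancellation(cts.Token))
    {
        // process streaming updates
    }
}
catch (OperationCanceledException)
{
    // the stream never started (e.g. a 429/400): surface the error or retry
}
```

This converts the silent hang into an OperationCanceledException after the budget expires, though it loses the error detail that a proper fix would provide.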

Environment

.NET SDK:
 Version:           8.0.302
 Commit:            ef14e02af8
 Workload version:  8.0.300-manifests.5273bb1c
 MSBuild version:   17.10.4+10fbfbf2e

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.22631
 OS Platform: Windows
 RID:         win-x64
 Base Path:   C:\Program Files\dotnet\sdk\8.0.302\

Tested in VS Code 1.92.2 and Visual Studio Professional 2022 17.10.3

github-actions[bot] commented 3 weeks ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jpalvarezl @ralph-msft @trrwilson.

tkumpumak commented 1 week ago

Looks like same thing happens if your input text hits content filter and the response is 400.

KeithHenry commented 1 week ago

> Looks like same thing happens if your input text hits content filter and the response is 400.

@tkumpumak yeah - pretty much any response other than 2xx causes it to hang waiting on a stream that never arrives. Does my PromoteHttpStatusErrorsPipelineTransport hack work around it for you?

tkumpumak commented 1 week ago

> Looks like same thing happens if your input text hits content filter and the response is 400.
>
> @tkumpumak yeah - pretty much any response other than 2xx causes it to hang waiting on a stream that never arrives. Does my PromoteHttpStatusErrorsPipelineTransport hack work around it for you?

Yes, your hack works fine, thanks for the fix! I wired a CancellationToken into my CompleteChatStreamingAsync call and it cancels via that.