Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
MIT License

[BUG] Azure OpenAI truncating long responses #39183

Open JohnGalt1717 opened 1 year ago

JohnGalt1717 commented 1 year ago

Library name and version

Azure.AI.OpenAI 1.0.0-beta.8

Describe the bug

(This may actually be a .NET framework issue with BinaryData.)

If a response is greater than 3,900 characters, the Content value is automatically truncated, ending with "...".

If you read the RawResponse, access its Content property (a BinaryData), and call ToString(), you'll see the same truncation. However, if you right-click the variable in the VS Code debugger and choose "Copy Value", it doesn't truncate.

Every other attempt to get the full value out of the BinaryData results in a truncated response, even though the full value is definitely there, because VS Code can pull it.

Expected behavior

The entire content should be returned without truncation.

Actual behavior

Truncated response.

Reproduction Steps

  1. Create an Azure OpenAI resource, get the endpoint and key, go into Azure OpenAI Studio, and create a GPT-4 deployment.

  2. Create an OpenAIClient using the Azure.AI.OpenAI library above.

  3. Create options like this:

    var options = new ChatCompletionsOptions
    {
        User = _user.Id.ToString(),
        MaxTokens = 28000,
        ChoiceCount = 1,
        Temperature = 0.1F,
        NucleusSamplingFactor = (float)0.95,
        FrequencyPenalty = 0,
        PresencePenalty = 0,
    };

    // ...add a message that results in a very long response from GPT-4.
  4. Submit the request like this:

    var response = await _client.GetChatCompletionsAsync(_settings.LanguageDeployment, options, cancellationToken);
    var sbContent = new StringBuilder();
    foreach (var choice in response.Value.Choices)
    {
        sbContent.Append(choice.Message.Content);

        if (choice.Message.AzureExtensionsContext?.Messages is not null)
        {
            foreach (var contextMessage in choice.Message.AzureExtensionsContext.Messages)
            {
                Console.WriteLine(contextMessage);
            }
        }
    }

choices.Message.Content will be reliably cut off.

Environment

Windows 11


dotnet --info     
.NET SDK:
 Version:   8.0.100-rc.1.23463.5
 Commit:    e7f4de8816

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.23560
 OS Platform: Windows
 RID:         win-x64
 Base Path:   C:\Program Files\dotnet\sdk\8.0.100-rc.1.23463.5\

.NET workloads installed:
There are no installed workloads to display.

Host:
  Version:      8.0.0-rc.1.23419.4
  Architecture: x64
  Commit:       92959931a3
  RID:          win-x64

.NET SDKs installed:
  6.0.415 [C:\Program Files\dotnet\sdk]
  7.0.401 [C:\Program Files\dotnet\sdk]
  8.0.100-rc.1.23463.5 [C:\Program Files\dotnet\sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 6.0.22 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 6.0.23 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 7.0.11 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 8.0.0-rc.1.23421.29 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 6.0.22 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 6.0.23 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 7.0.11 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 8.0.0-rc.1.23419.4 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.WindowsDesktop.App 6.0.22 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 6.0.23 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 7.0.11 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 8.0.0-rc.1.23420.5 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

Other architectures found:
  x86   [C:\Program Files (x86)\dotnet]
    registered at [HKLM\SOFTWARE\dotnet\Setup\InstalledVersions\x86\InstallLocation]

Visual Studio Code September 2023.

pallavit commented 1 year ago

Thank you for reporting the issue. Assigning to the person best able to assist here. /cc: @joseharriaga

JohnGalt1717 commented 1 year ago

Note that I also tried a third-party ChatGPT library and it exhibits the same behavior, so this may be upstream. The odd part is that the entire value is there in the BinaryData in VS Code. The content length, however, appears to be wrong, which may be why it's failing.

joseharriaga commented 1 year ago

Hello, @JohnGalt1717. I'm trying to repro the issue, and I have a few questions:

  1. In your code sample, you're using MaxTokens = 28000. Just to confirm: This means that you're using the "gpt-4-32k" model, correct? (since the "gpt-4" model only allows for up to 8,192 tokens)

  2. In your code sample, you have the following:

    if (choice.Message.AzureExtensionsContext?.Messages is not null)
    {
        foreach (var contextMessage in choice.Message.AzureExtensionsContext.Messages)
        {
            Console.WriteLine(contextMessage);
        }
    }

    Are you using the AzureExtensionsContext for the repro? Or can this be ignored?

  3. When you say: "choices.Message.Content will be reliably cut off.", do you mean that it'll be truncated if you do the following?

    Console.WriteLine(choices.Message.Content);

    Or where are you seeing it get truncated?

JohnGalt1717 commented 1 year ago
  1. Yes. 32k
  2. No. That's there because in another place (not this code execution) we do. You can ignore it. It never gets called and it isn't in the options.
  3. Yes. It always ends in "..." instead of the complete response, unlike the Playground. And if you inspect the BinaryData as described above in VS Code, it's all there too, just not in the deserialized result.
joseharriaga commented 1 year ago

When you say: "a response that is greater than 3900 characters", do you mean the entire response? Or are you referring specifically to the length of the choices.Message.Content string?

So far, I can see the string get truncated in the VS Code UI, but not when I write it to the console using Console.WriteLine. Could you provide a code sample using Console.WriteLine(choices.Message.Content); to ensure that I'm doing the same thing as you? If you could add a picture of where and how you're seeing the truncated text, that could help too.

Also, are you using the C# extension for VS Code? If so, what version? For the record, there's currently a bug in the extension starting with version 2.4.2 related to string truncation, although the scenario sounds a little different from what you describe: 🔗 https://github.com/dotnet/vscode-csharp/issues/6496

JohnGalt1717 commented 1 year ago

It appears to be the entire response. It decodes without error but just ends. RawResponse.Content.ToString(), or even using ToMemory(), yields a truncated JSON response that doesn't actually end, and the same truncation appears in Choices[0].Message.Content.

Console.WriteLine(choices.Message.Content) ends with "..."

VS Code isn't the problem. The BinaryData in RawResponse.Content returns exactly the same truncated text as the JSON-decoded result. I can reproduce this by manually using an HttpClient and decoding the response with HttpContent.ReadFromJsonAsync(). It has the same incomplete result that the Azure SDK does.

Basically, if you use HttpClient or the Azure.AI.OpenAI client and choices[0].Message.Content is greater than 3,900 characters, the entire response truncates at 3,901 total characters, leaving incomplete JSON that doesn't error because the Azure library is stream-decoding the JSON. If you just get RawResponse.Content and call ToString() on it, you get the truncated JSON response that won't decode properly.

It appears that the Playground uses built-in support for text/event-stream to receive the results differently, so it works properly there, but it doesn't work in C#. I've tried using text/event-stream with a manual HttpClient to work around the issue, but it stops sending the event stream at exactly the same place.

(And in all of these cases the Content-Length header for the response is 3901, when the real response is over 5,000 characters.)

I can't give you exact examples because it's trade-secret information, so I'd have to create a bogus sample to try to induce gpt-4-32k to give me a long response. I tried this, but it appears that it won't give a long response without really, really good justification like our super prompt, so I have nothing that will generate 5,000 characters of gibberish (or even non-gibberish) to create a sample for you.

trrwilson commented 1 year ago

I'm talking with @joseharriaga about this issue and also trying to reproduce it; unfortunately, I'm also not having any problems getting gpt-4-32k to provide responses much larger than 4K characters.

@JohnGalt1717, could you please share the value of GetRawResponse().ClientRequestId from a repro? That should let me query the operation metadata to see if anything stands out.

Also:


The non-streaming GetChatCompletionsAsync method receives its response as an application/json document with the content field just another bit of JSON, so an overall response truncation within the text would prevent deserialization from working correctly -- whatever's happening would need to be related to the model generation process itself and may involve the interaction of some deployment details. That'd be consistent with streaming seemingly getting cut off the same way, too.

JohnGalt1717 commented 1 year ago

ClientRequestId: "3fdc83c2-6f69-4afb-b46c-93487992b164"
FinishReason: {stop}

I'd agree with your last statement however, when I look at the GetRawResponse.Content in VSCode and copy the value, it's intact and complete. It's only when you try and access it through the decoded response (or GetRawResponse.Content.ToString or similar) that it doesn't return the full result.


If I right-click and choose "Copy Value" on that Content [BinaryData] line, it's complete. If I go to the debug console and dump response.Value.Choices.First().Message.Content, it's incomplete, truncated with "..." on the end.

Doing this at the console: response.GetRawResponse().Content dumps the entire content, and it's intact. (I have no idea why .Content shows the whole thing but .Content.ToString() doesn't; I would have thought the former called .ToString() to dump it to the debug window. I could work around this if I could get response.GetRawResponse().Content dumped into a string variable the same way it dumps to the console...)

But response.GetRawResponse().Content.ToString() results in the truncated version being dumped ending exactly where the response.Value.Choices[0].Message.Content does just including the rest of the response before it. (which makes no sense but that's what it does.)

PS: I dumped it in logs and it's also incomplete, so I know that it isn't VS Code messing with it.

PPS: To be more clear: (from the debug console)

    response.GetRawResponse().Content = full
    response.GetRawResponse().Content.ToString() = truncated (it includes the entire response before it, but cuts off at exactly the same spot as below)
    response.Value.Choices[0].Message.Content = truncated everywhere (console, assigning to another string, etc.)

In Code:

    var content = response.GetRawResponse().Content.ToString() = truncated content
    var content = UTF8Encoding.UTF8.GetString(response.GetRawResponse().Content.ToArray()) = truncated the same way
    var content = UTF8Encoding.UTF8.GetString(response.GetRawResponse().Content.ToMemory().Span) = truncated the same way

In Code using HTTPClient and REST:

    response.Content.ReadAsStringAsync() = truncated
    response.Content.ReadAsStream() + StreamReader.ReadToEnd() = truncated
    response.Content.ReadAsStream() + StreamReader with a while-not-EOF loop = truncated
    response.Content.ReadAsJsonAsync<>() = truncated

And they're always truncated at exactly the same spot, which makes the entire response.Value invalid JSON, which is why I have to assume that it's using something like ReadAsJsonAsync<> to get the response.

trrwilson commented 1 year ago

Thanks, @JohnGalt1717 -- I was able to pull the request metadata from your client ID and it looks normal in a way that's consistent with what you're describing; the ResponseLength is recorded as 3595 bytes against a RequestLength of 2198 bytes and there are no error codes or anything else out of the ordinary.

If the response appears intact and complete via any mechanism (e.g. the inspection of BinaryData) then we can likely conclude that the response payload you're receiving is complete.

Totally understood you can't share the verbatim response data. My apologies that this turns into a bit of a game of "hot or cold" in understanding what's going on.

To get string behavior out of the equation, could you please try dumping the bytes to a file and checking if that looks correct the same way inspecting BinaryData in VSCode does? E.g.:

    Response<ChatCompletions> response = await client.GetChatCompletionsAsync("gpt-4-32k", chatCompletionsOptions);

    using (var outStream = File.Create("out.txt"))
    {
        await response.GetRawResponse().ContentStream.CopyToAsync(outStream);
    }

This should write the raw contents of the response stream as an unformatted JSON string. Does the response document itself look correct there? If it does, is there anything unusual that stands out in the content string at the position where truncation is happening? E.g. a control character like a null marker, something strangely encoded, anything like that.

I'm working under the assumption that there's a property-level deserialization problem happening. If there's any manner in which the data looks correct, we can rule out the payload being bad; and if the behavior weren't just in the deserialization of that one specific property, I'd expect the entire JsonDocument.Parse call to fail with a malformed input.
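One quick way to run that check on the dumped file (a sketch, reusing the out.txt name from the snippet above; note that control bytes *between* JSON tokens are legal whitespace, so only ones inside string values are actually invalid):

```csharp
using System;
using System.IO;
using System.Linq;

// Scan the dumped payload for control bytes (< 0x20). Hits need a manual
// look: raw control characters are only invalid inside JSON string values.
byte[] bytes = File.ReadAllBytes("out.txt");
foreach (var hit in bytes.Select((b, i) => (Value: b, Offset: i))
                         .Where(t => t.Value < 0x20))
{
    Console.WriteLine($"control byte 0x{hit.Value:X2} at offset {hit.Offset}");
}
```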

JohnGalt1717 commented 1 year ago

I'm at a conference until Friday. I'll be on this ASAP Friday morning. Thanks. If there's an email address I can also send the results. Just don't want it in the wild on Github.

JohnGalt1717 commented 1 year ago

It writes the entire thing! Whoohoo!

Now I just need to figure out how to take that raw data dumped into a MemoryStream and deserialize it to JSON, and I can (presumably) work around the problem.

Note that the response from the bot is JSON itself, so perhaps it isn't properly escaping the response, which is causing the parsing issues?
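In case it helps others, a rough sketch of that manual deserialization (assuming the dumped payload is valid JSON; the property path follows the standard chat completions response shape, not the SDK's own code path):

```csharp
using System.IO;
using System.Text.Json;

// Copy the raw response stream into memory and parse it manually,
// bypassing the SDK's model deserialization.
using var buffer = new MemoryStream();
await response.GetRawResponse().ContentStream.CopyToAsync(buffer);
buffer.Position = 0;

using JsonDocument doc = JsonDocument.Parse(buffer);
string? content = doc.RootElement
    .GetProperty("choices")[0]
    .GetProperty("message")
    .GetProperty("content")
    .GetString();
```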

JohnGalt1717 commented 1 year ago

Ok, I think I found the problem. The response has a ton of 0x0A characters in it, which causes the JSON deserialization to freak out because they're not escaped properly in what's being returned from Azure OpenAI.

I manually removed them from the stream and then it will deserialize it properly.
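The removal step might look like this (a sketch: it assumes the stray 0x0A bytes can be dropped wholesale, which loses the newlines inside the model's text but makes the payload parseable):

```csharp
using System.IO;
using System.Linq;
using System.Text.Json;

// Read the raw payload, drop raw 0x0A (LF) bytes, then parse.
byte[] raw;
using (var buffer = new MemoryStream())
{
    await response.GetRawResponse().ContentStream.CopyToAsync(buffer);
    raw = buffer.ToArray();
}

byte[] sanitized = raw.Where(b => b != 0x0A).ToArray();
using JsonDocument doc = JsonDocument.Parse(sanitized);
```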

trrwilson commented 1 year ago

Thanks, @JohnGalt1717! That's great that you got it working! Now that you've found that the encoded newline characters are the culprit, is there a minimal/sanitized version of the original output you could share here? I'd love to get this fixed so you don't need to manipulate the response content stream this way; it seems like escaping isn't behaving correctly somewhere (potentially on the service side), and that should be addressed. I'm experimenting a bit with manually adding 0x0A (\n) into artificial payloads, but that seems to fail completely when parsing the document.

JohnGalt1717 commented 1 year ago

I believe that if you ask for JSON back that has line breaks in it AND one or more string fields, you'll get the issue.
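That matches the JSON grammar (RFC 8259): a control character such as a raw 0x0A inside a string value must be escaped as \n, so a single unescaped one makes the whole document unparseable. A small standalone illustration:

```csharp
using System;
using System.Text.Json;

// A raw (unescaped) newline inside a JSON string is invalid JSON;
// the escaped form "\n" parses fine.
string invalid = "{\"content\": \"line1\nline2\"}";   // raw 0x0A in the string
string valid   = "{\"content\": \"line1\\nline2\"}";  // escaped \n

try
{
    JsonDocument.Parse(invalid);
    Console.WriteLine("parsed (unexpected)");
}
catch (JsonException)
{
    Console.WriteLine("invalid payload: JsonException, as expected");
}

using JsonDocument doc = JsonDocument.Parse(valid);
Console.WriteLine(doc.RootElement.GetProperty("content").GetString());
```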

MattCarneiro commented 1 year ago

I have the same problem with the ChatGPT 3.5 model with 8k max tokens. The responses don't need to be long; it's truncating at around 200 tokens as well.

Sandesh-seezo commented 10 months ago

I have the same issue that Matt mentioned: responses truncated to ~200 words, with no error message.

thammermann commented 5 months ago

@JohnGalt1717, how did you solve the issue? :)

JohnGalt1717 commented 5 months ago

It went away when I upgraded the version of the API that I was using.