Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
MIT License

ChatCompletion call with image as data url not working with com.azure:azure-ai-openai:1.0.0-beta.9 #40900

Closed: ashishs closed this issue 3 months ago

ashishs commented 3 months ago

Describe the bug

I am trying to send an image in a ChatCompletions request using the Microsoft-provided Azure OpenAI Java SDK, "com.azure:azure-ai-openai:1.0.0-beta.9". The SDK does not seem to serialize the request correctly. Why does the user message content get serialized directly to a base64 string instead of a JSON array of content items?

import com.azure.ai.openai.models.*
import java.util.Base64

// contentType (e.g. "image/png") and bytes (the raw image data) are defined elsewhere.
val cco = ChatCompletionsOptions(
    mutableListOf(
        ChatRequestSystemMessage("You are an assistant which does Optical Character Recognition on scanned medical documents."),
        ChatRequestUserMessage(
            mutableListOf<ChatMessageContentItem>(
                ChatMessageTextContentItem("Read text from the scanned medical document. Only respond with the text content."),
                ChatMessageImageContentItem(ChatMessageImageUrl("data:${contentType};base64,${Base64.getEncoder().encodeToString(bytes)}"))
            )
        )
    )
)

Serialized request body:

{
    "messages": [
        {
            "content": "You are an assistant which does Optical Character Recognition on scanned medical documents.",
            "role": "system"
        },
        {
            "content": "W3sidGV4dCI6IlJlYW....REMOVED....==",
            "role": "user"
        }
    ],
    "max_tokens": 4096,
    "temperature": 0.2,
    "model": "gpt-4"
}
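Decoding the start of the user "content" value hints at what happened: it does not look like the raw image, but like the whole content-item list serialized to JSON and then base64-encoded. The minimal sketch below uses only java.util.Base64 to decode the prefix that survives in the report; decodeBase64Prefix is a hypothetical helper name, not part of the SDK. For comparison, the chat completions API expects a user message's content to be a JSON array of parts, each with a "type" such as "text" or "image_url".

```kotlin
import java.util.Base64

// Hypothetical helper: decodes a (possibly unpadded) base64 fragment as UTF-8 text.
// Java's basic decoder does not require '=' padding, as long as the final unit
// has 2 or 3 characters (a single leftover character would throw).
fun decodeBase64Prefix(fragment: String): String =
    String(Base64.getDecoder().decode(fragment), Charsets.UTF_8)

fun main() {
    // Prefix of the user "content" value from the request above; the rest was removed in the report.
    val prefix = "W3sidGV4dCI6IlJlYW"
    println(decodeBase64Prefix(prefix)) // prints [{"text":"Rea
}
```

The decoded text starts like the text content item ("Read text from..."), so the image data URL is presumably nested inside this encoded array rather than sent as an image part.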

Exception or Stack Trace

Status code 400:

{
  "error": {
    "message": "This model's maximum context length is 128000 tokens. However, your messages resulted in 291246 tokens. Please reduce the length of the messages.",
    "type": "invalid_request_error",
    "param": "messages",
    "code": "context_length_exceeded"
  }
}
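The very large token count is consistent with the image arriving as text instead of as an image part. As a rough illustration, assuming the SDK base64-encodes the entire content list (data URL included) once more: base64 turns every 3 input bytes into 4 characters, so an image that is already a base64 data URL grows by roughly another third, and the service presumably has to tokenize the resulting string as ordinary text. The helper names and the 300,000-byte image size below are hypothetical, chosen only for the arithmetic.

```kotlin
import java.util.Base64

// Length of standard (padded) base64 output for n input bytes: 4 * ceil(n / 3).
fun base64Length(n: Long): Long = 4 * ((n + 2) / 3)

// Hypothetical helper: payload length after base64-encoding it `passes` times,
// ignoring the JSON and text that surround the image in the real request.
fun lengthAfterPasses(n: Long, passes: Int): Long {
    var len = n
    repeat(passes) { len = base64Length(len) }
    return len
}

fun main() {
    val imageBytes = 300_000L // hypothetical image size, for illustration only
    println("once:  ${lengthAfterPasses(imageBytes, 1)} chars") // prints once:  400000 chars
    println("twice: ${lengthAfterPasses(imageBytes, 2)} chars") // prints twice: 533336 chars
    // Cross-check the formula against the JDK encoder.
    check(Base64.getEncoder().encodeToString(ByteArray(10)).length.toLong() == base64Length(10))
}
```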

To Reproduce

Use the Kotlin code above to send an image as part of a chat completion request.


Expected behavior

The chat completion call should succeed. The code follows the Azure OpenAI vision sample: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/openai/azure-ai-openai/src/samples/java/com/azure/ai/openai/usage/GetChatCompletionsVisionSample.java

The only difference is that the image is provided as a data URL.
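For reference, an image data URL is the literal "data:", the MIME type, ";base64,", and the base64-encoded bytes, which is what the snippet above builds inline. A minimal sketch of the same construction as a standalone function (toImageDataUrl is a hypothetical name, not part of the SDK) makes it easy to inspect the URL before passing it to ChatMessageImageUrl:

```kotlin
import java.util.Base64

// Hypothetical helper: builds a data URL of the form data:<contentType>;base64,<data>.
fun toImageDataUrl(contentType: String, bytes: ByteArray): String =
    "data:$contentType;base64," + Base64.getEncoder().encodeToString(bytes)

fun main() {
    // The bytes of "hi" stand in for real image data here.
    println(toImageDataUrl("image/png", "hi".toByteArray())) // prints data:image/png;base64,aGk=
}
```

In the reporter's code, the result would be used as ChatMessageImageUrl(toImageDataUrl(contentType, bytes)).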


joshfree commented 3 months ago

@mssfang please take a look

mssfang commented 3 months ago

@ashishs Yes, this is a recently reported bug. There is a work-in-progress PR to fix it: https://github.com/Azure/azure-sdk-for-java/pull/40687

ashishs commented 3 months ago

@mssfang Was this change introduced in the beta.9 release? Should we move back to beta.8?

ashishs commented 3 months ago

It seems that beta.8 has other issues with the serialization of tool parameters :(

mssfang commented 3 months ago

@ashishs Do you have a public data URL that I can use to reproduce your failure?

Yeah, the issue found so far is that the content should not be encoded. However, the sample still works as provided.