Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
MIT License

[BUG] Streaming does not work with Spring AI and Azure OpenAI #40629

Open bruno-oliveira opened 5 months ago

bruno-oliveira commented 5 months ago

Describe the bug When streaming a completion, the results are still aggregated and arrive all at once instead of being "streamed".

Exception or Stack Trace No stacktrace

To Reproduce Make a streaming call with Spring AI version 1.0.0 and observe that the call does not behave as a stream: the chunks are "blocked" somewhere deep in the Azure SDK code.

Code Snippet

@ServiceMethod(returns = ReturnType.COLLECTION)
public IterableStream<ChatCompletions> getChatCompletionsStream(String deploymentOrModelName,
    ChatCompletionsOptions chatCompletionsOptions) {
    chatCompletionsOptions.setStream(true);
    RequestOptions requestOptions = new RequestOptions();
    Flux<ByteBuffer> responseStream = getChatCompletionsWithResponse(deploymentOrModelName,
        BinaryData.fromObject(chatCompletionsOptions), requestOptions).getValue().toFluxByteBuffer();
    OpenAIServerSentEvents<ChatCompletions> chatCompletionsStream
        = new OpenAIServerSentEvents<>(responseStream, ChatCompletions.class);
    return new IterableStream<>(chatCompletionsStream.getEvents());
}
Class: OpenAIClient.java
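For context, server-sent events arrive as `data: <json>` records terminated by a blank line, and a single JSON payload can be split across several network chunks, so the parser has to buffer until it sees a record boundary before emitting an event. Below is a minimal, hypothetical sketch of that framing in plain Java; it is an illustration of the expected behavior, not the SDK's actual `OpenAIServerSentEvents` implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of SSE record framing: buffer raw chunks and emit
// only payloads whose "\n\n" record terminator has arrived.
public class SseFraming {
    private final StringBuilder buffer = new StringBuilder();

    /** Feed one raw network chunk; returns the complete events it closed. */
    public List<String> feed(String chunk) {
        buffer.append(chunk);
        List<String> events = new ArrayList<>();
        int sep;
        while ((sep = buffer.indexOf("\n\n")) >= 0) {
            String record = buffer.substring(0, sep);
            buffer.delete(0, sep + 2);
            if (record.startsWith("data: ")) {
                events.add(record.substring(6));
            }
        }
        return events;
    }

    public static void main(String[] args) {
        SseFraming sse = new SseFraming();
        // A payload split across two chunks: nothing should be emitted
        // until the record terminator arrives with the second chunk.
        System.out.println(sse.feed("data: {\"choices\":[{\"delta\":")); // []
        System.out.println(sse.feed("{\"content\":\"Hi\"}}]}\n\ndata: [DONE]\n\n"));
    }
}
```

The key point for this issue is that a correct implementation emits each completed event immediately rather than waiting for the whole response body.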

Expected behavior The response needs to be streamed instead of arriving all at once


Information Checklist Kindly make sure that you have added all the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.

ventaglio commented 4 months ago

Same issue here, even when using the latest available version (1.0.0-beta.9) in a simple Java class.

bruno-oliveira commented 4 months ago

Thanks for confirming this as well! I initially thought it was an issue with the Spring AI project, but after careful debugging I realized it is nested deeper in the SDK code. Hopefully this can be picked up soon-ish.

PS: I did try to check out the code myself and adapt the method from inside my fork of the Spring AI project, but I got too lost in it and couldn't devise a possible solution avenue.

nazarenodefrancescomaize commented 4 months ago

I have the same problem, but I think it is a matter of deployment configuration. Are you using a custom content filter without asynchronous streaming mode? In which region is your deployment?

bruno-oliveira commented 4 months ago

@nazarenodefrancescomaize No, essentially I'm just using OpenAI API as it comes out of the box, nothing else is configured anywhere. Essentially, I use a test prompt for this:

Give me a recipe from Portugal

Then I need to see the answer being streamed if the mode is set to stream, and not streamed if not. I have no filters anywhere, region is within EU.

nazarenodefrancescomaize commented 4 months ago

> @nazarenodefrancescomaize No, essentially I'm just using OpenAI API as it comes out of the box, nothing else is configured anywhere. Essentially, I use a test prompt for this:
>
> Give me a recipe from Portugal
>
> Then I need to see the answer being streamed if the mode is set to stream, and not streamed if not. I have no filters anywhere, region is within EU.

Ok, thanks. So I think it is a different problem, because we solved ours: it was related to a custom content filter applied on that Azure deployment without asynchronous filtering enabled. In the default mode the filter waits for generation to complete, even in streaming mode.

ifangng commented 4 months ago

I also encountered the same problem

bruno-oliveira commented 4 months ago

bump

timostark commented 4 months ago

I can also confirm that issue in combination with Spring AI. It is also not related to my GPT model / Azure deployment. Using the npm package (@azure/openai), streaming works without issues.

timostark commented 3 months ago

@bruno-oliveira for a possible solution, see the discussion in https://github.com/spring-projects/spring-ai/pull/1054. At least for me that is working.

Ben-Rowley-1980 commented 3 months ago

I can also confirm this bug. Previously (a couple of months ago) this worked. Here is my service method:

public Flux<ServerSentEvent<ConversationModelDto>> generateStream(String message) {
    UserMessage userMessage = new UserMessage(message);
    Prompt prompt = new Prompt(List.of(userMessage));
    return chatClient.stream(prompt)
            .map(chatResponse -> {
                String resp = chatResponse.getResult().getOutput().getContent();
                ConversationModelDto conversationModelDto = ConversationModelDto.builder()
                        .type("bot")
                        .message(resp)
                        .sessionToken("999")
                        .build();
                return ServerSentEvent.builder(conversationModelDto).build();
            });
}
I am mapping to a DTO before using a ServerSentEvent to stream the data to an EventSource in an Angular app. Like I said, this previously worked, but now it seems the returned JSON is truncated somehow and Jackson can't cope.

Here is an example of the error message:

```
2024-07-31T11:06:19.776+02:00 ERROR 9356 --- [AiExampleProject] [oundedElastic-1] c.a.c.i.MethodHandleReflectiveInvoker : Unexpected end-of-input in VALUE_STRING at [Source: (byte[])"{"choices":[{"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"sa"; line: 1, column: 174]
2024-07-31T11:06:19.777+02:00 ERROR 9356 --- [AiExampleProject] [oundedElastic-1] c.a.c.i.s.DefaultJsonSerializer : com.azure.json.implementation.jackson.core.io.JsonEOFException: Unexpected end-of-input in VALUE_STRING at [Source: (byte[])"{"choices":[{"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"sa"; line: 1, column: 174]
```
You can see that the JSON has been truncated. What's interesting is that untruncated JSON continues to be returned and the truncated chunk is skipped, but this results in the output being a little nonsensical.
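The JsonEOFException makes sense if a network chunk is handed to Jackson while it still ends mid-string: the parser hits end-of-input before the document is balanced. As a way to reason about it, here is a minimal, hypothetical completeness check a consumer could use to decide whether to keep buffering before parsing; this is an illustration only, not what the SDK actually does:

```java
// Hypothetical sketch: decide whether a buffered payload is complete JSON
// by quote-aware brace/bracket balancing, before handing it to a parser.
public class JsonCompleteness {
    /** True when every '{'/'[' opened outside a string literal is closed. */
    static boolean looksComplete(String json) {
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (char c : json.toCharArray()) {
            if (escaped) { escaped = false; continue; }
            if (inString) {
                if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
                continue;
            }
            switch (c) {
                case '"': inString = true; break;
                case '{': case '[': depth++; break;
                case '}': case ']': depth--; break;
                default: break;
            }
        }
        return depth == 0 && !inString && !json.isEmpty();
    }

    public static void main(String[] args) {
        // Shortened version of the truncated payload from the log above:
        // it ends inside a string value, so parsing it must fail.
        String truncated = "{\"choices\":[{\"content_filter_results\":{\"hate\":{\"severity\":\"sa";
        String whole = "{\"choices\":[{\"finish_reason\":null}]}";
        System.out.println(looksComplete(truncated)); // false
        System.out.println(looksComplete(whole));     // true
    }
}
```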

Here is some test output from Postman; for reference, the prompt asked for a bedtime story. The oldest response is at the bottom. When this works I see "once upon a time", but here you can see the words "you" and "once upon a" are missing.

```
{"message":" time","type":"bot","sessiontoken":"999"} 09:59:25
{"message":" for","type":"bot","sessiontoken":"999"} 09:59:25
{"message":" bedtime","type":"bot","sessiontoken":"999"} 09:59:25
{"message":" cozy","type":"bot","sessiontoken":"999"} 09:59:25
{"message":" a","type":"bot","sessiontoken":"999"} 09:59:25
{"message":"Certainly","type":"bot","sessiontoken":"999"} 09:59:25
{"type":"bot","sessiontoken":"999"}
```