bruno-oliveira opened 5 months ago
Same issue here, even using the latest available version (1.0.0-beta.9) inside a simple Java class.
Thanks for confirming this as well! I initially thought it was an issue with the Spring AI project, but after careful debugging I realized it was nested deeper in the SDK code; hopefully this can be picked up soon-ish.
PS: I did try to check out the code myself and adapt the method inside my fork of the Spring AI project, but I got too lost in it and couldn't devise a possible solution avenue.
I have the same problem, but I think it is a matter of deployment configuration. Are you using a personalized content filter without asynchronous streaming mode? In which region is your deployment?
@nazarenodefrancescomaize No, I'm just using the OpenAI API as it comes out of the box; nothing else is configured anywhere. Essentially, I use a test prompt for this:
Give me a recipe from Portugal
Then I need to see the answer being streamed if the mode is set to stream, and not streamed if not. I have no filters anywhere; the region is within the EU.
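For reference, a minimal sketch of the kind of call I mean, using the same `stream(Prompt)` API as the service method posted below (package names are from the Spring AI milestones current at the time and may differ in your version):

```java
import java.util.List;

import org.springframework.ai.chat.StreamingChatClient;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;

// Minimal reproduction sketch: each chunk should print as it arrives,
// but with the affected SDK versions everything prints at once at the end.
public void reproduce(StreamingChatClient chatClient) {
    Prompt prompt = new Prompt(List.of(new UserMessage("Give me a recipe from Portugal")));
    chatClient.stream(prompt)
            .map(r -> r.getResult().getOutput().getContent())
            .doOnNext(chunk -> System.out.println(System.currentTimeMillis() + " " + chunk))
            .blockLast(); // blocking only for this demo
}
```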
Ok, thanks. So I think it is a different problem, because we solved ours and it was related to a custom content filter applied on that Azure deployment without asynchronous filtering enabled. In the default mode the filter waits for the generation to complete, even in streaming mode.
I also encountered the same problem
bump
I can also confirm this issue in combination with Spring AI. It is also not related to my GPT model / Azure deployment. Using the npm package (@azure/openai), streaming works without issues.
@bruno-oliveira for a possible solution, see the discussion in https://github.com/spring-projects/spring-ai/pull/1054. At least for me that is working.
I can also confirm this bug. This worked a couple of months ago. Here is my service method:
```java
public Flux<ServerSentEvent<ConversationModelDto>> generateStream(String message) {
    UserMessage userMessage = new UserMessage(message);
    Prompt prompt = new Prompt(List.of(userMessage));
    return chatClient.stream(prompt)
            .map(chatResponse -> {
                String resp = chatResponse.getResult().getOutput().getContent();
                ConversationModelDto conversationModelDto = ConversationModelDto.builder()
                        .type("bot")
                        .message(resp)
                        .sessionToken("999")
                        .build();
                return ServerSentEvent.builder(conversationModelDto).build();
            });
}
```
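For context, a sketch of how such a method is typically exposed to the browser (class, path, and service names here are illustrative, not the reporter's exact code):

```java
import org.springframework.http.MediaType;
import org.springframework.http.codec.ServerSentEvent;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
public class ConversationController {

    private final ConversationService conversationService; // hypothetical wrapper around generateStream(...)

    public ConversationController(ConversationService conversationService) {
        this.conversationService = conversationService;
    }

    // The Angular EventSource subscribes to this endpoint; each emitted
    // ServerSentEvent becomes one "message" event on the client.
    @GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ServerSentEvent<ConversationModelDto>> stream(@RequestParam String message) {
        return conversationService.generateStream(message);
    }
}
```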
I am mapping to a DTO before using a ServerSentEvent to stream the data to an EventSource in an Angular app. As I said, this previously worked, but now it seems the JSON returned is truncated somehow and Jackson can't cope.
Here is an example of the error message:
```
2024-07-31T11:06:19.776+02:00 ERROR 9356 --- [AiExampleProject] [oundedElastic-1] c.a.c.i.MethodHandleReflectiveInvoker : Unexpected end-of-input in VALUE_STRING at [Source: (byte[])"{"choices":[{"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"sa"; line: 1, column: 174]
2024-07-31T11:06:19.777+02:00 ERROR 9356 --- [AiExampleProject] [oundedElastic-1] c.a.c.i.s.DefaultJsonSerializer : com.azure.json.implementation.jackson.core.io.JsonEOFException: Unexpected end-of-input in VALUE_STRING at [Source: (byte[])"{"choices":[{"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"sa"; line: 1, column: 174]
```
You can see that the JSON has been truncated. What's interesting is that untruncated JSON continues to be returned while the truncated chunk is skipped, but this results in the output being a little nonsensical.
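The error pattern suggests the parser is being handed each raw network chunk rather than a complete SSE `data:` line, so a JSON object split across two chunks arrives half-finished. A minimal sketch of the kind of line buffering that avoids this (not the SDK's actual code; Jackson's `ObjectMapper` stands in for whatever deserializer it uses):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Accumulates raw SSE text and only parses payloads once a full line has
// arrived, so partial JSON is never handed to the parser.
class SseLineBuffer {
    private final StringBuilder buffer = new StringBuilder();
    private final ObjectMapper mapper = new ObjectMapper();

    void onChunk(String chunk) throws Exception {
        buffer.append(chunk);
        int newline;
        while ((newline = buffer.indexOf("\n")) >= 0) {
            String line = buffer.substring(0, newline).trim();
            buffer.delete(0, newline + 1);
            if (line.startsWith("data:")) {
                String payload = line.substring(5).trim();
                if (!payload.isEmpty() && !"[DONE]".equals(payload)) {
                    JsonNode node = mapper.readTree(payload); // always complete JSON now
                    // ... hand node to downstream processing
                }
            }
        }
    }
}
```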
Here is some test output from Postman; for reference, the prompt said "tell me a bedtime story". The oldest response is at the bottom. When this works I see "once upon a time", but here you can see the words "you" and "once upon a" are missing.
{"message":" time","type":"bot","sessiontoken":"999"} 09:59:25 {"message":" for","type":"bot","sessiontoken":"999"} 09:59:25 {"message":" bedtime","type":"bot","sessiontoken":"999"} 09:59:25 {"message":" cozy","type":"bot","sessiontoken":"999"} 09:59:25 {"message":" a","type":"bot","sessiontoken":"999"} 09:59:25 {"message":"Certainly","type":"bot","sessiontoken":"999"} 09:59:25 {"type":"bot","sessiontoken":"999"}
Describe the bug When streaming a completion, the results are still aggregated and arrive all at once instead of being streamed chunk by chunk.
Exception or Stack Trace No stacktrace
To Reproduce Make a streaming call with Spring AI version 1.0.0 and observe that the nature of the call is wrong: the chunks are "blocked" somewhere deep in the Azure SDK code.
Code Snippet
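(None was included in the report. As a point of comparison, a raw WebFlux `WebClient` call against the same endpoint can show whether the deployment itself streams; if chunks arrive one by one here but not through the SDK, the buffering happens inside the SDK. A sketch only: the deployment name, `api-version`, and request body below follow Azure OpenAI's REST conventions and may need adjusting.)

```java
import org.springframework.http.MediaType;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;

// Raw-HTTP comparison sketch: prints each SSE data line as it arrives.
public Flux<String> rawStream(String endpoint, String apiKey) {
    return WebClient.create(endpoint)
            .post()
            .uri("/openai/deployments/{name}/chat/completions?api-version=2024-02-01", "my-deployment")
            .header("api-key", apiKey)
            .contentType(MediaType.APPLICATION_JSON)
            .accept(MediaType.TEXT_EVENT_STREAM)
            .bodyValue("{\"stream\":true,\"messages\":[{\"role\":\"user\",\"content\":\"Give me a recipe from Portugal\"}]}")
            .retrieve()
            .bodyToFlux(String.class) // each element is (roughly) one SSE data payload
            .doOnNext(System.out::println);
}
```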
Expected behavior The response needs to be streamed instead of arriving all at once
Setup (please complete the following information):
Information Checklist Kindly make sure that you have added all of the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.