Closed · devcxl closed 5 months ago
$ curl http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-123123123123123" \
  -d '{
    "stream": true,
    "model": "@cf/qwen/qwen1.5-0.5b-chat",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello"
      }
    ]
  }'
data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}
data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":"!"},"index":0,"finish_reason":null}]}
data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" How"},"index":0,"finish_reason":null}]}
data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" may"},"index":0,"finish_reason":null}]}
data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" I"},"index":0,"finish_reason":null}]}
data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" assist"},"index":0,"finish_reason":null}]}
data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" you"},"index":0,"finish_reason":null}]}
data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" today"},"index":0,"finish_reason":null}]}
data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":"?"},"index":0,"finish_reason":null}]}
data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":""},"index":0,"finish_reason":null}]}
data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":""},"index":0,"finish_reason":"stop"}]}
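For reference, concatenating the `delta.content` fields of the chunks above reconstructs the full message ("Hello! How may I assist you today?"). A minimal consumer sketch, using two of the chunk payloads from the transcript above (`joinDeltas` is an illustrative helper, not code from this project):

```javascript
// Reassemble the assistant message from OpenAI-style chunk events.
// Each SSE line is "data: <json>"; the text lives in choices[0].delta.content.
function joinDeltas(sseText) {
  let message = '';
  for (const line of sseText.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') break; // OpenAI-style terminator, if present
    const chunk = JSON.parse(payload);
    message += chunk.choices[0].delta.content ?? '';
  }
  return message;
}

// Two chunks abbreviated from the output above:
const sample = [
  'data: {"id":"x","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}',
  'data: {"id":"x","object":"chat.completion.chunk","choices":[{"delta":{"content":"!"},"index":0,"finish_reason":null}]}',
].join('\n');

console.log(joinDeltas(sample)); // "Hello!"
```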
I tested the response like this, and it conforms to the streaming response format. However, it seems to return all of the data at once.
I'll definitely take a look; this is a needed feature.
Looks like you may be using the streaming code incorrectly. Here is some example streaming code from the official Workers docs.
export default {
  async fetch(request, env, ctx) {
    // Fetch from origin server.
    let response = await fetch(request);
    // Create an identity TransformStream (a.k.a. a pipe).
    // The readable side will become our new response body.
    let { readable, writable } = new TransformStream();
    // Start pumping the body. NOTE: No await!
    response.body.pipeTo(writable);
    // ... and deliver our Response while that's running.
    return new Response(readable, response);
  }
}
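To convince yourself that an identity TransformStream forwards chunks as they arrive rather than buffering them, here is a small self-contained sketch (the names are illustrative; it runs in any runtime with WHATWG streams, e.g. Workers or Node 18+):

```javascript
// Demonstrate that an identity TransformStream passes chunks through
// one at a time instead of coalescing them into a single read.
async function demo() {
  // A source that emits three separate chunks.
  const source = new ReadableStream({
    start(controller) {
      for (const part of ['Hello', ' ', 'world']) controller.enqueue(part);
      controller.close();
    },
  });

  const { readable, writable } = new TransformStream();
  source.pipeTo(writable); // NOTE: not awaited, just like in the Worker

  // Read the other side chunk by chunk.
  const reader = readable.getReader();
  const chunks = [];
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  return chunks;
}

demo().then((chunks) => console.log(chunks)); // [ 'Hello', ' ', 'world' ]
```

If the stream were being buffered somewhere, the reader would instead see one large chunk at the end.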
Here's a code fragment that ChatGPT suggested, which might help.
if (json.stream) {
  let { readable, writable } = new TransformStream();
  // NOTE: `transformer` is not defined anywhere in this fragment.
  aiResp.body.pipeThrough(transformer).pipeTo(writable);
  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    }
  });
}
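Purely as a sketch of what the undefined `transformer` could be: assuming the raw Workers AI stream emits newline-terminated SSE lines of the form `data: {"response":"<token>"}` ending with `data: [DONE]`, a TransformStream along these lines would rewrite them into the OpenAI chunk shape (`makeTransformer`, the input shape, and the field mapping are assumptions here, not code from this repository):

```javascript
// Hypothetical transformer: rewrites Workers-AI-style SSE events
// (data: {"response":"..."}) into OpenAI-style chat.completion.chunk events.
// The input format is an assumption; adjust to what the binding actually emits.
function makeTransformer(id, model) {
  const decoder = new TextDecoder();
  const encoder = new TextEncoder();
  let buffer = ''; // holds a partial line between chunks
  return new TransformStream({
    transform(chunk, controller) {
      buffer += decoder.decode(chunk, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop(); // the last element may be an incomplete line
      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const payload = line.slice('data: '.length).trim();
        if (payload === '[DONE]') {
          controller.enqueue(encoder.encode('data: [DONE]\n\n'));
          continue;
        }
        const event = {
          id,
          created: Math.floor(Date.now() / 1000),
          object: 'chat.completion.chunk',
          model,
          choices: [
            { delta: { content: JSON.parse(payload).response }, index: 0, finish_reason: null },
          ],
        };
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(event)}\n\n`));
      }
    },
  });
}
```

It would be used as `aiResp.body.pipeThrough(makeTransformer(id, model)).pipeTo(writable)` in place of the undefined `transformer`.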
ChatGPT's answer is not reliable. I tried splitting the TransformStream into its readable and writable parts and returning the readable side, but the response still seems to arrive all at once.
const { readable, writable } = new TransformStream({
  // omit some code
});
// For now, nothing else does anything. Load the AI model.
const aiResp = await ai.run(model, { stream: json.stream, messages });
// Pipe the readable stream through the transform stream.
aiResp.pipeTo(writable)
return json.stream ? new Response(readable, {
  headers: {
    'content-type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
  },
})
Your code above seems to work on my side; how are you testing it? Here is how I tested the streaming.
curl -H 'Content-Type: application/json' -d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant"
    },
    {
      "role": "user",
      "content": "Write me an essay on the formation of black holes"
    }
  ],
  "stream": true
}' http://localhost:8787/chat/completions
And it seemed to work fine. Another thing that should be updated is the compatibility date. On my branch (linked below) I had to bump it to the latest version (2024-04-05 as of today) in order to get everything working properly.
I pushed the code to a separate branch for testing, however I intend to merge this PR and delete that branch once the code is working.
name = "openai-cf" # todo
main = "index.js"
compatibility_date = "2022-05-03"
compatibility_flags = ["transformstream_enable_standard_constructor", "streams_enable_constructors"]
I forgot to update the configuration in this branch as well.
"Upgrading compatibility_date without using compatibility_flags is also possible."
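For example, the snippet above could be reduced to just an updated date (a sketch; 2024-04-05 is the date mentioned earlier in this thread):

```toml
name = "openai-cf" # todo
main = "index.js"
# With a recent compatibility date, the standard TransformStream and
# ReadableStream constructors are enabled by default, so the two
# compatibility_flags entries are no longer needed.
compatibility_date = "2024-04-05"
```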
Thank you for your contribution!
Hello, I tried to add streaming support, but it doesn't seem to truly stream the response. Can you help me take a look at this piece of code?