chand1012 / openai-cf-workers-ai

Replacing OpenAI's API with Cloudflare AI.
MIT License

Feature: Chat stream response #5

Closed · devcxl closed this 5 months ago

devcxl commented 5 months ago

Hello, I tried to add streaming return functionality, but it seems it doesn't truly stream the return. Can you help me take a look at this piece of code?

devcxl commented 5 months ago
$ curl http://localhost:8787/v1/chat/completions   -H "Content-Type: application/json"   -H "Authorization: Bearer sk-123123123123123"   -d '{
    "stream":true,
    "model": "@cf/qwen/qwen1.5-0.5b-chat",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello"
      }
    ]
  }'
data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":"!"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" How"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" may"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" I"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" assist"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" you"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" today"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":"?"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":""},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":""},"index":0,"finish_reason":"stop"}]}

I tested the response like this, and it conforms to the streaming response format. However, it still seems to return all the data at once.
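One thing worth ruling out on the client side: curl can buffer output when it is not writing directly to a terminal, which can make a genuinely streamed response look like it arrived all at once. Passing `-N`/`--no-buffer` removes that variable (endpoint, token, and model below are the same placeholders as in the earlier request):

```shell
curl -N http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-123123123123123" \
  -d '{"stream": true, "model": "@cf/qwen/qwen1.5-0.5b-chat", "messages": [{"role": "user", "content": "Hello"}]}'
```

If the chunks still arrive together with `-N`, the buffering is happening server-side rather than in curl.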

chand1012 commented 5 months ago

I'll definitely take a look, this is a needed feature.

chand1012 commented 5 months ago

Looks like you may be using the streaming code incorrectly. Here is some example streaming code from the official Workers docs.

export default {
  async fetch(request, env, ctx) {
    // Fetch from origin server.
    let response = await fetch(request);

    // Create an identity TransformStream (a.k.a. a pipe).
    // The readable side will become our new response body.
    let { readable, writable } = new TransformStream();

    // Start pumping the body. NOTE: No await!
    response.body.pipeTo(writable);

    // ... and deliver our Response while that’s running.
    return new Response(readable, response);
  }
}

Here's a code fragment that ChatGPT suggested that could help.

if (json.stream) {
    let {
        readable,
        writable
    } = new TransformStream();
    aiResp.body.pipeThrough(transformer).pipeTo(writable);
    return new Response(readable, {
        headers: {
            'Content-Type': 'text/event-stream',
            'Cache-Control': 'no-cache',
            'Connection': 'keep-alive',
        }
    });
}
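For reference, the `transformer` in that fragment has to do the actual format conversion. Here is a minimal sketch, assuming Workers AI emits SSE lines of the form `data: {"response":"token"}` terminated by `data: [DONE]`; `makeSSETransformer` and `makeOpenAIChunk` are illustrative names, not part of this repo:

```javascript
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Build one OpenAI-style chat.completion.chunk object.
function makeOpenAIChunk(content, model, id, finishReason = null) {
  return {
    id,
    created: Math.floor(Date.now() / 1000),
    object: 'chat.completion.chunk',
    model,
    choices: [{ delta: { content }, index: 0, finish_reason: finishReason }],
  };
}

// TransformStream that rewrites Workers AI SSE events into OpenAI-style ones.
function makeSSETransformer(model, id) {
  let buffer = '';
  return new TransformStream({
    transform(chunk, controller) {
      buffer += decoder.decode(chunk, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop(); // keep any partial line for the next chunk
      for (const line of lines) {
        if (!line.startsWith('data:')) continue;
        const payload = line.slice(5).trim();
        if (payload === '[DONE]') {
          controller.enqueue(encoder.encode('data: [DONE]\n\n'));
          continue;
        }
        // Assumption: each Workers AI event carries a "response" token.
        const { response } = JSON.parse(payload);
        const out = makeOpenAIChunk(response, model, id);
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(out)}\n\n`));
      }
    },
  });
}
```

Because the transform only re-encodes each event as it arrives, it preserves the chunk boundaries of the upstream response rather than accumulating them.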
devcxl commented 5 months ago

The answer from ChatGPT is not reliable. I tried to split the TransformStream into readable and writable parts, then returned the readable part, but the result still seems to return all the results at once.

const { readable, writable } = new TransformStream({
    // omit some code
});

// For now, nothing else does anything. Load the AI model.
const aiResp = await ai.run(model, { stream: json.stream, messages });

// Pipe the AI response stream through the TransformStream.
aiResp.pipeTo(writable);
return json.stream ? new Response(readable, {
    headers: {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive',
    },
})
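The ternary in that snippet is truncated; a complete version also needs the non-streaming branch. A sketch, wrapped in a hypothetical helper (`buildChatResponse` is an illustrative name; it assumes a non-streaming `ai.run()` result carries a `.response` string, and the OpenAI-style envelope is illustrative):

```javascript
// Hypothetical helper: builds the Response for either branch.
// `readable` is the TransformStream's readable side (streaming case);
// `aiResp` is the non-streaming ai.run() result (assumed to have .response).
function buildChatResponse(stream, readable, aiResp, model, id) {
  if (stream) {
    return new Response(readable, {
      headers: {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive',
      },
    });
  }
  return Response.json({
    id,
    created: Math.floor(Date.now() / 1000),
    object: 'chat.completion',
    model,
    choices: [
      {
        index: 0,
        message: { role: 'assistant', content: aiResp.response },
        finish_reason: 'stop',
      },
    ],
  });
}
```

Keeping both branches behind one helper makes it harder to accidentally return the SSE headers on a non-streaming response or vice versa.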
chand1012 commented 5 months ago

Your code above seems to work on my side; how are you testing it? Here is how I tested the streaming.

curl -H 'Content-Type: application/json' -d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant"
    },
    {
      "role": "user",
      "content": "Write me an essay on the formation of black holes"
    }
  ],
  "stream": true
}' http://localhost:8787/chat/completions

And it seemed to work fine. Another thing that should be updated is here. On my branch (linked below) I had to update the version to the latest (2024-04-05 as of today) in order to get everything working properly.

I pushed the code to a separate branch for testing, however I intend to merge this PR and delete that branch once the code is working.

devcxl commented 5 months ago
name = "openai-cf"                # todo
main = "index.js"
compatibility_date = "2022-05-03"
compatibility_flags = ["transformstream_enable_standard_constructor", "streams_enable_constructors"]

I forgot to update the configuration in this branch as well.

devcxl commented 5 months ago

"Upgrading compatibility_date without using compatibility_flags is also possible."
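Put together, the fix from the thread amounts to raising `compatibility_date` far enough that the standard stream constructors are enabled by default, so the two flags can be dropped; a sketch of the updated `wrangler.toml` (using the 2024-04-05 date mentioned above):

```toml
name = "openai-cf"                # todo
main = "index.js"
compatibility_date = "2024-04-05"
# transformstream_enable_standard_constructor and streams_enable_constructors
# are on by default at this compatibility date, so no compatibility_flags needed.
```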

chand1012 commented 5 months ago

Thank you for your contribution!