cloudflare / cloudflare-docs

Cloudflare’s documentation
https://developers.cloudflare.com
Creative Commons Attribution 4.0 International

AI Streaming result #12195

Open gerhardcit opened 9 months ago

gerhardcit commented 9 months ago

Which Cloudflare product(s) does this pertain to?

Workers AI

Subject Matter

How to get hold of the ai.run results streamed to the client.

Content Location

https://developers.cloudflare.com/workers-ai

Additional information

ai.run has an option to stream, and this stream can be sent to the client. Please provide a clear example of how to capture the full result of the stream on the server side, in order to log the result, its size, or anything else about it.

It is very likely that you would want to keep track of the answers given to clients, so having the full result is important. This is easy to log with a blocking (non-streaming) response, but not when streaming.

The stream produced by ai.run is a Web ReadableStream, so it does not have the .on('data') event you would expect from a Node.js stream.
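One common way to do this (a sketch, not from the Workers AI docs) is the standard `ReadableStream.tee()` API: split the stream into two branches, send one to the client, and accumulate the other for logging. Here `aiStream` stands in for the ReadableStream returned by an `ai.run(model, input, { stream: true })` call, and `streamAndLog`/`collectStream` are hypothetical helper names:

```javascript
// Read one stream branch to completion and return its decoded text.
async function collectStream(stream) {
    const reader = stream.getReader();
    const decoder = new TextDecoder();
    let full = '';
    while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        full += decoder.decode(value, { stream: true });
    }
    return full + decoder.decode();
}

// tee() splits the AI stream: one branch streams to the client,
// the other is accumulated so the full result can be logged/measured.
function streamAndLog(aiStream) {
    const [toClient, toLog] = aiStream.tee();
    const fullResult = collectStream(toLog); // Promise<string> for logging
    const response = new Response(toClient, {
        headers: { 'content-type': 'text/event-stream' },
    });
    return { response, fullResult };
}
```

In a Worker you would return `response` immediately and hand the logging branch to `ctx.waitUntil(fullResult.then(...))` so it can finish after the response has been sent.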

scenaristeur commented 6 months ago

Hi @gerhardcit

I can get some results with the code below, after adapting the two lines with your worker_name and your account_name (or YOUR_ID):

    fetch('https://worker_name.account_name.workers.dev')
    // fetch(`https://api.cloudflare.com/client/v4/accounts/YOUR_ID/ai/run/${model}`, {
    //  headers: { Authorization: `Bearer ${API_TOKEN}`, Accept: "text/event-stream", 'Content-Type': 'application/json' },
    //  method: 'POST',
    //  body: JSON.stringify(input),
    // })

and creating a .env file if you want to use the API:

    import 'dotenv/config';

    // https://dev.to/bsorrentino/how-to-stream-data-over-http-using-node-and-fetch-api-4ij2
    const API_TOKEN = process.env.CLOUDFLARE_API_TOKEN;

    /**
     * Generator function to stream responses from fetch calls.
     *
     * @param {Function} fetchcall - The fetch call to make. Should return a response with a readable body stream.
     * @returns {AsyncGenerator<string>} An async generator that yields decoded text chunks from the response stream.
     */
    async function* streamingFetch(fetchcall) {
        const response = await fetchcall();
        // Attach a reader to the response body stream
        const reader = response.body.getReader();
        // Reuse one decoder so multi-byte characters split across chunks decode correctly
        const decoder = new TextDecoder();
        while (true) {
            // Wait for the next encoded chunk
            const { done, value } = await reader.read();
            // Check if the stream is done
            if (done) break;
            // Decode the chunk and yield it as text
            yield decoder.decode(value, { stream: true });
        }
    }

    (async () => {
        let model = '@cf/meta/llama-2-7b-chat-int8';
        let input = {
            stream: true,
            messages: [
                {
                    role: 'system',
                    content: 'You are a friendly assistant that helps write stories',
                },
                {
                    role: 'user',
                    content: 'Bonjour',
                },
            ],
        };
        for await (let chunk of streamingFetch(() =>
            fetch('https://worker_name.account_name.workers.dev')
            // fetch(`https://api.cloudflare.com/client/v4/accounts/YOUR_ID/ai/run/${model}`, {
            //     headers: { Authorization: `Bearer ${API_TOKEN}`, Accept: 'text/event-stream', 'Content-Type': 'application/json' },
            //     method: 'POST',
            //     body: JSON.stringify(input),
            // })
        )) {
            console.log(chunk);
            // process.stdout.write(chunk)
        }
    })();

but in both cases (Worker or API) the result is not directly usable: it is plain text that only looks like JSON.

Each chunk seems to look like 'data: {"response":" also"}\n\n', but sometimes several events arrive aggregated in a single chunk, as in the screenshot below. It would be better to receive pure JSON, not "data: " plus pseudo-JSON.

[screenshot: example of aggregated stream chunks]
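A small buffering parser can recover pure JSON from that Server-Sent Events text, even when several events arrive aggregated in one chunk: buffer the incoming text and split on the SSE event delimiter `\n\n`. This is a sketch; `makeSSEParser` is a hypothetical helper, not a Cloudflare API:

```javascript
// Returns a stateful parse function: feed it raw chunk text, get back an
// array of parsed JSON events. Incomplete trailing events stay buffered
// until the next chunk completes them.
function makeSSEParser() {
    let buffer = '';
    return function parse(text) {
        buffer += text;
        const events = buffer.split('\n\n');
        buffer = events.pop(); // keep any incomplete trailing event
        const out = [];
        for (const evt of events) {
            for (const line of evt.split('\n')) {
                if (!line.startsWith('data: ')) continue;
                const payload = line.slice('data: '.length);
                if (payload === '[DONE]') continue; // end-of-stream marker
                out.push(JSON.parse(payload)); // e.g. { response: " also" }
            }
        }
        return out;
    };
}
```

Feeding each decoded chunk through `parse()` and concatenating the `response` fields then yields the full answer regardless of how the events were batched.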

scenaristeur commented 6 months ago

This solution looks good: https://github.com/craigsdennis/chat-workers-ai-base