cloudflare / workerd

The JavaScript / Wasm runtime that powers Cloudflare Workers
https://blog.cloudflare.com/workerd-open-source-workers-runtime/
Apache License 2.0
6.25k stars 300 forks source link

🐛 BUG: Tee-ing a stream twice incorrectly detects buffering #983

Closed sdnts closed 7 months ago

sdnts commented 1 year ago

Which Cloudflare product(s) does this pertain to?

Workers Runtime

What version(s) of the tool(s) are you using?

3.4.0

What version of Node are you using?

18.16.0

What operating system are you using?

macOS

Describe the Bug

If I tee a stream twice (say the request body), I get a warning that says Your worker created multiple branches of a single stream (for instance, by calling response.clone() or request.clone()) but did not read the body of both branches ..., but as far as I can tell, I am reading all branches. I have some code to demonstrate (I'm teeing the request body to three different keys on R2, since that is the easiest way for me to verify that I am reading all branches. The issue probably has nothing to do with R2 though):

export default {
  async fetch(
    request: Request,
    env: Env
  ): Promise<Response> {
    // For PUTs, I'll write the request body to three separate keys on R2
    if (request.method === "PUT") {
      if (!request.body) return new Response("no body");

      const [body, stream1] = request.body.tee();
      const [stream2, stream3] = body.tee();

      await Promise.all([
        env.r2.put("1", stream1),
        env.r2.put("2", stream2),
        env.r2.put("3", stream3),
      ]);

      return new Response("ok");
    }

    // For GETs, I'll fetch and concatenate the three keys I wrote above
    const [one, two, three] = await Promise.all([
      env.r2.get("1").then((b) => b!.text()),
      env.r2.get("2").then((b) => b!.text()),
      env.r2.get("3").then((b) => b!.text()),
    ]);

    return new Response(`${one}\n${two}\n${three}`);
  },
}

If I now run wrangler dev & send a PUT with a body of something, followed by a GET, I get back the response:

something
something
something

which is what I would expect, since I am reading all three branches of the request body to completion.

I'm no streams expert, but I think tee-ing twice like this should not be a problem? (and clearly it works, even after wrangler deploy)

Assuming my example is sane, I believe the warning should not be shown.

Please provide a link to a minimal reproduction

No response

Please provide any relevant error logs

No response

mrbbot commented 1 year ago

Also bumped into this today. This is reproducible with the following pathological code:

export default {
  async fetch(request, env, ctx) {
    const response = await fetch("https://example.com");

    await response.clone().arrayBuffer();
    await response.clone().arrayBuffer();

    return response;
  }
}

(In this case the worker was actually writing to cache, an R2 bucket, and returning the response)

Connecting an inspector and making a request results in the aforementioned Your worker created multiple branches of a single stream warning.

jasnell commented 1 year ago

Should be able to investigate on Thursday.

geomaster commented 10 months ago

Reproducing the issue on 3.22.4. Interestingly, I didn't see any such messages on 3.17, even when legitimate (code is actually not reading all branches).

Cherry commented 8 months ago

I also hit this with similar code today:

export default {
    async fetch(request, env, ctx) {
        const response = await fetch('https://example.com')

        // some logging middleware
        const clonedResponse = response.clone();
        console.log(await clonedResponse.text());

        // some logging middleware
        const clonedResponse2 = response.clone();
        console.log(await clonedResponse2.text());

        // and then actually return
        return response;
    },
};

which nowadays produces a very scary error:

✘ [ERROR] Your worker created multiple branches of a single stream (for instance, by calling `response.clone()` or `request.clone()`) but did not read the body of both branches. This is wasteful, as it forces the system to buffer the entire stream of data in memory, rather than streaming it through. This may cause your worker to be unexpectedly terminated for going over the memory limit. If you only meant to copy the request or response headers and metadata (e.g. in order to be able to modify them), use the appropriate constructors instead (for instance, `new Response(response.body, response)`, `new Request(request)`, etc).
jasnell commented 8 months ago

Just a heads up... we landed a PR this week that improves the warning message for these cases. Instead of multiple warnings being emitted without context, at most one warning message will be emitted per request listing the stack locations where the unread branch was created. This should help provide additional detail to track down where the unread branches are coming from.

jasnell commented 8 months ago

Also, there was definitely a bug in here! #1839 should fix it :-)

jasnell commented 7 months ago

Just ... FYI, issue was closed automatically by landing #1839 but the fix for the extraneous warning likely won't go out to production until next week.