cloudflare / workers-rs

Write Cloudflare Workers in 100% Rust via WebAssembly
Apache License 2.0
2.5k stars 265 forks source link

[BUG] Streaming R2 object > 200MB in size exceeds CPU limit #389

Closed mcnulty closed 6 months ago

mcnulty commented 12 months ago

Is there an existing issue for this?

What version of workers-rs are you using?

main

What version of wrangler are you using?

0.0.0-3f61892d

Describe the bug

When streaming an R2 object with a size > 200 MB in size, a Rust based worker exceeds the CPU limit. When I rewrite the same worker in TypeScript, the CPU limit is not exceeded.

I've created a reproducer repository for the issue that can be used to reproduce the issue. I cannot reproduce this issue locally using wrangler. That is, I only run into this issue when deploying the worker and accessing a large file stored in the associated R2 bucket.

From what I can tell, the code that leads to the issue is:

    // body is a r2::ObjectBody
    let response_body = body.stream()?;

    Ok(Response::from_stream(response_body)?
        .with_headers(headers)
        .with_status(200))

Looking further in the workers-rs source, I think the underlying issue is that r2::ObjectBody returns a Rust stream that wraps the native ReadableStream from r2::ObjectBody::stream. Response::from_stream then converts the Rust stream into a ReadableStream, which is then passed to the native web_sys::Response::new_with_opt_readable_stream. I think this leads to invocation of the Rust wasm code while streaming chunks of the R2 object to the client.

To confirm this suspicion, I added the following function to r2::ObjectBody to allow a ResponseBody to be created using the r2::ObjectBody's inner ReadableStream. This ResponseBody can then be passed to Response::from_body to avoid converting the ReadableStream into a Rust stream and as a result, also avoid the invocation of the Rust code while streaming the R2 object:

    /// Returns a [ResponseBody] containing the data in the [Object].
    ///
    /// This function can be used to hand off the [Object] data to the workers runtime for streaming
    /// to the client in a [crate::Response]. This ensures that the worker does not consume CPU time
    /// while the streaming occurs, which can be significant if instead [ObjectBody::stream] is used.
    pub fn response_body(self) -> Result<ResponseBody> {
        if self.inner.body_used() {
            return Err(Error::BodyUsed);
        }

        Ok(ResponseBody::Stream(self.inner.body()))
    }

If this all makes sense, I can go ahead and open a PR with the response_body function added to r2::ObjectBody.

Steps To Reproduce

  1. Clone mcnulty/cf-worker-r2-download-bug-reproducer
  2. Follow the reproduction steps from that repository, which are included here:

After this last step, you should see the following:

GET https://r2-download-bug.$DOMAIN.workers.dev/200MB-file - Exceeded CPU Limit @ 9/20/2023, 10:40:04 AM
✘ [ERROR]   Error: Worker exceeded CPU time limit.
mbilker commented 6 months ago

I ran into this issue myself as well. Are you still willing to submit the fix as a PR? If not, I may try sending a PR with that fix with credit.

mcnulty commented 6 months ago

@mbilker Thanks for the reminder! I had a fix locally that I just submitted in #462.

I also updated the reproducer repo with a branch that shows how the fix can be used in a wrangler project to fix the bug.

mbilker commented 6 months ago

Thank you very much! I can confirm the response_body() works for my use-case as well and also that the Content-Length header is being set to the correct value.