durch / rust-s3

Rust library for interfacing with S3 API compatible services
MIT License

`NoSuchUpload` error for cloudflare's r2 (s3-compatible) for 33MB file when using `put_object_stream` #302

Open Niedzwiedzw opened 1 year ago

Niedzwiedzw commented 1 year ago

Describe the bug Uploading a 33 MB file to Cloudflare's R2 (S3-compatible) service with `put_object_stream` fails with a `NoSuchUpload` error.

To Reproduce

            let mut path = tokio::fs::File::open(&file)
                .await
                .context("failed to open file for sending to S3")?;
            let code = config
                .bucket
                .as_ref()
                .context("bucket is uninitialized")?
                .put_object_stream(&mut path, s3_path.as_ref())
                .await
                .context(format!(
                    "failed to send file to S3: {}",
                    file.as_ref().display()
                ))?;
// produces:
   1: failed to send file to // -- redacted

      Caused by:
          Got HTTP 404 with content '<?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchUpload</Code><Message>The specified multipart upload does not exist.</Message></Error>'
**Expected behavior**
The file uploads successfully, as it previously did with DigitalOcean.

Environment

Additional context It used to work with DigitalOcean, but I've since migrated, so I'm not sure whether it's the provider or some change in the library itself... I'll keep debugging.

Niedzwiedzw commented 1 year ago

Switching to a "normal" (non-multipart) upload fixes the issue, and that's fine for me.
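For reference, the non-multipart path can be sketched like this. This is a hedged sketch, not the exact code from the report: it assumes rust-s3's `Bucket::put_object`, which sends the object in a single PutObject request, and the `anyhow`-style `.context(...)` error handling used in the snippet above. Reading the whole file into memory only suits smallish objects.

```rust
use anyhow::Context;
use s3::Bucket;

// Sketch: upload via a single PutObject request instead of the
// multipart `put_object_stream` path. Function name is illustrative.
async fn upload_without_multipart(
    bucket: &Bucket,
    local_path: &std::path::Path,
    s3_path: &str,
) -> anyhow::Result<()> {
    // Buffer the whole file in memory (fine for a few tens of MB).
    let data = tokio::fs::read(local_path)
        .await
        .context("failed to read file for sending to S3")?;
    bucket
        .put_object(s3_path, &data)
        .await
        .context("failed to send file to S3")?;
    Ok(())
}
```

The trade-off is memory: `put_object_stream` exists precisely to avoid buffering large files, so this workaround is only reasonable while the multipart path misbehaves.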

durch commented 1 year ago

@Niedzwiedzw I've run into this in tests. I'm not sure whether R2 supports multipart uploads; it might also be "too eventually consistent", so the upload parts may not be created by the time we check for them all.

Niedzwiedzw commented 1 year ago

It happens every time. I've also asked on Cloudflare's Discord - they suggested it's related to concurrency (max 2 uploads at a time).

durch commented 1 year ago

Interesting, so do you think something like throttling R2 uploads would help?

Frederik-Baetens commented 1 year ago

Did some investigation into this:

I modified the rust-s3 source code to panic after initiating the multipart upload:

        dbg!(s3_path);

        let msg = self
            .initiate_multipart_upload(s3_path, content_type)
            .await?;
        let path = msg.key;
        let upload_id = &msg.upload_id;

        dbg!(&upload_id);

        dbg!(self.list_multiparts_uploads(Some(""), Some("")).await?);

        panic!("aborting here");

This way I can have a look at the ongoing multipart uploads being produced by the code.

The multipart uploads definitely exist & they're immediately visible in that dbg! call at the end of that code snippet. R2 is strongly consistent. Once you get confirmation of the MultipartUpload creation back, that multipart upload is immediately visible everywhere.

I don't think this has anything to do with concurrency limits since this also happens when doing a streaming upload with just 2 parts. You can initiate such an upload by just creating a file of 15 MiB, which automatically gets split into 2 chunks by the implementation. With just 2 parts in a multipart upload, it should be impossible to run into R2's multipartUpload concurrency limits (which will be lifted soon).
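The 15 MiB / 2-part observation is consistent with the streaming path cutting the file into fixed-size chunks. A back-of-the-envelope check, assuming an 8 MiB chunk size (which appears to be what rust-s3's streaming implementation uses; treat the constant as an assumption, not a documented guarantee):

```rust
// Assumed chunk size for rust-s3's multipart streaming path (8 MiB).
const CHUNK_SIZE: u64 = 8 * 1024 * 1024;

/// Number of parts a multipart upload of `object_size` bytes would produce:
/// ceiling division, since every started chunk becomes one uploaded part.
fn part_count(object_size: u64) -> u64 {
    (object_size + CHUNK_SIZE - 1) / CHUNK_SIZE
}

fn main() {
    // 15 MiB -> 2 parts, matching the repro described above.
    assert_eq!(part_count(15 * 1024 * 1024), 2);
    // The 33 MB file from the issue -> 4 parts, still nowhere near
    // any plausible concurrency limit.
    assert_eq!(part_count(33 * 1000 * 1000), 4);
    println!("15 MiB -> {} parts, 33 MB -> {} parts",
        part_count(15 * 1024 * 1024),
        part_count(33 * 1000 * 1000));
}
```

Both repro sizes yield only a handful of parts, which supports the conclusion that concurrency limits are not the cause.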

This is as far as I got with my investigation, but I hope that helps. Inspecting the raw HTTP requests you're sending with reqwest might be of help here. My best guess is that the key or upload ID is being altered in the subsequent requests that interact with the ongoing multipart upload.

I hope that's helpful in figuring this out. If you think R2 is incompatible with S3 as documented here, instead of rust-s3 relying on an implementation detail, I'd love to hear about it.

P.S. Is it of note that I needed `self.list_multiparts_uploads(Some(""), Some(""))` instead of the `self.list_multiparts_uploads(Some("/"), Some("/"))` from the docs example to list all multipart uploads? Might that point to mismatched expectations between rust-s3 and R2?

durch commented 1 year ago

@Frederik-Baetens thanks for digging in. I can't say I have much insight; the only data point I have is that multipart uploads seem to work on S3 but not on R2, so there has to be some inconsistency. I'll try and find some time to dig into this…

Frederik-Baetens commented 1 year ago

I also noticed that the ongoing multipart uploads strip the leading `/` from the key. E.g. if I upload `/hi`, the multipart upload will just be for `hi`. Could it be that one part of your code strips the `/` while another expects it to always be there, on the assumption that the S3-compatible object store maps both versions to the same file?
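If two code paths disagree on the leading slash, normalizing the key once and reusing the normalized form in every request would rule that out. A minimal sketch (the helper is hypothetical, not rust-s3 API; it assumes R2 treats `/hi` and `hi` as distinct keys, per the observation above):

```rust
/// Hypothetical helper: strip one leading '/' so the key used in
/// CreateMultipartUpload and the key used in UploadPart /
/// CompleteMultipartUpload are byte-for-byte identical. If the store
/// treats "/hi" and "hi" as different keys, mixing the two forms in
/// one upload would yield exactly a NoSuchUpload error.
fn normalize_key(key: &str) -> &str {
    key.strip_prefix('/').unwrap_or(key)
}

fn main() {
    assert_eq!(normalize_key("/hi"), "hi");
    assert_eq!(normalize_key("hi"), "hi");
    println!("ok");
}
```

The point is not the helper itself but the invariant: whatever form the library initiates the upload with must be the form it uses for every follow-up request.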

durch commented 1 year ago

That's an interesting observation. Encoding of slashes has been a contentious topic at one point; there was even a dedicated feature for handling it. I think you might be on to something here.