cloudflare / workerd

The JavaScript / Wasm runtime that powers Cloudflare Workers
https://blog.cloudflare.com/workerd-open-source-workers-runtime/
Apache License 2.0
6.25k stars 300 forks source link

🐛 Bug Report — Inconsistent etag parsing error in R2 bindings api #1967

Closed flakey5 closed 5 months ago

flakey5 commented 7 months ago

Background

Issue

Last week, we started seeing a client pass in conditional headers that wrapped the etags in quotes (e.g. "W/"5a49a0bd-39326"" or ""asd123""). Sometimes this error is thrown, other times it isn't.

Repro

// worker.ts
interface Env {
  BUCKET: R2_BUCKET;
}

export default {
  async fetch(request: Request, env: Env) {
    await env.BUCKET.get('some-file', { onlyIf: request.headers, range: request.headers }); 
  }
}

// client.ts
for (let i = 0; i < 100; i++) {
  fetch('https://the-worker.com', {
    headers: {
      'If-None-Match': '"W/"5a49a0bd-39326""'
    }
  })
}

The Question

Is that etag format supported or something we need to validate before passing in?

jasnell commented 7 months ago

Spoke with @flakey5 about this a bit offline but posting here also... the etag "W/"5a49a0bd-39326"" is not valid syntax. What I would recommend for now is validating the input and ignoring invalid etags when making the request to r2. That should avoid the issue. That said, if r2 is having a problem with these then we should very likely have r2 also ignore invalidly formatted etag inputs... or at least handle them more gracefully.

flakey5 commented 7 months ago

Fwiw last time I checked the R2 bindings will consistently throw an error for some etag errors (off the top of my head I remember unquoted etags) but the error messages were very vague and took a bit to figure out what the actual problem was

harshal317 commented 7 months ago

As y'all noticed these are in fact invalid with the quotations. The error there isn't very helpful, we ought to improve the messaging. I think it'd make sense to remove this header parsing out of workerd and into R2s API implementation.

Frederik-Baetens commented 5 months ago

@harshal317 FWIW the current etag parsing logic in the R2 API used for S3 and public buckets is not compliant with rfc9110, so just switching to that would be a breaking change. Specifically I believe that it doesn't deal well with etag arrays with multiple commas like "etag1", , , , "etag2" which is compliant with the spec. It also does not allow commas in the etags themselves e.g. "etag1,etag2" should be parsed as a single etag, but we do not do so for our other entrypoints.

As far as I know, this implementation in workerd accepts every etag rfc9110 requires it to, but also more. For example, I don't believe that an etag like "etag1",* is supposed to be valid, but I just parse it as a wildcard etag.

Unfortunately you can't switch to an implementation that is equivalent to this one because I believe the R2 API implementation accepts etags wrapped in multiple quotes like ""etag1"", ""etag2"", something which is not accepted by this workerd implementation, nor are they considered valid by RFC9110. So you can't make the R2 API implementation more restrictive by switching to an implementation equivalent to this one.

As far as I know is also difficult to create an implementation that encompasses both current implementations in laxness, because then it will become very difficult to know how to separate etags from eachother, since neither commas nor quotes can be used to separate them.

That said, moving this into the R2 API can make sense, but I think you'd probably need a separate implementation. If it's just about wanting to have this logic in JS, maybe it's worth to bite the bullet on converting some of our bindings to JS.