annevk / orb

Opaque Response Blocking (CORB++)
Creative Commons Zero v1.0 Universal

How to decode potential JavaScript #7

Open annevk opened 3 years ago

annevk commented 3 years ago

We might not always have an encoding, e.g., fetch(..., { mode: "no-cors" }). Is it reasonable to always use UTF-8 for this check?

annevk commented 3 years ago

Looking at this again, and in particular at https://html.spec.whatwg.org/#fetch-a-classic-script, I think the simplest option is to pass the encoding along with the request. We would then need to abstract or duplicate these steps (and maybe improve them while we're at it, especially getting the charset parameter from the Content-Type header):

  1. If response's Content Type metadata, if any, specifies a character encoding, and the user agent supports that encoding, then set character encoding to that encoding (ignoring the passed-in value).
  2. Let source text be the result of decoding response's body to Unicode, using character encoding as the fallback encoding.
  3. Let script be the result of creating a classic script given source text, settings object, response's url, options, and muted errors.

And then, if script's record is null, parsing failed.
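For illustration only, the steps above could be sketched roughly as follows in TypeScript. This is not spec text, and every function name here is hypothetical. It assumes a runtime with the standard `TextDecoder` global, uses a simplified regular expression in place of a real MIME type parser, does the BOM check by hand (the "decode" algorithm lets a BOM override the fallback encoding), and uses `new Function` as a rough stand-in for "creating a classic script", even though that parses a function body rather than a Script.

```typescript
// Simplified stand-in for a real MIME type parser (assumption): returns the
// charset parameter of a Content-Type header value, or null if there is none.
function charsetFromContentType(contentType: string | null): string | null {
  if (contentType === null) return null;
  const match = /;\s*charset=("?)([^";]+)\1/i.exec(contentType);
  return match ? match[2].trim() : null;
}

// "The user agent supports that encoding": TextDecoder throws a RangeError
// for labels it does not recognize.
function isSupportedEncoding(label: string): boolean {
  try {
    new TextDecoder(label);
    return true;
  } catch {
    return false;
  }
}

// BOM sniffing: a BOM, if present, takes precedence over the fallback encoding.
function bomEncoding(bytes: Uint8Array): string | null {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "utf-8";
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) return "utf-16be";
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) return "utf-16le";
  return null;
}

// Steps 1 and 2: choose the encoding, then decode the body to a string.
// requestEncoding is the encoding passed along with the request and is
// assumed to be a label the runtime supports.
function decodePotentialScript(
  body: Uint8Array,
  contentType: string | null,
  requestEncoding: string,
): string {
  let encoding = requestEncoding;
  const charset = charsetFromContentType(contentType);
  if (charset !== null && isSupportedEncoding(charset)) {
    encoding = charset; // ignoring the passed-in value
  }
  const label = bomEncoding(body) ?? encoding;
  return new TextDecoder(label).decode(body);
}

// Step 3, approximated: report whether the source text parses at all.
// new Function only parses here; the resulting function is never called.
function parsesAsJavaScript(sourceText: string): boolean {
  try {
    new Function(sourceText);
    return true;
  } catch {
    return false;
  }
}
```

With this sketch, `parsesAsJavaScript(decodePotentialScript(body, contentType, "utf-8"))` would play the role of the "script's record is null" check, with `false` meaning parsing failed.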

@domenic does that seem right to you?

domenic commented 3 years ago

I don't have the full context on what security guarantees we're trying to preserve here (is it bad to leak information about the Content-Type header?) but in terms of a spec refactoring, that seems reasonable.

domenic commented 3 years ago

> and maybe improve them while we're at it, especially getting the charset parameter from the Content-Type header

Basically every usage of "Content-Type metadata" in HTML could be improved by using the new MIME type getter, I think.

annevk commented 3 years ago

One risk here is that the attacker has control over the encoding, so this technically gives them more opportunity to find a way to get something parsed as JavaScript. In practice it still seems hard to get something parsed as JavaScript, as the majority of significant bytes are in the ASCII range.
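As a small, hedged illustration of the ASCII point: bytes in the ASCII range decode to the same text under any ASCII-compatible encoding, so the choice of encoding only changes the result for non-ASCII bytes or for encodings such as UTF-16. The sketch below assumes a runtime whose `TextDecoder` supports the `windows-1252` and `utf-16le` labels; `decodeWith` is a hypothetical helper.

```typescript
// Hypothetical helper: decode the same bytes under a given encoding label.
function decodeWith(label: string, bytes: Uint8Array): string {
  return new TextDecoder(label).decode(bytes);
}

// An ASCII-only script, as raw bytes.
const asciiScript = new TextEncoder().encode("var a = 1;");

const asUtf8 = decodeWith("utf-8", asciiScript);
const asWindows1252 = decodeWith("windows-1252", asciiScript);
const asUtf16le = decodeWith("utf-16le", asciiScript);

// ASCII-compatible encodings agree on ASCII bytes...
const sameUnderAsciiCompatible = asUtf8 === asWindows1252;
// ...while UTF-16 reads each pair of bytes as one code unit, giving other text.
const differsUnderUtf16 = asUtf8 !== asUtf16le;
```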

annevk commented 2 years ago

I included a fix for this in https://github.com/whatwg/fetch/pull/1442 which I think works. The HTML side will need to set it on requests, but that's a very straightforward change.

And while it is unfortunate that the fallback encoding is in the hands of the attacker, this is no different from the status quo.

annevk commented 2 years ago

I forgot that the response itself also carries encoding-related information. https://github.com/whatwg/fetch/pull/1447 tackles the first part of that. Once that lands it should be easy to call from Fetch's ORB PR.