Rethink how we make sanitizeToString an mXSS-safe method?

shhnjk commented 4 years ago

Interesting blog post came out today about recent DOMPurify bypass: https://research.securitum.com/mutation-xss-via-mathml-mutation-dompurify-2-0-17-bypass/

The blog post ends with interesting conclusion:

The bypass also made me realize that the pattern of div.innerHTML = DOMPurify.sanitize(html) is prone to mutation XSS-es by design and it’s just a matter of time to find another instances. I strongly suggest that you pass RETURN_DOM or RETURN_DOM_FRAGMENT options to DOMPurify, so that the serialize-parse roundtrip is not executed.

This is equivalent to our sanitizeToString. Maybe we should drop the support of sanitizeToString all together? I think sanitize should work for most of cases, and developers can create sanitizeToString behavior by themselves if they really need to, but we can choose to not recommend this for mXSS purpose.

WDYT?

koto commented 4 years ago

To me, it's rather yet another example of why sanitizers (the ones that aim to prevent XSS vectors) should not output elements from non-html namespaces.

shhnjk commented 4 years ago

That makes sense, but what if destination element is actually not an HTML namespace? e.g.: svg.innerHTML = sanitizer.sanitizeToString(untrusted);

Maybe we want an optional argument in creationOptions, where developers can tell which namespace they plan to assign (default to HTML) and we change parser based on that information?

securityMB commented 4 years ago

Since @shhnjk pointed me to this thread on Twitter, here's my two cents:

I think that sanitizing to string in JS is risky from both conceptual and practical level.

Conceptual, because we lose some of advantages brought by sanitizing in the browser; even the spec admits that serialization and reparsing of the DOM tree is not guaranteed to return the original DOM tree. Hence there's always a risk that a DOM tree that the sanitizer believes to be safe will become dangerous after serialize-reparse. To truly benefit from sanitizing in the browser, we should work on document fragments (or just nodes) so that the sanitizer knows exactly what elements, attributes and text content are in there.

Looking from a practical standpoint, AFAIK most of recent bypasses of sanitizers are result of serialization quirks. For instance, Gareth Hayes found DOMPurify bypass in Edge because it mutated the title tag from:

<x/><title>&amp;lt;/title&amp;gt;&amp;lt;img src=1 onerror=alert(1)&amp;gt;

to

<title></title><img src=1 onerror=alert(1)></title>

Note that the bypass wouldn't work if DocumentFragment was returned as </title><img src=1 onerror=alert(1)> would still be the text content of <title>.

To summarize, I'd recommend to scrap sanitizeToString and promote usage of sanitize that returns DocumentFragment or DOM Node.

shhnjk commented 4 years ago

CC: @otherdaniel, @cure53, @mozfreddyb

koto commented 4 years ago

I have a slightly different p.o.v. here. There's three kinds of reserialization quirks that can cause mXSS:

one is a result of spec issues (some trees are impossible to reserialize without changing their semantics - e.g. ones with comment nodes containing -->)
second is caused by mixed HTML/XML-y namespaces
third is caused by browser bugs, usually when faced with a non-well-formed input.

I don't think the sanitizer API should be concerned with browser bugs. mXSS in DOM tree serialization should be addressed like any other security-relevant browser bug, but the spec is not the place to address it. What we should be concerned about is the first and second quirks. These can be prevented for example by disallowing non-HTMLy namespaces, and HTML comments - because we know it cannot be possible to serialize them to a string in an idempotent way.

Even accounting for browser bugs, mXSS quirks could be addressed inside the API by performing the validation on the output string before returning it to caller:

checking for idempotency (the output string, when reserialized to DOMFragment, and then back to a string, should be the same).
checking for unwanted elements (re-run the sanitizer on the DOM created from the output string).

koto commented 4 years ago

Whether we want it or not, authors do want sanitization to a string (as evidenced by the popularity of current sanitizers). If we only allow the sanitization to fragments, users will be forced to use outerHTML and, if said fragments contained mXSSy payloads, will end up being vulnerable.

I think it's better to provide an API that gives one a DOM tree or a string that is known to be mXSS-safe under reserialization, rather than an API that outputs a DOM tree that one still has to be careful not to reserialize (also because in browsers without Trusted Types it's impossible to prohibit reseralization of that tree).

shhnjk commented 4 years ago

I agree that developers would want to use serialized string as an assignment.

So how do we ensure that the sanitizer API is mXSS safe?

Do we want to reparse and reserialized and check 2 output is equal when returning sanitizeToString? Do we return 2nd output if the 2 outputs were different? Or return nothing at all?
If we enforce to only produce element within same namespace, I think that means we won't be allowing svg and math tag by default (for HTML namespace). Which won't align with API's secondary goal because I think sanitization of clipboard does allow svg tag.

koto commented 4 years ago

I agree that developers would want to use serialized string as an assignment.

So how do we ensure that the sanitizer API is mXSS safe?

Do we want to reparse and reserialized and check 2 output is equal when returning sanitizeToString? Do we return 2nd output if the 2 outputs were different? Or return nothing at all?

I think sanitizer should throw on unstable output, but if might also return an empty string. I don't think any of the strings or DOM fragments in such case could be safer than the other. The API should signal the rejection of the input, as it's impossible to sanitize it.

If we enforce to only produce element within same namespace, I think that means we won't be allowing svg and math tag by default (for HTML namespace).

I personally would go further than that - I don't think svg or mathml NS nodes should ever be outputted. It's impossible to create a safe sanitization of these mixed-namespace trees.

Which won't align with API's [secondary goal] (https://github.com/WICG/sanitizer-api#secondary-goals) because I think sanitization of clipboard does allow svg tag.

/shrug. I think we have enough evidence in the history of sanitizer bypasses, that attempting to mix regular HTML with XMLy namespaces and their HTML attachment points is always going to be susceptible to at least mXSS, and very likely other browser bugs. Authors that intend to process such attacker-controlled input, have to assume XSS is possible, it's just that the exact vector might not be known yet.

shhnjk commented 4 years ago

/shrug. I think we have enough evidence in the history of sanitizer bypasses, that attempting to mix regular HTML with XMLy namespaces and their HTML attachment points is always going to be susceptible to at least mXSS, and very likely other browser bugs.

Yes, this has happened already in Chromium ☺

Generally I agree with Koto, if we have a good enough way to be mXSS-safe. Additionally, if we can break existing clipboard sanitizer, we might be able to check namespace of element user is pasting to, and change parser of sanitizer based on that.

mozfreddyb commented 4 years ago

imho, we ought to nudge folks to receive a DocFragment from the API and use Node.append(). This is saving them (and us) an additional serialization & parsing roundtrip, which eventually leads to better performance.

So far, my main carrot & stick approach is to provide a short and nice sanitize() function and an uglier sanitizeToString() function. Maybe we can think of better carrots and sticks, but I think we'll just alienate developers if we remove string serialization completely.

(Alternatively, we could go all the way and enforce string serializations to always carry explicit namespaces, but that's just ugly.)

Sora2455 commented 3 years ago

As a suggestion, you could change the API to be like this:

sanitizer.sanitizeInto(svg, untrusted);

That way, you know the namespace the sanitized elements are being inserted into, and the dev never gets a chance to worry about document fragments or strings.

annevk commented 1 year ago

We no longer plan to offer serialization to a string.

WICG / sanitizer-api

Rethink how we make sanitizeToString an mXSS-safe method? #37