Closed (shhnjk closed 1 year ago)
To me, it's rather yet another example of why sanitizers (the ones that aim to prevent XSS vectors) should not output elements from non-html namespaces.
That makes sense, but what if the destination element is not actually in the HTML namespace?
e.g.:
svg.innerHTML = sanitizer.sanitizeToString(untrusted);
Maybe we want an optional argument in `creationOptions` where developers can say which namespace they plan to assign into (defaulting to HTML), and we change the parser based on that information?
Since @shhnjk pointed me to this thread on Twitter, here's my two cents:
I think that sanitizing to a string in JS is risky on both a conceptual and a practical level.
Conceptual, because we lose some of the advantages brought by sanitizing in the browser; even the spec admits that serialization and reparsing of the DOM tree is not guaranteed to return the original DOM tree. Hence there's always a risk that a DOM tree that the sanitizer believes to be safe will become dangerous after serialize-reparse. To truly benefit from sanitizing in the browser, we should work on document fragments (or just nodes) so that the sanitizer knows exactly what elements, attributes and text content are in there.
Looking from a practical standpoint, AFAIK most of the recent sanitizer bypasses are the result of serialization quirks. For instance, Gareth Heyes found a DOMPurify bypass in Edge because it mutated the `title` tag from:
<x/><title>&lt;/title&gt;&lt;img src=1 onerror=alert(1)&gt;
to
<title></title><img src=1 onerror=alert(1)></title>
Note that the bypass wouldn't work if a `DocumentFragment` was returned, as `</title><img src=1 onerror=alert(1)>` would still be the text content of `<title>`.
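To make the difference concrete, here is a toy model of that mutation. It is only a sketch: the `titleNode` object and both serializer functions are made up for illustration and do not reproduce any browser's real serializer. It shows that the same text node is harmless while it stays in a tree, and only becomes markup when a serializer writes it out without escaping.

```javascript
// Toy model only: a <title> element holding a single text node.
const titleNode = { tag: "title", text: "</title><img src=1 onerror=alert(1)>" };

// Escapes the characters that matter when text is written back as markup.
function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Serializer that escapes text content, so the text stays text on reparse.
function serializeEscaped(node) {
  return `<${node.tag}>${escapeText(node.text)}</${node.tag}>`;
}

// Hypothetical stand-in for the quirk: text is written out verbatim,
// so the embedded </title> closes the element early on reparse.
function serializeRaw(node) {
  return `<${node.tag}>${node.text}</${node.tag}>`;
}

console.log(serializeEscaped(titleNode));
console.log(serializeRaw(titleNode));
```

The tree itself never contained an `img` element; it only appears once the unescaped string is parsed again.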
To summarize, I'd recommend scrapping `sanitizeToString` and promoting the use of `sanitize`, which returns a `DocumentFragment` or DOM `Node`.
CC: @otherdaniel, @cure53, @mozfreddyb
I have a slightly different p.o.v. here. There's three kinds of reserialization quirks that can cause mXSS:
-->
) I don't think the sanitizer API should be concerned with browser bugs. mXSS in DOM tree serialization should be addressed like any other security-relevant browser bug, but the spec is not the place to address it. What we should be concerned about is the first and second quirks. These can be prevented for example by disallowing non-HTMLy namespaces, and HTML comments - because we know it cannot be possible to serialize them to a string in an idempotent way.
Even accounting for browser bugs, mXSS quirks could be addressed inside the API by performing the validation on the output string before returning it to caller:
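A minimal sketch of such a check, under assumptions: `assertStable` and `templateRoundTrip` are hypothetical helper names, not part of any proposed API, and throwing is only one possible way to signal rejection. The idea is to parse the serialized output once more, serialize it again, and refuse to return a string that does not survive the round trip unchanged.

```javascript
// Hypothetical helper: `roundTrip` parses a string and serializes the result.
// Returns the string only if a parse + serialize cycle leaves it unchanged.
function assertStable(html, roundTrip) {
  const second = roundTrip(html);
  if (second !== html) {
    throw new Error("unstable serialization: output changes when reparsed");
  }
  return html;
}

// In a browser, one possible round trip goes through an inert <template>.
// This assumes the string will later be parsed in an HTML context.
function templateRoundTrip(html) {
  const template = document.createElement("template");
  template.innerHTML = html;
  return template.innerHTML;
}
```

In a page this could be called as `assertStable(serialized, templateRoundTrip)`. A string comparison like this only covers the parsing context used for the check, so a string that is stable in an HTML context may still behave differently when assigned into an element from another namespace.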
Whether we want it or not, authors do want sanitization to a string (as evidenced by the popularity of current sanitizers). If we only allow the sanitization to fragments, users will be forced to use `outerHTML` and, if said fragments contained mXSSy payloads, will end up being vulnerable.
I think it's better to provide an API that gives one a DOM tree or a string that is known to be mXSS-safe under reserialization, rather than an API that outputs a DOM tree that one still has to be careful not to reserialize (also because in browsers without Trusted Types it's impossible to prohibit reserialization of that tree).
I agree that developers would want to assign a serialized string.
So how do we ensure that the sanitizer API is mXSS safe?
- Do we want to reparse and reserialize, and check that the two outputs are equal, before returning from `sanitizeToString`? Do we return the second output if the two outputs differ? Or return nothing at all?
I think the sanitizer should throw on unstable output, but it might also return an empty string. I don't think any of the strings or DOM fragments in such a case could be safer than the other. The API should signal the rejection of the input, as it's impossible to sanitize it.
- If we enforce producing only elements within the same namespace, I think that means we won't be allowing the svg and math tags by default (for the HTML namespace).
I personally would go further than that - I don't think svg or mathml NS nodes should ever be outputted. It's impossible to create a safe sanitization of these mixed-namespace trees.
Which won't align with the API's [secondary goal](https://github.com/WICG/sanitizer-api#secondary-goals), because I think sanitization of the clipboard does allow the svg tag.
/shrug. I think we have enough evidence in the history of sanitizer bypasses, that attempting to mix regular HTML with XMLy namespaces and their HTML attachment points is always going to be susceptible to at least mXSS, and very likely other browser bugs. Authors that intend to process such attacker-controlled input, have to assume XSS is possible, it's just that the exact vector might not be known yet.
/shrug. I think we have enough evidence in the history of sanitizer bypasses, that attempting to mix regular HTML with XMLy namespaces and their HTML attachment points is always going to be susceptible to at least mXSS, and very likely other browser bugs.
Yes, this has happened already in Chromium ☺
Generally I agree with Koto, if we have a good enough way to be mXSS-safe. Additionally, if we can break the existing clipboard sanitizer, we might be able to check the namespace of the element the user is pasting into, and change the sanitizer's parser based on that.
imho, we ought to nudge folks to receive a DocFragment from the API and use Node.append(). This is saving them (and us) an additional serialization & parsing roundtrip, which eventually leads to better performance.
So far, my main carrot & stick approach is to provide a short and nice `sanitize()` function and an uglier `sanitizeToString()` function. Maybe we can think of better carrots and sticks, but I think we'll just alienate developers if we remove string serialization completely.
(Alternatively, we could go all the way and enforce string serializations to always carry explicit namespaces, but that's just ugly.)
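The fragment-based path nudged above could look roughly like this. It is a sketch under assumptions: `appendSanitized` is a hypothetical helper, and `sanitizer` is assumed to expose a `sanitize(input)` method returning a `DocumentFragment`, as discussed in this thread; the API that eventually ships may differ.

```javascript
// Hypothetical helper: inserts sanitized content without a string round trip.
// Assumes sanitizer.sanitize(input) returns a DocumentFragment.
function appendSanitized(target, sanitizer, untrusted) {
  const fragment = sanitizer.sanitize(untrusted);
  // append() moves the fragment's children into the target as nodes,
  // so nothing is serialized and reparsed along the way.
  target.append(fragment);
  return target;
}
```

Because the nodes are moved rather than serialized, the tree the sanitizer inspected is the tree that ends up in the document.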
As a suggestion, you could change the API to be like this:
sanitizer.sanitizeInto(svg, untrusted);
That way, you know the namespace the sanitized elements are being inserted into, and the dev never gets a chance to worry about document fragments or strings.
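A rough sketch of how such a call could pick up the namespace from its target. Everything here is hypothetical: the free function `sanitizeInto`, its argument order, and the `sanitizeFor(namespace, input)` callback standing in for a namespace-aware sanitizer are illustration only, not a proposed signature.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical sketch: read the namespace from the destination element and
// hand it to a namespace-aware sanitizer (`sanitizeFor` is an assumption).
function sanitizeInto(target, untrusted, sanitizeFor) {
  const namespace = target.namespaceURI || HTML_NS;
  const fragment = sanitizeFor(namespace, untrusted);
  // Replace existing children, mirroring what an innerHTML assignment does.
  target.replaceChildren(fragment);
  return namespace;
}
```

With an `svg` element as the target, `namespace` would be the SVG namespace URI, so the sanitizer would know the parsing context without the developer ever handling a string or a fragment.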
We no longer plan to offer serialization to a string.
An interesting blog post came out today about a recent DOMPurify bypass: https://research.securitum.com/mutation-xss-via-mathml-mutation-dompurify-2-0-17-bypass/
The blog post ends with an interesting conclusion:
This is equivalent to our `sanitizeToString`. Maybe we should drop support for `sanitizeToString` altogether? I think `sanitize` should work for most cases, and developers can build `sanitizeToString` behavior themselves if they really need to, but we can choose not to recommend this because of mXSS. WDYT?