Open annevk opened 6 months ago
You're thinking in the right direction. The use cases will be generally almost the same, but often slightly different in the details, so allowlist and whitelist (as discussed in #229) in addition to a pre-config is the right solution.
You're mentioning a few interesting cases. I concur with others that protection against XSS attacks should be the baseline and an opt-out should be hard or come with warnings.
On top of this minimal baseline, there are many additional requirements, depending on the use cases. Many of them cannot be meaningfully expressed by allowing or blocking specific tags or attributes. "Automatic fetching of resources" is a very good example. Compare https://github.com/cure53/DOMPurify/issues/951 and the reasons why it's not practically possible to use blocklists for that. Given that we lack "seamless" iframes (sizing to their content), CSP doesn't work most use cases. Additionally, in an HTML email application like Thunderbird, the user wants a switch that allows external resources only for specific emails, so a sanitizer feature like this should be easy to switch on or off.
I would suggest to a) have multiple pre-configs to choose from (e.g. including or excluding SVG and MathML) b) optional on/off switches on top of it (e.g. load external resources, nested docs etc.) c) allow and block lists on top of it
Some things to look out for can be found in https://github.com/w3c/webappsec/issues/656 such as <link rel="prefetch">
.
We already have a difference between what is allowed by default and what is truly blocked. A
foo
element is not allowed by default, but you could allow it as it's not an XSS risk. I.e., there's a backing XSS blocklist that makes that distinction possible.We discussed adding
iframe
to this backing XSS blocklist a week ago, but what if we kept it a pure XSS blocklist, and instead blocked a lot more by default to make it truly safe:<base>
<meta>
This needs some more thought:
target
and opener? Probably the best is to allow<a href>
and not<a target rel>
. Most other attributes are probably okay?The one tricky aspect is that this even further exposes a weakness in the current setup that it's "hard" to manipulate the default configuration. You have to extract it as per https://wicg.github.io/sanitizer-api/#configobject and then manipulate it. Which is a little annoying.
As discussed in the meeting we might need to have a better way to manipulate configurations from the get go.