Closed benbucksch closed 6 months ago
Heya, with this neither being part of our threat model nor what we believe is achievable with a library like ours (given the problems you mentioned), I don't think we will work on a feature like this anytime soon without additional input or help.
I think what first needs to be done is to find out the following:
Can we even do this and, if so, how?
Would CSP be the answer, i.e. injecting an inline policy to block all outgoing requests? Or would a list of known request-emitting elements and attributes as well as CDATA be the way to go?
What are your thoughts, how would this best be approached?
As with everything in security, I would go with a multi-layered approach:
Detect tags (e.g. <video>) and attribute names that have URLs and remove or replace them. I would start with composing a (long) list of tag and attribute names, and then blacklist them.
Detect url() values and remove or replace them. In HTML, we could detect any property value starting with "http:" or "https:" and remove or replace it.
We do in fact have a project that once attempted to catch them all, here it is:
https://github.com/cure53/HTTPLeaks
However, this does not automatically cover new ways of leaking HTTP requests, so it will have to be actively maintained, and such an approach might be very prone to bypasses at first until it matures.
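A minimal sketch of the list-based layers discussed above, assuming a small hand-written blocklist. The names LEAKY_ATTRIBUTES, isLeakyAttribute, looksLikeRemoteUrl and shouldStripAttribute are hypothetical and not part of DOMPurify or HTTPLeaks, and the list is deliberately incomplete:

```javascript
// Hypothetical, incomplete map of tag name -> attributes that load a URL on render.
// A real list would be far longer (see HTTPLeaks) and would need active maintenance.
const LEAKY_ATTRIBUTES = {
  img: ['src', 'srcset'],
  video: ['src', 'poster'],
  audio: ['src'],
  source: ['src', 'srcset'],
  link: ['href'],
};

// First layer: is this attribute on the blocklist for this tag?
function isLeakyAttribute(tagName, attrName) {
  const attrs = LEAKY_ATTRIBUTES[tagName.toLowerCase()];
  return Array.isArray(attrs) && attrs.includes(attrName.toLowerCase());
}

// Second layer: does the value start with "http:" or "https:"?
// Relative and protocol-relative URLs ("//host/path") are NOT caught by this check.
function looksLikeRemoteUrl(value) {
  return /^\s*https?:/i.test(value);
}

// Decide whether an attribute should be removed before rendering.
// <a href> is kept because it only fires on a user click.
function shouldStripAttribute(tagName, attrName, value) {
  const tag = tagName.toLowerCase();
  const attr = attrName.toLowerCase();
  if (tag === 'a' && attr === 'href') return false;
  return isLeakyAttribute(tag, attr) || looksLikeRemoteUrl(value);
}
```

As noted above, anything missing from the list slips through, so this would only make sense as one layer among several.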
CSP is a great idea, if this can be done. Can you make a proposal for what you have in mind there?
My thinking was, simply inject a META tag into every sanitized result that disallows anything to be requested unless it's same origin - or even nothing at all. This can already be done just so, by simply using a hook and injecting the META tag.
Oh, and one important bit of info, I will not be working on this implementation at all, I do not have time for this - but I am very open to reviewing designs, ideas, and pull requests. Just to clarify early on :slightly_smiling_face:
I think this should be quite close to what you need, correct? It's a (naive and very bad) implementation of a toggle for fetching content or not using CSP. I chose a sandboxed iframe with srcdoc attribute, sanitize with default settings and simply inject the right CSP policy depending on what the user chose.
<!doctype html>
<html>
  <head>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/dompurify/3.1.2/purify.min.js"></script>
  </head>
  <body>
    <!-- Our IFRAME to receive content -->
    <iframe sandbox srcdoc id="sanitized"></iframe>
    <p>
      By default, nothing will be fetched, click button to toggle fetch on or off (see location.hash)
    </p>
    <p>
      <button onclick="location.hash ? location.hash = '' : location.hash = 'yes'">Fetch content?</button>
      <button onclick="location.reload();">Reload page</button>
    </p>
    <!-- Now let's sanitize that content -->
    <script>
      'use strict';
      // Specify dirty HTML
      const dirty = `<body><img src=https://cure53.de/img/menu/cure_53_logo.svg><p>HELLO<iframe/\/src=JavScript:alert(1)></ifrAMe><br>goodbye</p>`;
      // Specify strict inline CSP policy
      let csp = ``;
      if (location.hash.match(/yes/)) {
        csp = `<meta http-equiv="Content-Security-Policy" content="default-src *">`;
      } else {
        csp = `<meta http-equiv="Content-Security-Policy" content="default-src 'none'">`;
      }
      // Clean HTML string and write into the IFRAME
      const clean = DOMPurify.sanitize(dirty);
      sanitized.srcdoc = csp + clean;
    </script>
  </body>
</html>
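The toggle logic in the demo above could be factored into a small pure function, which makes the policy choice easier to test in isolation. This is only a sketch: buildSrcdoc is a hypothetical helper name, cleanHtml is assumed to already be the output of DOMPurify.sanitize(), and allowFetch stands in for the location.hash check.

```javascript
// Build the srcdoc string for the sandboxed iframe.
// `cleanHtml` is assumed to be already-sanitized markup;
// `allowFetch` mirrors the location.hash toggle from the demo.
function buildSrcdoc(cleanHtml, allowFetch) {
  const policy = allowFetch ? 'default-src *' : "default-src 'none'";
  const meta = `<meta http-equiv="Content-Security-Policy" content="${policy}">`;
  // The policy is placed first so that it applies to the content that follows.
  return meta + cleanHtml;
}
```

In the demo this would be used as sanitized.srcdoc = buildSrcdoc(clean, /yes/.test(location.hash)). Note that default-src 'none' also covers style-src, so inline styles in the sanitized content would be blocked as well unless the policy is relaxed for them.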
@cure53
Leaves the case where I need only a part of the page to be sanitized, e.g. in a web forum or conversation, or a "Description" field in a page and I want it to size to the content. "seamless" iframes (sizing to their content) would be perfect, but unfortunately they were removed. Leaves the hacks that resize the iframe based on its content using JS, but browser security makes it hard, too (I have to reach into that iframe with a different origin).
There's a similar discussion in the sanitizer API in #228.
> Leaves the case where I need only a part of the page to be sanitized
Would that not be doable using the IN_PLACE config or by working with nodes directly?
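For illustration, sanitizing a single node with IN_PLACE might look roughly like the sketch below. sanitizeNodeInPlace is a hypothetical wrapper, purify is assumed to be a DOMPurify instance, and node is assumed to be an element that already lives in the page (for example a forum post container). This on its own does not remove request-triggering URLs; it only avoids the iframe.

```javascript
// `purify` is assumed to be a DOMPurify instance and `node` a DOM element
// that is already part of the page.
function sanitizeNodeInPlace(purify, node) {
  // With IN_PLACE, DOMPurify mutates the given node instead of producing a new string.
  purify.sanitize(node, { IN_PLACE: true });
  return node;
}
```

Typical use would be sanitizeNodeInPlace(DOMPurify, document.getElementById('description')), where the element id is made up for this example.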
Closing this for now, as there is no action planned.
Background & Context
Need to block all direct server loads, i.e. any parts of the HTML that trigger any HTTP or server requests on rendering, without user interaction. Normal links like <a href=""> which activate only on user click should stay.
Why:
When dealing with untrusted HTML, HTTP calls triggered by it can be a major problem, depending on the use case:
If the HTTP call is to the same site as the target of the HTML injection, it may be a security problem if the server doesn't protect itself against it.
Avoid data leaks and unintentional data triggers or exfiltration (third party). E.g. if I allow web forum users to post HTML snippets, I do not want them to get an HTTP ping including IP address and time of reading from every reader of the post on my web forum. Similarly, when I sanitize an email, I need to filter outgoing HTTP calls, to prevent spammers from getting receipt or read notifications, or even IP addresses and the times when a message was read.
Bug
Input
<img src="">
<img srcset="">
<video src="">
<video><source>
<svg><g>
<link> preload
CSS @import
url(), some samples from the sanitize CSS demo, but there are more:
list-style: url(https://leaking.via/css-list-style);
list-style-image: url(https://leaking.via/css-list-style-image);
background: url(https://leaking.via/css-background);
background-image: url(https://leaking.via/css-background-image);
border-image: url(https://leaking.via/css-border-image);
border-image-source: url(https://leaking.via/css-border-image-source);
shape-outside: url(https://leaking.via/css-shape-outside);
cursor: url(https://leaking.via/css-cursor), auto;
svg circle
mask: url(https://leaking.via/svg-css-mask#foo);
filter: url(https://leaking.via/svg-css-filter#foo);
clip-path: url(https://leaking.via/svg-css-clip-path#foo);
and tons and tons of others.
Some are not even HTML tags nor attributes nor CSS values.
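To make the CSS cases listed above concrete, here is a naive regex sketch that neutralizes url() tokens in a declaration string. stripCssUrls is a hypothetical helper, not an existing DOMPurify function, and it knowingly misses CSS escape sequences, @import with a bare string, and anything that is not spelled url(...):

```javascript
// Naive sketch: replace every url(...) token in a CSS string with `none`.
// Handles unquoted, single-quoted and double-quoted arguments.
// Known gaps: CSS escapes inside "url", @import "file.css", nested parentheses.
function stripCssUrls(cssText) {
  return cssText.replace(/url\(\s*(?:"[^"]*"|'[^']*'|[^)]*)\s*\)/gi, 'none');
}
```

Some results, such as a cursor list that starts with none, are no longer valid CSS and would simply be dropped by the browser, which is acceptable for the goal of making no request.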
Given output
URL stays in sanitized HTML output, triggering direct HTTP loads on rendering.
Expected output
All URLs that would be loaded directly are removed from the HTML. When rendering the sanitized HTML, no outgoing calls are made.
Non-working solution
https://github.com/cure53/DOMPurify/blob/main/demos/hooks-link-proxy-demo.html has example code, but that replaces only 3 specific attributes. However, on the web platform, there is a huge number of features that all trigger server requests (see above for a very small and incomplete subset). New ways are constantly being added to the HTML platform, and some are non-standard and experimental.
It is practically impossible for an individual app to keep up with all these. This list needs to be centrally managed by a library.
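For reference, a hook-based approach in the spirit of that demo boils down to something like the sketch below. It assumes purify is a DOMPurify instance and uses its addHook('afterSanitizeAttributes', ...) entry point. URL_ATTRIBUTES and installLeakStrippingHook are hypothetical names, and the short attribute list is exactly the maintenance problem described above:

```javascript
// Hypothetical, incomplete list of attributes that can trigger a load on render.
const URL_ATTRIBUTES = ['src', 'srcset', 'poster', 'background'];

// `purify` is assumed to be a DOMPurify instance.
function installLeakStrippingHook(purify, attributes = URL_ATTRIBUTES) {
  purify.addHook('afterSanitizeAttributes', (node) => {
    // Guard against nodes that do not expose the attribute API.
    if (typeof node.hasAttribute !== 'function') return;
    for (const name of attributes) {
      if (node.hasAttribute(name)) node.removeAttribute(name);
    }
  });
}
```

After installLeakStrippingHook(DOMPurify), every later DOMPurify.sanitize() call would drop the listed attributes, while CSS url() values and any attribute not on the list would still get through.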
Feature
Add a feature switch that removes all URLs that would trigger a direct load on rendering, without user interaction. Maintain links that activate only on user interaction/click. (Of course, retain all other sanitization features, including JS code removal, XSS removal etc.)