cure53 / DOMPurify

DOMPurify - a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. DOMPurify works with a secure default, but offers a lot of configurability and hooks. Demo:
https://cure53.de/purify

Need to block external calls, e.g. all HTTP requests #951

Closed benbucksch closed 1 month ago

benbucksch commented 2 months ago

Background & Context

Need to block all direct server loads, i.e. any parts of the HTML that trigger any HTTP or server requests on rendering, without user interaction. Normal links like <a href=""> which activate only on user click should stay.

Why:

  1. When dealing with untrusted HTML, the HTTP calls it triggers can be a major problem, depending on the use case. If the call goes to the same site into which the HTML is injected, it may be a security problem when the server doesn't protect itself against such requests.

  2. Avoid data leaks, unintentional triggers and exfiltration to third parties. E.g. if I allow web forum users to post HTML snippets, I do not want them to get an HTTP ping, including IP address and time of reading, from every reader of the post on my web forum. Similarly, when I sanitize an email, I need to filter outgoing HTTP calls, to prevent spammers from getting delivery or read notifications, or even the IP addresses and times at which a message was read.

Bug

Input

CSS

and tons and tons of others.

Some are not even HTML tags nor attributes nor CSS values.

Given output

URL stays in sanitized HTML output, triggering direct HTTP loads on rendering.

Expected output

All URLs that would be loaded directly are removed from the HTML. When rendering the sanitized HTML, no outgoing calls are made.

Non-working solution

https://github.com/cure53/DOMPurify/blob/main/demos/hooks-link-proxy-demo.html has example code, but that replaces only 3 specific attributes. However, the web platform has a huge number of features that trigger server requests (see above for a very small and incomplete subset). New ways are constantly being added to the HTML platform, and some are non-standard or experimental.

It is practically impossible for an individual app to keep up with all these. This list needs to be centrally managed by a library.

Feature

Add a feature switch that removes all URLs that would trigger a direct load on rendering, without user interaction. Keep links that activate only on user interaction/click. (Of course, retain all other sanitization features, including JS code removal, XSS removal etc.)

cure53 commented 2 months ago

Heya, with this neither being part of our threat model nor what we believe is achievable with a library like ours (given the problems you mentioned), I don't think we will work on a feature like this anytime soon without additional input or help.

I think what first needs to be done is to find out the following:

Can we even do this and, if so, how?

Would CSP be the answer, i.e. injecting an inline policy to block all outgoing requests? Or would a list of known request-emitting elements and attributes, as well as CDATA, be the way to go?

What are your thoughts, how would this best be approached?

benbucksch commented 2 months ago

As with everything in security, I would go with multiple layers of protection:

cure53 commented 2 months ago

We do in fact have a project that once attempted to catch them all, here it is:

https://github.com/cure53/HTTPLeaks

However, this does not automatically cover new ways of leaking HTTP requests, so it would have to be actively maintained, and such an approach might be very prone to bypasses at first, until it matures.
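For illustration, a rough sketch of what the list-based approach could look like as a DOMPurify hook. The attribute lists and the `shouldDropAttribute` helper are hypothetical and deliberately incomplete (they are not taken from HTTPLeaks), and CSS `url()` values and other non-attribute vectors are not handled at all. It only shows the shape of the idea: drop attributes that load on render, keep `href` on elements that wait for a click.

```javascript
// Hypothetical, deliberately incomplete list of attributes that can trigger
// a request as soon as the markup is rendered. A real list would need to be
// much longer and actively maintained.
const AUTO_LOADING_ATTRS = new Set(['src', 'srcset', 'poster', 'background', 'data']);

// Link-style attributes are kept only on elements that wait for a click.
const LINK_ATTRS = new Set(['href', 'xlink:href']);
const CLICK_ONLY_TAGS = new Set(['a', 'area']);

// Returns true when the attribute should be removed from the given element.
function shouldDropAttribute(tagName, attrName) {
  const tag = String(tagName).toLowerCase();
  const attr = String(attrName).toLowerCase();
  if (AUTO_LOADING_ATTRS.has(attr)) return true;
  if (LINK_ATTRS.has(attr)) return !CLICK_ONLY_TAGS.has(tag);
  return false;
}

// Wiring into DOMPurify, guarded so the helper can also be loaded on its own
// (for example in a unit test without a DOM).
if (typeof DOMPurify !== 'undefined') {
  DOMPurify.addHook('uponSanitizeAttribute', (node, data) => {
    if (shouldDropAttribute(node.nodeName, data.attrName)) {
      data.keepAttr = false; // tell DOMPurify to discard this attribute
    }
  });
}
```

With this in place, `<img src=...>` would lose its `src` while `<a href=...>` keeps its `href`. Anything missing from the lists passes through untouched, which is exactly the maintenance and bypass problem described above.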

CSP is a great idea, if this can be done. Can you make a proposal what you have in mind there?

My thinking was to simply inject a META tag into every sanitized result that disallows anything from being requested unless it's same-origin, or even disallows everything. This can already be done today, by simply using a hook and injecting the META tag.
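A minimal sketch of that idea, written as plain string handling rather than as a hook. The names `buildCspMeta` and `sanitizeWithCsp` are made up for this example, and the sanitizer is passed in as a parameter (one would pass `DOMPurify.sanitize`), so nothing here is part of DOMPurify's API.

```javascript
// Builds a <meta> CSP tag; the policy is escaped for use inside an attribute.
function buildCspMeta(policy) {
  const escaped = String(policy).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
  return `<meta http-equiv="Content-Security-Policy" content="${escaped}">`;
}

// `sanitize` is injected (e.g. DOMPurify.sanitize) so the helper has no hard
// dependency on the library. The default policy blocks every fetch.
function sanitizeWithCsp(dirty, sanitize, policy = "default-src 'none'") {
  return buildCspMeta(policy) + sanitize(dirty);
}

// Usage in a browser, assuming DOMPurify is loaded:
//   const html = sanitizeWithCsp(dirty, (s) => DOMPurify.sanitize(s));
```

The result is meant to be rendered as its own document, for instance through the `srcdoc` of a sandboxed iframe, because a META policy only governs the document it is parsed into. Note that `default-src 'none'` also blocks inline styles, since `style-src` falls back to `default-src`; a policy such as `default-src 'none'; style-src 'unsafe-inline'` would be one way to keep them.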

Oh, and one important bit of info, I will not be working on this implementation at all, I do not have time for this - but I am very open to reviewing designs, ideas, and pull requests. Just to clarify early on :slightly_smiling_face:

cure53 commented 2 months ago

I think this should be quite close to what you need, correct? It's a (naive and very bad) implementation of a toggle that uses CSP to decide whether content gets fetched or not. I chose a sandboxed iframe with a srcdoc attribute, sanitize with default settings, and simply inject the right CSP policy depending on what the user chose.

<!doctype html>
<html>
    <head>
        <script src="https://cdnjs.cloudflare.com/ajax/libs/dompurify/3.1.2/purify.min.js"></script>
    </head>
    <body>
        <!-- Our IFRAME to receive content -->
        <iframe sandbox srcdoc id="sanitized"></iframe>

        <p>
            By default, nothing will be fetched, click button to toggle fetch on or off (see location.hash)
        </p>
        <p>
            <button onclick="location.hash ? location.hash = '' : location.hash = 'yes'">Fetch content?</button>
            <button onclick="location.reload();">Reload page</button>
        </p>

        <!-- Now let's sanitize that content -->
        <script>
            'use strict';

            // Specify dirty HTML
            const dirty = `<body><img src=https://cure53.de/img/menu/cure_53_logo.svg><p>HELLO<iframe/\/src=JavScript:alert&lpar;1)></ifrAMe><br>goodbye</p>`;

            // Specify strict inline CSP policy
            let csp = ``;

            if (location.hash.match(/yes/)) {
                csp = `<meta http-equiv="Content-Security-Policy" content="default-src *">`;
            } else {
                csp = `<meta http-equiv="Content-Security-Policy" content="default-src 'none'">`;
            }

            // Clean HTML string and write into the IFRAME
            const clean = DOMPurify.sanitize(dirty);
            sanitized.srcdoc = csp + clean;
        </script>
    </body>
</html>

benbucksch commented 2 months ago

@cure53