cure53 / DOMPurify

DOMPurify - a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. DOMPurify works with a secure default, but offers a lot of configurability and hooks. Demo:
https://cure53.de/purify

Question to understand how to remove an attribute but keep the tag #862

Closed LeGrosSancho closed 11 months ago

LeGrosSancho commented 11 months ago

This issue proposes a question

Background & Context

Hello, I would like to clean some content which looks like this:

<div is="something">
    Some content
</div>

I looked at the documentation, but I'm not sure I understand how to get this:

<div>
    Some content
</div>

From what I've seen, I could easily remove the tag that has the is attribute using FORBID_ATTR, or allow the is attribute through ADD_ATTR. However, I would like to keep the tag and its content while removing only the attribute. Am I missing something?

Today, I have the following configuration:

{
    ALLOWED_URI_REGEXP: {},
    ADD_TAGS: [
        'base',
        // + some custom tags
    ],
    ADD_ATTR: [
        'target',
        // + some custom attributes
    ],
    FORBID_TAGS: [
        'input',
        'form'
    ],
    FORBID_ATTR: {},
    USE_PROFILES: {
        'html': true
    },
    'WHOLE_DOCUMENT': true,
    'RETURN_DOM': true
}

Could someone help me please? 😅 Sorry for the dumb question and thanks in advance for your time 🙏

cure53 commented 11 months ago

Well, the "is" attribute is a strange beast and we have some custom code to be able to handle it securely:

https://github.com/cure53/DOMPurify/blob/main/src/purify.js#L813

See this for example:

// is attribute
document.write('<body><div is="foobar">TEST</div></body>');
document.getElementsByTagName('div')[0].removeAttribute('is');
document.getElementsByTagName('div')[0].outerHTML;

versus this:

// bla attribute
document.write('<body><div bla="foobar">TEST</div></body>');
document.getElementsByTagName('div')[0].removeAttribute('bla');
document.getElementsByTagName('div')[0].outerHTML;

One attribute gets removed as expected, the other one stays. Because of that, we at least remove the content to avoid possible attacks, but the whole attribute... it's complicated :smile:
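
For anyone who wants to repeat the comparison above without document.write, here is a minimal sketch. The names attributeSurvivesRemoval and parseDivWithAttribute are hypothetical helpers written for this illustration and are not part of DOMPurify. The result depends on the browser you run it in, so no expected output is given.

```javascript
// Hypothetical helper (not part of DOMPurify): reports whether an attribute
// is still present on an element after removeAttribute() has been called.
function attributeSurvivesRemoval(element, name) {
  element.removeAttribute(name);
  return element.hasAttribute(name);
}

// Hypothetical helper: parses a small document whose only div carries the
// given attribute, mirroring the markup used in the two snippets above.
function parseDivWithAttribute(name) {
  const html = '<body><div ' + name + '="foobar">TEST</div></body>';
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return doc.getElementsByTagName('div')[0];
}

// Browser-only usage: DOMParser is assumed to exist, so the loop is skipped
// in environments without a DOM.
if (typeof DOMParser !== 'undefined') {
  for (const name of ['is', 'bla']) {
    const survives = attributeSurvivesRemoval(parseDivWithAttribute(name), name);
    console.log(name + ' survives removal: ' + survives);
  }
}
```

If the first line logs true in your browser while the second logs false, you are seeing the asymmetry described above.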

cure53 commented 11 months ago

So, bottom line - I don't think we can do much here without overriding browser behavior and that would lead to misery and doom.

Closing this one for now, nothing that can be done from our end. Please reopen if I am overlooking something.