cure53 / DOMPurify

DOMPurify - a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. DOMPurify works with a secure default, but offers a lot of configurability and hooks. Demo:
https://cure53.de/purify

HTML and BODY tags are being removed regardless of `ALLOWED_TAGS` settings #962

Closed secret-agent-B closed 4 months ago

secret-agent-B commented 4 months ago

This issue reports a bug and proposes a feature.

Background & Context

I have a string that needs to be sanitized, and I want to keep the HTML and BODY tags if they exist in it. I don't want them to be added automatically if they're not in the input string. I've also tried `CUSTOM_ELEMENT_HANDLING`, but it still filtered out the HTML and BODY tags.

Bug

HTML and BODY tags should be kept if they are listed in `ALLOWED_TAGS`.

Input

        import DOMPurify from 'dompurify';

        // Reproduction: allow only <html> and <body>, strip every other tag
        const input = '<html><body><span>text</span></body></html>';
        const expected = '<html><body>text</body></html>';
        const actual = DOMPurify.sanitize(input, {
            ALLOWED_TAGS: ['html', 'body'],
            ALLOW_ARIA_ATTR: false,
            ADD_TAGS: ['html', 'body'],
            IN_PLACE: true
        });

<html><body><span>text</span></body></html>

Given output

text

Expected output

<html><body>text</body></html>

Feature

DOMPurify should keep HTML and BODY tags when they are listed in `ALLOWED_TAGS`. Alternatively, a separate setting that allows them would be nice too.

hsk-kr commented 4 months ago

The library parses the HTML with `DOMParser.parseFromString`, which always produces a complete document tree:

<html>
  <head>
  </head>
  <body>
    <!-- parsed html -->
  </body>
</html>

Ignoring the `html` and `body` tags from the input is the natural behavior here, since the purpose of this function is to build a DOM tree from the HTML string, and that tree already has its own `html` and `body` elements.

Allowing `html` and `body` tags would require changes to the parsing logic, and those changes would have to be made in the library itself. I'm not sure that is safe; it could weaken the library's security. That's just my opinion, though.

You can work around this by renaming the `html` and `body` tags with a suffix before sanitizing, allowing the renamed tags, and then renaming them back afterwards:

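  import DOMPurify from 'dompurify';

  // Rename <html>/<body> (and their closing tags) to <html$>/<body$>, or undo it when back=true.
  // Only tag names are matched, so text, attribute values and tags like <tbody> are left alone.
  const replaceHtmlAndBody = (html: string, back = false): string => {
    if (back) return html.replace(/<(\/?)(html|body)\$/gi, (_m: string, slash: string, name: string) => `<${slash}${name}`);
    return html.replace(/<(\/?)(html|body)(?=[\s\/>])/gi, (_m: string, slash: string, name: string) => `<${slash}${name}$`);
  };

  const input = replaceHtmlAndBody('<html><body><span>text</span></body></html>');
  const expected = '<html><body><span>text</span></body></html>';
  // The parser treats html$ and body$ as unknown elements, so ADD_TAGS can allow them.
  let actual = DOMPurify.sanitize(input, {
      ADD_TAGS: ['html$', 'body$'],
  });
  actual = replaceHtmlAndBody(actual, true);

  console.log({
    input,
    actual,
    expected,
  });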

input: <html$><body$><span>text</span></body$></html$>

actual: <html><body><span>text</span></body></html>

expected: <html><body><span>text</span></body></html>
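A different workaround, sketched here under stated assumptions: sanitize as usual, then put the `<html><body>` wrappers back only when the original input contained those tags, which matches the requirement of not adding them when they are absent. `sanitizeKeepingWrappers` and `Sanitizer` are hypothetical names, not DOMPurify APIs. The sanitizer is passed in (for example `(s) => DOMPurify.sanitize(s, { ALLOWED_TAGS: ['html', 'body'] })`) so the wrapping logic stays independent of the library, and it assumes the sanitizer returns only the body contents. Attributes on the original `html`/`body` tags are not preserved.

```typescript
// Hypothetical helper, not part of DOMPurify: the caller supplies the sanitizer.
type Sanitizer = (dirty: string) => string;

// Match only real <html ...> / <body ...> start tags; "<tbody>" does not match
// because "<" must be directly followed by "html" or "body".
const HTML_START_TAG = /<html(?=[\s\/>])/i;
const BODY_START_TAG = /<body(?=[\s\/>])/i;

function sanitizeKeepingWrappers(dirty: string, sanitize: Sanitizer): string {
  // Assumes the sanitizer returns only the body contents.
  let out = sanitize(dirty);
  // Re-add wrappers only for tags the input actually had; their attributes are dropped.
  if (BODY_START_TAG.test(dirty)) out = `<body>${out}</body>`;
  if (HTML_START_TAG.test(dirty)) out = `<html>${out}</html>`;
  return out;
}

// Stand-in that strips every tag, for illustration only (NOT a real sanitizer).
const stripAllTags: Sanitizer = (s) => s.replace(/<[^>]*>/g, '');

console.log(sanitizeKeepingWrappers('<html><body><span>text</span></body></html>', stripAllTags));
// prints "<html><body>text</body></html>"
console.log(sanitizeKeepingWrappers('<span>text</span>', stripAllTags));
// prints "text"
```

If DOMPurify returns `text` for the original input, as reported in the issue, this wrapper would turn it into the expected `<html><body>text</body></html>`. Detection is a plain regex test, so a `<body>` inside a comment or script would also trigger wrapping.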

cure53 commented 4 months ago

This should do the trick 🙂

const clean = DOMPurify.sanitize(dirty, {FORCE_BODY: true});