cure53 / DOMPurify

DOMPurify - a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. DOMPurify works with a secure default, but offers a lot of configurability and hooks. Demo:
https://cure53.de/purify

Question about sanitizing HTML content with WHOLE_DOCUMENT option #905

Closed agnijalam closed 7 months ago

agnijalam commented 7 months ago

Issue Description:

I am currently using DOMPurify to sanitize HTML content, specifically with the WHOLE_DOCUMENT option set to true. My goal is to maintain the same input while ensuring that cross-site scripting is sanitized. However, I have encountered an issue with the output when the input includes certain HTML elements.

Steps to Reproduce:

  1. Input the following HTML content:
    <!--welcome notes -->
    <table style="font-family:verdana" font-size: 8px><b>Welcome!!!!</b> </table>
  2. Use the following code to sanitize the input: const clean = DOMPurify.sanitize(dirty, { WHOLE_DOCUMENT: true });

Current Output:

<table style="font-family:verdana" font-size: 8px>Welcome!!!!

Expected Output:

I expect the output to be sanitized and maintain the same input structure, ensuring that cross-site scripting is properly handled. Thank you for your assistance in resolving this matter.

cure53 commented 7 months ago

Hey there :slightly_smiling_face: The HTML you are submitting for sanitization is invalid and DOMPurify turns it into valid HTML as it uses the browser's DOM internally.

A <table> cannot legally contain a <b> element.

So, technically, it's the browser (or server-side DOM) that transforms the HTML into no longer being invalid.
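The re-parenting described above can be observed without DOMPurify at all. The following is a minimal sketch, assuming a browser environment (or any DOM implementation that provides `DOMParser`); the helper name `bodyAfterParsing` is made up for illustration.

```javascript
// Parse a fragment the way an HTML parser would and return the serialized
// <body>. `parser` defaults to the browser's DOMParser; pass one in
// explicitly when running outside a browser (for example from jsdom).
function bodyAfterParsing(html, parser = new DOMParser()) {
  const doc = parser.parseFromString(html, 'text/html');
  return doc.body.innerHTML;
}

// Usage in a browser console:
//   bodyAfterParsing('<table><b>Welcome!!!!</b> </table>');
// In the returned markup the <b> element sits in front of the <table>,
// not inside it, because the HTML parser moves it out.
```

Since DOMPurify serializes the tree the parser produced, the moved `<b>` is what ends up in the sanitized output.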

agnijalam commented 7 months ago

@cure53 Thank you so much for the quick response. I completely agree with you. I have a few queries. Is there any way to keep the same (invalid) HTML as it is and sanitize only for cross-site scripting? Additionally, is there any way to at least keep the HTML comments as they are?

cure53 commented 7 months ago

> would like to know if there is any way to maintain the same/invalid HTML as it is and sanitize only for cross-site scripting?

I think you can do this by pretending it's XML or XHTML; the parser then doesn't apply HTML's nesting rules. You can use the config options to do that, i.e. by defining a different NAMESPACE or PARSER_MEDIA_TYPE.

See here: https://github.com/cure53/DOMPurify#control-our-allow-lists-and-block-lists
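A minimal sketch of the PARSER_MEDIA_TYPE route, assuming `purify` is an already-initialized DOMPurify instance; the helper name `sanitizeAsXhtml` is hypothetical. One caveat: XML parsing is stricter about well-formedness than HTML parsing, so input with bare attributes such as `font-size: 8px` may be rejected rather than preserved. This sketch does not verify what happens to the input from this issue.

```javascript
// Config sketch: ask DOMPurify to parse the input as XHTML instead of HTML.
// 'application/xhtml+xml' is the alternative to the default 'text/html'.
const xhtmlConfig = {
  WHOLE_DOCUMENT: true,
  PARSER_MEDIA_TYPE: 'application/xhtml+xml',
};

// `purify` is passed in so the helper works with the browser global or with
// an instance created on the server side.
function sanitizeAsXhtml(purify, dirty) {
  return purify.sanitize(dirty, xhtmlConfig);
}

// Usage (browser): const clean = sanitizeAsXhtml(DOMPurify, dirty);
```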

> Additionally, is there any way to keep the comments section as it is, at least?

Yup, you can allow comment nodes by adding #comment to the ADD_TAGS config.
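As a rough sketch of that suggestion, combined with the WHOLE_DOCUMENT option from the original question: `purify` is assumed to be a DOMPurify instance, and the helper name `sanitizeKeepingComments` is made up for illustration.

```javascript
// Config sketch: keep comment nodes by allow-listing their node name.
const keepCommentsConfig = {
  WHOLE_DOCUMENT: true,
  ADD_TAGS: ['#comment'],
};

// `purify` is passed in so the helper works with the browser global or with
// an instance created on the server side.
function sanitizeKeepingComments(purify, dirty) {
  return purify.sanitize(dirty, keepCommentsConfig);
}

// Usage (browser): const clean = sanitizeKeepingComments(DOMPurify, dirty);
```

Depending on the DOMPurify version and its other settings, comments whose text looks like markup may still be removed, so it is worth checking the output against real inputs.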