cure53 / DOMPurify

DOMPurify - a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. DOMPurify works with a secure default, but offers a lot of configurability and hooks. Demo:
https://cure53.de/purify

DOMPurify stripping valid html in xhtml mode #865

Closed kjeyakanthan closed 10 months ago

kjeyakanthan commented 10 months ago

This issue reports a bug in which DOMPurify seems to strip valid HTML that follows a tag containing &nbsp;.

Background & Context

We are trying to sanitize user input from a KendoUI Editor component, which outputs XHTML that is later used in an XSL document. The config used is { PARSER_MEDIA_TYPE: 'application/xhtml+xml' }.
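
For reference, the call described above might look roughly like the following. This is a minimal sketch, assuming `purify` is an already initialized DOMPurify instance (in a browser, the global `DOMPurify`); the helper name `sanitizeAsXhtml` is hypothetical and not part of the library.

```javascript
// `purify` is assumed to be an initialized DOMPurify instance.
// `sanitizeAsXhtml` is a hypothetical helper name used only for illustration.
function sanitizeAsXhtml(purify, dirty) {
  // PARSER_MEDIA_TYPE asks DOMPurify to parse the input as XHTML (XML rules)
  // instead of the default text/html.
  return purify.sanitize(dirty, { PARSER_MEDIA_TYPE: 'application/xhtml+xml' });
}

// Usage in a browser, assuming DOMPurify is loaded globally:
// const clean = sanitizeAsXhtml(DOMPurify, '<p>abcd</p>');
```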

Bug

Input

<p>abcd</p><p>abcd&nbsp;defg</p><p>efgh</p>

Given output

<p>abcd</p><p></p>

Expected output

<p>abcd</p><p>abcd&nbsp;defg</p><p>efgh</p>

thesunlover commented 10 months ago

Could you please share the version number?

kjeyakanthan commented 10 months ago

It's 3.0.5. I will try 3.0.6 as well.

cure53 commented 10 months ago

Hmmm, upon having a closer look, this should either throw an error (undefined entity, Firefox) or result in different output (Chrome).

What speaks against just using HTML? The XHTML is not really valid - hence what you get back is mangled.

kjeyakanthan commented 10 months ago

Due to the fact the output is used as parameter in an XSLT transform, the data has to be in XML format (XHTML).

cure53 commented 10 months ago

I see, however, the data you shared in the original post is not using valid XML. So, this is kind of a bit contradictory :)

At the end of the day, it's the browser or JSDOM that transforms the input before we even start sanitizing, so I believe this is not our bug and there is not much we can do here.
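
One possible workaround, not proposed in the thread itself: XML predefines only five named entities (&amp;amp;, &amp;lt;, &amp;gt;, &amp;quot;, &amp;apos;), so &amp;nbsp; is undefined when the input is parsed as application/xhtml+xml. Replacing HTML-only named entities with numeric character references before sanitizing should give the XML parser well-formed input. The sketch below assumes a small, hand-picked entity map; the names `HTML_ENTITY_CODE_POINTS` and `htmlEntitiesToNumeric` are hypothetical, and the map would need extending for any other entities the editor emits.

```javascript
// Hypothetical, deliberately small map of HTML-only named entities to code points.
// A Map is used so names like "constructor" cannot hit inherited properties.
const HTML_ENTITY_CODE_POINTS = new Map([
  ['nbsp', 160],
  ['copy', 169],
  ['reg', 174],
  ['ndash', 8211],
  ['mdash', 8212],
]);

// Replace known HTML named entities with numeric character references,
// which are valid in XML. Unknown names (including the five XML-predefined
// entities such as &amp;) are left untouched.
function htmlEntitiesToNumeric(markup) {
  return markup.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (match, name) => {
    const codePoint = HTML_ENTITY_CODE_POINTS.get(name);
    return codePoint === undefined ? match : `&#${codePoint};`;
  });
}

const input = '<p>abcd</p><p>abcd&nbsp;defg</p><p>efgh</p>';
console.log(htmlEntitiesToNumeric(input));
// prints: <p>abcd</p><p>abcd&#160;defg</p><p>efgh</p>
```

This is a plain text substitution, so it would also rewrite entity-like text inside CDATA sections or comments; for editor output like the input in this issue, that is unlikely to matter.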