darylldoyle / svg-sanitizer

A PHP SVG/XML Sanitizer
GNU General Public License v2.0
454 stars 67 forks source link

[BUGFIX] Allow DOMText and implicitly DOMCdataSection #72

Closed ohader closed 2 years ago

ohader commented 2 years ago

Recent change disallowed CDATA sections, however the actual fix would have been to disallow non SVG-elements when used inline in some HTML-context.

Resolves: #70

darylldoyle commented 2 years ago

@ohader thanks for this. Do you have any concerns with leaving the escaped <img> in place? It's something that I was struggling with beforehand.

zcorpan commented 2 years ago

Escaped text is safe. If it wasn't, you'd have to remove or sanitize all text nodes.

Serializing CDATA sections as CDATA sections is not safe in HTML since it's sometimes parsed as a bogus comment.

ohader commented 2 years ago

@zcorpan Thanks for pointing that out. I just discovered your remarks in https://github.com/whatwg/html/issues/4016 from 2018 as well... 😳

So basically CDATA content would have to be sanitized from HTML tags (removing non-SVG tags), e.g. by

CDATA followed by element not in HTML namespace

<defs>
  <style type="text/css"><![CDATA[
    circle { fill: gold; }
  ]]></style>
</defs>

CDATA followed by an element being in HTML namespace

<p/><![CDATA[ ><img src onerror=alert(3)> just-harmless-text]]>
ohader commented 2 years ago

CDATA sections are now converted to text nodes, containing encoded content only... and basically that's it. This approach is much simpler than trying to infer context and parser state, just for the sake of keeping those <![CDATA[ literals.

This text node conversion now is more explicit.

zcorpan commented 2 years ago

Yeah, always replacing CDATA section nodes with text nodes is what I would suggest.

ohader commented 2 years ago

@darylldoyle Can you merge this PR please and tag a new 0.15.4 version? Thanks in advance!

ohader commented 2 years ago

@darylldoyle Awesome! Thanks a bunch! 👍