apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance
MIT License
3.68k stars 349 forks source link

Title tag is not getting sanitized properly #552

Closed dhiliphvenkatesan closed 1 year ago

dhiliphvenkatesan commented 2 years ago

PLEASE NOTE: make sure the bug exists in the latest patch level of the project. For instance, if you are running a 2.x version of Apostrophe, you should use the latest in that major version to confirm the bug.

To Reproduce

Step by step instructions to reproduce the behavior:

  1. Configure sanitize-html in vue project
  2. My Configuration : { allowedTags: ['a', 'abbr', 'address', 'area', 'article', 'aside', 'audio', 'b', 'base', 'bdi', 'bdo', 'blockquote', 'body', 'br', 'button', 'canvas', 'caption', 'cite', 'code', 'col', 'colgroup', 'data', 'datalist', 'dd', 'del', 'details', 'dfn', 'dialog', 'div', 'dl', 'dt', 'em', 'fieldset', 'figcaption', 'figure', 'footer', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'head', 'header', 'hr', 'html', 'i', 'img', 'input', 'ins', 'kbd', 'label', 'legend', 'li', 'link', 'main', 'map', 'mark', 'meter', 'nav', 'noscript', 'ol', 'optgroup', 'option', 'output', 'p', 'param', 'picture', 'pre', 'progress', 'q', 'rp', 'rt', 'ruby', 's', 'samp', 'section', 'select', 'small', 'source', 'span', 'strong', 'sub', 'summary', 'sup', 'svg', 'table', 'tbody', 'td', 'template', 'textarea', 'tfoot', 'th', 'thead', 'time', 'tr', 'track', 'u', 'ul', 'var', 'video', 'wbr'], allowedAttributes: { '': ['accept', 'accept-charset', 'accesskey', 'action', 'alt', 'align', 'async', 'autocomplete', 'autofocus', 'autoplay', 'border', 'cellpadding', 'cellspacing', 'charset', 'checked', 'cite', 'class', 'cols', 'colspan', 'content', 'contenteditable', 'controls', 'coords', 'data', 'data-', 'datetime', 'default', 'defer', 'dir', 'dirname', 'disabled', 'download', 'draggable', 'enctype', 'for', 'headers', 'height', 'hidden', 'high', 'href', 'hreflang', 'http-equiv', 'id', 'ismap', 'kind', 'label', 'lang', 'list', 'loop', 'low', 'longdesc', 'max', 'maxlength', 'media', 'method', 'min', 'multiple', 'muted', 'name', 'novalidate', 'open', 'optimum', 'pattern', 'placeholder', 'poster', 'preload', 'readonly', 'rel', 'required', 'reversed', 'rows', 'rowspan', 'sandbox', 'scope', 'selected', 'shape', 'size', 'sizes', 'span', 'spellcheck', 'src', 'srcdoc', 'srclang', 'srcset', 'start', 'step', 'style', 'summary', 'tabindex', 'target', 'title', 'translate', 'type', 'usemap', 'value', 'width', 'wrap'] }, allowedSchemes: ['http', 'https', 'ftp', 'mailto', 'tel', 'data'] }
  3. I

    Expected behavior

    I didn't include iframe and title tags in my configuration but If i give this : <title title=>test Expected Result: test Actual Result: <IFRAME SRC="javascript:javascript:alert(18);"></IFRAME>test

If I insert title attribute as title=>"value" insteadof title="value" it is not getting parsed. I am using ck-editor so the escape characters such as & lt; and & gt; will get parsed as < and >. And the Iframe tag is getting executed in the ck-editor.

Describe the bug

I am using ck-editor and want to customize the sanitize-html package. In my configuration, title and iframe tags are removed and if i am giving this input => <title title=>test it is not getting sanitized correctly as it results in : <IFRAME SRC="javascript:javascript:alert(18);"></IFRAME>test which is executed in ck-editor. I want the title tag and its child not allowed tags not be removed. This only happens when we the title attribute such as<title title=>"value" >test instead of <title title="value" >test.The little change in the syntax of attribute leads to this problem.

Version of Node.js: Version 16.13.2

Server Operating System: Linux - ubuntu 20.04.03 64-bit

boutell commented 2 years ago

I'm not exactly sure what the right sanitization of this tag looks like but it probably does indeed involve removing all subtags and keeping all text.

boutell commented 2 years ago

(This is not a security issue, because the subtags themselves still get sanitized, but it is an interesting issue for sure.)

dhiliphvenkatesan commented 2 years ago

@boutell The subtags are not getting removed is the issue here. If i enclose the iFrame tag with title tag, like this <title title=>test The result should be only test because I didn't include iframe in my configuration. But instead it fails to parse the iFrame subtag and gives the result as test Instead of greater than and lesser than symbol, we are getting html escape characters. Hence we are using ck-editor, if i give iframe tag with escape characters as an input to ck-editor it will consider it as a tag and will start executing it.

boutell commented 2 years ago

Would you please submit a PR with a failing unit test that demonstrates the issue?

kedarchandrayan commented 1 year ago

htmlparser2 which is being used in sanitize-html is considering everything inside the title tag open and title tag close as text. It needs to be fixed in htmlparser2. Even the latest version of htmlparser2 (8.0.1) has this issue.

Raised issue in htmlparser2 for tracking purpose: https://github.com/fb55/htmlparser2/issues/1270

boutell commented 1 year ago

There is no security issue, htmlparser2 is escaping the subtags, which is an interesting choice but means they will not be interpreted as markup by the browser, therefore sanitized. I have never seen ckeditor treat escaped tags as live tags, but if it does in some situation then that is a security hole in ckeditor, not sanitize-html or htmlparser2.

As for why htmlparser2 gives title tags this special treatment though, I don't know the answer to that. There's an htmlIntegrationElements list in htmlparser2 which explicitly includes title for whatever reason. I'll mention it on the htmlparser2 issue in case it helps us get an answer to that.

This isn't a bug in sanitize-html though, or is it a security issue for the use of sanitize-html or something that can be fixed here, so I'm closing this ticket. Any issues with escaped tags being treated as live tags by ckeditor are ckeditor issues.

kedarchandrayan commented 1 year ago

Yup, completely agree with @boutell for closing the issue as it is outside of scope of this package.

boutell commented 1 year ago

Thanks. For what it's worth, htmlparser2 does it this way because sure enough, even the latest version of Chrome escapes any tags found inside title. So sanitize-html + htmlparser2 is just creating a cleaner expression of what browsers will do given the markup in question.