apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance.
MIT License

Opening angle followed by a letter is treated as start of a new tag #467

Closed: Metavirulent closed this issue 3 years ago

Metavirulent commented 3 years ago

To Reproduce

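const sanitize = require('sanitize-html');

// "<two" is an opening angle bracket followed by a letter, with no closing ">" anywhere after it.
const text = sanitize("we all know that one<two, isn't it?", { disallowedTagsMode: 'escape' });
console.log(text);
// outputs: we all know that one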

Expected behavior

Should output: we all know that one<two, isn't it?

Describe the bug

It seems that "<two" is parsed as the start of an opening tag, and because that tag is never closed, the remainder of the string is skipped.
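
To illustrate the behavior described above, here is a toy scanner. It is a deliberately simplified model written for this thread, not htmlparser2's actual implementation, and it ignores the disallowedTagsMode option entirely. The function name toyTextOnly is hypothetical. It shows how treating "<" plus a letter as the start of a tag can swallow the rest of the input when no ">" follows.

```javascript
// Toy model only: NOT htmlparser2's real tokenizer.
// It keeps plain text and drops anything that looks like a tag.
function toyTextOnly(input) {
  let out = '';
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    const next = input[i + 1] || '';
    if (ch === '<' && /[A-Za-z\/]/.test(next)) {
      // "<" followed by a letter (or "/") is assumed to start a tag.
      const end = input.indexOf('>', i);
      if (end === -1) {
        // The tag never ends, so everything up to the end of input is lost.
        break;
      }
      i = end + 1; // Skip past the complete tag.
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}

console.log(toyTextOnly("we all know that one<two, isn't it?")); // we all know that one
console.log(toyTextOnly('1 < 2'));                                // 1 < 2
console.log(toyTextOnly('a <b>bold</b> move'));                   // a bold move
```

Note that "1 < 2" survives in this model because the "<" is followed by a space rather than a letter, which matches the issue title: only an opening angle followed by a letter triggers the problem.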

Details

Version of Node.js:

Server Operating System:

boutell commented 3 years ago

The degree of tolerance for bad markup depends on the htmlparser2 module upon which sanitize-html is built. You could take up the issue there, but it's not wrong to treat this as an opening tag and disregard the rest of the string. sanitize-html is designed to sanitize valid HTML, not to guess at user intent in handwritten text with some HTML markup. The most common use case for sanitize-html is cleaning up valid but unwanted tags and attributes, for instance when pasted into a rich text editor or submitted maliciously by a script.
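
For callers who do need to accept handwritten text, one possible workaround is to escape stray "<" characters before the text reaches the sanitizer. The sketch below is an assumption-laden illustration, not a feature of sanitize-html or htmlparser2: the helper name escapeStrayAngles and the KNOWN_TAGS list are hypothetical, and the tag list would need to match whatever tags your application actually expects.

```javascript
// Hypothetical pre-processing helper; not part of sanitize-html.
// Assumption: any "<" not followed by a tag name from this list is literal text.
const KNOWN_TAGS = new Set(['a', 'b', 'i', 'em', 'strong', 'p', 'br', 'ul', 'ol', 'li']);

function escapeStrayAngles(text, knownTags = KNOWN_TAGS) {
  // Match "<", an optional "/", and an optional candidate tag name.
  return text.replace(/<(\/?)([A-Za-z][A-Za-z0-9]*)?/g, (match, slash, name) => {
    if (name && knownTags.has(name.toLowerCase())) {
      return match; // Looks like a real tag: leave it for the sanitizer.
    }
    // Otherwise treat the "<" as literal text and escape it.
    return '&lt;' + slash + (name || '');
  });
}

console.log(escapeStrayAngles("we all know that one<two, isn't it?"));
// we all know that one&lt;two, isn't it?
console.log(escapeStrayAngles('1 < 2'));              // 1 &lt; 2
console.log(escapeStrayAngles('a <b>bold</b> move')); // a <b>bold</b> move
```

The escaped string would then be passed to sanitize as usual. This heuristic has clear limits: it also escapes comments and doctypes, and a known tag name that is never closed with ">" (for example "<b and more") still reaches the parser unchanged.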