apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance
MIT License
3.69k stars 351 forks source link

It is possible to add a disallowed closing tag with valid HTML as well as opening tag with invalid HTML markup. #549

Closed ghost closed 1 year ago

ghost commented 2 years ago

I found several variants of the library's incorrect behavior. In the examples below, it is possible to add any html tag (closing tag with valid HTML as well as opening tag with invalid HTML) if any tag is allowed.

Example 1:

const sanitizeHtml = require('sanitize-html');

const sanitizedString = sanitizeHtml('<b><div/', {
    allowedTags: ['b'],
});

Expected behavior

As a result of the execution I expect to see <b></b> or a empty line. However, I get <b></div>.

Example 2:

const sanitizeHtml = require('sanitize-html');
const HTMLParser = require('node-html-parser');

const sanitizedString = sanitizeHtml('<b><b<<div/', {
    allowedTags: ['b'],
});

console.log(sanitizedString) // <b></b<<div>

const unespectedDiv = HTMLParser.parse(sanitizedString).querySelector('div');

console.log(unespectedDiv);

Expected behavior

As a result of the execution I expect to see <b></b> or a empty line. However, I get <b></b<<div>. The resulting string contains a substring <div>, which is interpreted by some parsers as a valid html tag like node-html-parser (Browsers interpret it correctly).

boutell commented 2 years ago

Since browsers interpret it correctly I don't regard it as a security issue, but I agree it is not ideal output. It would make more sense for the additional <'s to get escaped or something.

boutell commented 2 years ago

Would you be interested in submitting a PR?

On Tue, May 3, 2022 at 10:09 AM Ilya @.***> wrote:

I found several variants of the library's incorrect behavior. In the examples below, it is possible to add any html tag if any tag is allowed. Example 1:

const sanitizeHtml = require('sanitize-html');

const sanitizedString = sanitizeHtml('<div/', { allowedTags: ['b'], });

Expected behavior

As a result of the execution I expect to see or a empty line. However, I get

. This example is absolutely valid for any browser because valid html markup is generated.

Example 2:

const sanitizeHtml = require('sanitize-html'); const HTMLParser = require('node-html-parser');

const sanitizedString = sanitizeHtml('<b<<div/', { allowedTags: ['b'], });

console.log(sanitizedString) // </b<

const unespectedDiv = HTMLParser.parse(sanitizedString).querySelector('div');

console.log(unespectedDiv);

Expected behavior

As a result of the execution I expect to see or a empty line. However, I get </b<

. The resulting string contains a substring

, which is interpreted by some parsers as a valid html tag like node-html-parser (Browsers interpret it correctly). — Reply to this email directly, view it on GitHub , or unsubscribe . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

--

THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

ghost commented 2 years ago

I completely agree with you and it is not a security risk, but in some scenarios it can lead to unexpected results. However, I'm not sure I'm the best candidate for fixing this behavior. I just wanted to share some research.