apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance
MIT License
3.84k stars 353 forks source link

Default call should disallow all tags and attributes #517

Closed hgezim closed 2 years ago

hgezim commented 2 years ago

The problem to solve

Looking at a call like sanitizeHtml(userInput), it appears that all html should be sanitized and nothing allowed through (hence, sanitize-html). When the library has implicit defaults it makes me very unsure as a user as to what it'll do.

For example, dev 1 wrote:

sanitizeHtml(variant.product.title, {
    allowedTags: ['br'],
  });

Dev 2 could come later on when we disallow br tags and remove the allowedTags option altogether which would make all the default tags allowed.

The other issues since I know now the library "style", with this code:

sanitizeHtml(variant.product.title, {
    allowedTags: ['br'],
  });

It begs the question, what attributes can br have that might screw me over? Are all of br's attributes allowed? What if the user can set an attribute I don't know about and do something dangerous.

Proposed solution

Change the default API so that nothing is allowed through unless explicitly specified.

sanitizeHtml(userInput) would remove every tag. If I specify an allowed tag, it would allow that tag with 0 attributes.

These snippets would behave the same:

sanitizeHtml(variant.product.title, {
    allowedTags: [],
  });

sanitizeHtml(variant.product.title, {});

I wouldn't feel like I have to watch my back.

Default are great and those lists can be exported as well like so:

import sanitizeHtml, {defaultAllowedTags} from 'sanitize-html';

sanitizeHtml({allowedTags: defaultAllowedTags});

Alternatives

Instead of completely breaking backward compatibility, a strict export could be introduced:

import {sanitizeHtmlStrict, defaultAllowedTags} from 'sanitize-html';

sanitizeHtmlStrict(userInput);

Additional context

The only context was me doing code review and not being certain as to what will take place when allowedTags ends up changing, etc.

boutell commented 2 years ago

The default options are documented in full right in the documentation... but, this doesn't mean you're wrong. It would of course have to be a major version bump.

On Fri, Nov 19, 2021 at 2:15 AM Gezim Hoxha @.***> wrote:

The problem to solve

Looking at a call like sanitizeHtml(userInput), it appears that all html should be sanitized and nothing allowed through (hence, sanitize-html). When the library has implicit defaults it makes me very unsure as a user as to what it'll do.

For example, dev 1 wrote:

sanitizeHtml(variant.product.title, { allowedTags: ['br'], });

Dev 2 could come later on when we disallow br tags and remove the allowedTags option altogether which would make all the default tags allowed.

The other issues since I know now the library "style", with this code:

sanitizeHtml(variant.product.title, { allowedTags: ['br'], });

It begs the question, what attributes can br have that might screw me over? Are all of br's attributes allowed? What if the user can set an attribute I don't know about and do something dangerous. Proposed solution

Change the default API so that nothing is allowed through unless explicitly specified.

sanitizeHtml(userInput) would remove every tag. If I specify an allowed tag, it would allow that tag with 0 attributes.

These snippets would behave the same:

sanitizeHtml(variant.product.title, { allowedTags: [], });

sanitizeHtml(variant.product.title, {});

I wouldn't feel like I have to watch my back.

Default are great and those lists can be exported as well like so:

import sanitizeHtml, {defaultAllowedTags} from 'sanitize-html';

sanitizeHtml({allowedTags: defaultAllowedTags});

Alternatives

Instead of completely breaking backward compatibility, a strict export could be introduced:

import {sanitizeHtmlStrict, defaultAllowedTags} from 'sanitize-html';

sanitizeHtmlStrict(userInput);

Additional context

The only context was me doing code review and not being certain as to what will take place when allowedTags ends up changing, etc.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/apostrophecms/sanitize-html/issues/517, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAH27NQ7AGAP2IABIIUPQDUMX2Q3ANCNFSM5ILOQUMA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

--

THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

boutell commented 2 years ago

Whether to allow some defaults as "safe" is a design decision with a lot of ins and outs. I suggest you take a look at our documented defaults and think about whether you see any scenarios where they would be harmful.

On Fri, Nov 19, 2021 at 11:27 AM Tom Boutell @.***> wrote:

The default options are documented in full right in the documentation... but, this doesn't mean you're wrong. It would of course have to be a major version bump.

On Fri, Nov 19, 2021 at 2:15 AM Gezim Hoxha @.***> wrote:

The problem to solve

Looking at a call like sanitizeHtml(userInput), it appears that all html should be sanitized and nothing allowed through (hence, sanitize-html). When the library has implicit defaults it makes me very unsure as a user as to what it'll do.

For example, dev 1 wrote:

sanitizeHtml(variant.product.title, { allowedTags: ['br'], });

Dev 2 could come later on when we disallow br tags and remove the allowedTags option altogether which would make all the default tags allowed.

The other issues since I know now the library "style", with this code:

sanitizeHtml(variant.product.title, { allowedTags: ['br'], });

It begs the question, what attributes can br have that might screw me over? Are all of br's attributes allowed? What if the user can set an attribute I don't know about and do something dangerous. Proposed solution

Change the default API so that nothing is allowed through unless explicitly specified.

sanitizeHtml(userInput) would remove every tag. If I specify an allowed tag, it would allow that tag with 0 attributes.

These snippets would behave the same:

sanitizeHtml(variant.product.title, { allowedTags: [], });

sanitizeHtml(variant.product.title, {});

I wouldn't feel like I have to watch my back.

Default are great and those lists can be exported as well like so:

import sanitizeHtml, {defaultAllowedTags} from 'sanitize-html';

sanitizeHtml({allowedTags: defaultAllowedTags});

Alternatives

Instead of completely breaking backward compatibility, a strict export could be introduced:

import {sanitizeHtmlStrict, defaultAllowedTags} from 'sanitize-html';

sanitizeHtmlStrict(userInput);

Additional context

The only context was me doing code review and not being certain as to what will take place when allowedTags ends up changing, etc.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/apostrophecms/sanitize-html/issues/517, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAH27NQ7AGAP2IABIIUPQDUMX2Q3ANCNFSM5ILOQUMA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

--

THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

--

THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

hgezim commented 2 years ago

For me, it's about the safety. I don't feel safe that this two lines don't produce the same results:

sanitizeHtml(variant.product.title, {
    allowedTags: [],
  });

sanitizeHtml(variant.product.title, {});

I think you could easily end up with a nasty bug if you initially have allowedTags: ['br'] then later on decide to not allow br tags either.

What do you do? The API doesn't make it clear enough that you should just remove the 'br' and keep the empty array.

boutell commented 2 years ago

I see your point, but I also think there's a lot to be said for sensible defaults for the "allow basic HTML" use case.

@abea @stuartromanek what do you think?

stuartromanek commented 2 years ago

Stepping back to the big picture, the module is called 'sanitize-html', not 'strip-html' or 'remove-html' .. it's purpose first is to return a consistent format for HTML. To that end, it's reasonable for the module to have a default set of tags/attrs/etc that it deems 'sanitary'. This would be easier to express if HTML5 had a DTD or something official to root the default definition in, but alas.

I see this more as a documentation issue. @hgezim's use case is directly referenced in the README but overall the README is verbose and frames a lot of scenarios in a less technical, more conversational way. This makes the README hard to read/scan (it takes almost 3,000 words to get the first basic usage example!).

hgezim commented 2 years ago

Great points @stuartromanek. I think an API call table as a reference would help that lists required arguments, defaults and options.

Here's an example:

CleanShot 2021-12-03 at 14 56 19@2x
stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.