apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance
MIT License
3.84k stars 353 forks source link

Allow all except certain tags #89

Closed callumlocke closed 4 years ago

callumlocke commented 8 years ago

I can't see an easy way to strip out only script elements, for example, without making a huge array of every other element name in the world. Is there a way to do this?

boutell commented 8 years ago

Not at the moment. An interesting idea. The downside would be that you'd be accepting tags that are definitely not valid HTML5.

On Mon, Feb 15, 2016 at 11:25 AM, Callum Locke notifications@github.com wrote:

I can't see an easy way to strip out only script elements, for example, without making a huge array of every other element name in the world. Is there a way to do this?

— Reply to this email directly or view it on GitHub https://github.com/punkave/sanitize-html/issues/89.

THOMAS BOUTELL, DEV & OPS P'UNK AVENUE | (215) 755-1330 | punkave.com

boutell commented 8 years ago

Think about this though... in your desire to take a permissive approach would you want to loosen up about allowedAttributes too? Because there are many, many event handler attributes people could stuff javascript into. I think it's probably worth being specific about everything. The list of valid HTML5 tags isn't that long.

On Mon, Feb 15, 2016 at 11:33 AM, Tom Boutell tom@punkave.com wrote:

Not at the moment. An interesting idea. The downside would be that you'd be accepting tags that are definitely not valid HTML5.

On Mon, Feb 15, 2016 at 11:25 AM, Callum Locke notifications@github.com wrote:

I can't see an easy way to strip out only script elements, for example, without making a huge array of every other element name in the world. Is there a way to do this?

— Reply to this email directly or view it on GitHub https://github.com/punkave/sanitize-html/issues/89.

THOMAS BOUTELL, DEV & OPS P'UNK AVENUE | (215) 755-1330 | punkave.com

THOMAS BOUTELL, DEV & OPS P'UNK AVENUE | (215) 755-1330 | punkave.com

MagRelo commented 8 years ago

Just for future reference, here is a link to list of tags. I'm not sure it's perfect, use at your own risk : )

https://gist.github.com/olscore/2c6a176c6f78a6dde9eb#file-all-html-tags-list-txt

great project!

aileen commented 8 years ago

Hi @boutell,

I would appreciate something like a blacklisted tags option. AMP (Accelerated Mobile Pages) is growing and becoming more popular, but AMP HTML prohibits the usage of certain tags only. So instead of writing down all the allowed tags, it would be totally fine to just disallow a few. I think, sanitize-html would be a great option to work with to make HTML ready for AMP!

Cheers,

Aileen

luffs commented 8 years ago

Couldn't you use the filter function for this?

var sanitizeHtml = require('sanitize-html')
var blacklist = {
    'script': true,
    'style': true
};
var clean = sanitizeHtml(
  '<p>This is <script>alert("woot")</script>Linux</p>', {
        allowedTags: false, //allow ALL tags..
        exclusiveFilter: function(frame) {
            //...except those in the blacklist
            return blacklist[frame.tag];
        }
    });
console.log( clean ) // <p>This is Linux</p>
mikesamuel commented 7 years ago

If you allow all but <script> and <style>, what happens when an attacker finds a new way to attack with an existing obscure but widely supported element?

Off the top of my head:

And elements occasionally get new capabilities

The draft webcomponents spec means that the set of elements that have special semantics is potentially infinite. If a page that embeds sanitized HTML registers, for example, an unhardened markdown element for its own use, an attacker now has an easy way to get a payload to that soft target.

Just don't blacklist.

boutell commented 4 years ago

I agree, it's too dangerous, not a good practice. Learn what you want to accept and accept that.

olsonpm commented 3 years ago

fwiw this feature would be helpful for our use-case

we are not concerned about security, but more about sanitizing invalid html and this project has helped us with that. We use

sanitizeHtml(html, {
  allowedAttributes: false,
  allowedTags: false
})

however now we're sending html to feed services and need to strip out certain tags which the feeds don't support - for instance facebook instant articles don't support <svg>. It'd be nice to then blacklist this tag.

boutell commented 3 years ago

To future-proof yourself against security risks you should create a comprehensive whitelist of the tags you want to allow. The total set of HTML tags isn't that large, and the risks of tags you're unfamiliar with may be higher than expected.

On Mon, Jan 25, 2021 at 12:52 PM olsonpm notifications@github.com wrote:

fwiw this feature would be helpful for our use-case

we are not concerned about security, but more about sanitizing invalid html and this project has helped us with that. We use

sanitizeHtml(html, { allowedAttributes: false, allowedTags: false})

however now we're sending html to feed services and need to strip out certain tags which the feeds don't support - for instance facebook instant articles don't support . It'd be nice to then blacklist this tag.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/apostrophecms/sanitize-html/issues/89#issuecomment-766997879, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAH27PZJ775RDINKPHGZG3S3WVVHANCNFSM4B3J45UA .

--

THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

olsonpm commented 3 years ago

As I said, we are not concerned about security. However it seems as though this library prioritizes it, which is fine. We can achieve the blacklisting we want differently.