bevacqua / insane

:pouting_cat: Lean and configurable whitelist-oriented HTML sanitizer
https://ponyfoo.com
MIT License
449 stars 22 forks source link

Consider `createHTMLDocument` instead of custom parser #1

Closed jasonkarns closed 9 years ago

jasonkarns commented 9 years ago

insane has a great API. Though these days (if <=IE8 can be ignored) I would humbly submit there are better possible implementations.

The browser is already good at parsing HTML. Why not scrap the homegrown parser and just let the browser do what it's good at?

var doc = document.implementation.createHTMLDocument();
doc.body.innerHTML = inputHTMLString;

Boom. Now doc.body is your parsed dom.

And sanitizer is really just a mechanism to query the dom. Which browsers can also do pretty well:

var nodes = doc.querySelectorAll(elementsToDrop.join(','));

All in all, insane could quickly be implemented thusly:

function insane (html, options, strict) {
  var doc = document.implementation.createHTMLDocument();
  doc.body.innerHTML = html;

  doc.querySelectorAll(":not(" + options.allowedTags.join(',') + ")").each(function(node){
    node.parentNode.removeChild(node);
  });

  return doc.body.innerHTML;
}

Granted, this implementation doesn't meet the full api, but with some minor changes, it could very well do so. And would come in significantly smaller :)

bevacqua commented 9 years ago

IE < 9 shouldn't be ignored

jasonkarns commented 9 years ago

IE < 9 shouldn't be ignored

insane already doesn't support IE < 9:

reduce, forEach, Object.keys

https://github.com/bevacqua/insane/blob/a62736ca29655fdc7098d6df2fae02aa50e00d0d/sanitizer.js#L39 https://github.com/bevacqua/insane/blob/0792aa70eab1903656e74687dd740d5a694cd990/toMap.js#L4

bevacqua commented 9 years ago

Those can easily be polyfilled

jasonkarns commented 9 years ago

Indeed. And I don't mean any disrespect. But it would be awesome to have the same API for insane but whittled down from a 2kb dep to ~200 bytes.

Judging from your readme, this lib was created as a replacement for sanitize-html, but focusing on a smaller footprint. Given there is potential for 90% size reduction for the same api/functionality, it would be "insane not to use this" :)

Just bringing up the potential here. Your lib, obviously your choice.

bevacqua commented 9 years ago

I'd rather keep broader support over a 90% reduction on a measly 2kb size. (It's also probably slower at runtime to go through the DOM). Thanks for the suggestion though.