Closed callumlocke closed 4 years ago
Not at the moment. An interesting idea. The downside would be that you'd be accepting tags that are definitely not valid HTML5.
On Mon, Feb 15, 2016 at 11:25 AM, Callum Locke notifications@github.com wrote:
I can't see an easy way to strip out only script elements, for example, without making a huge array of every other element name in the world. Is there a way to do this?
— Reply to this email directly or view it on GitHub https://github.com/punkave/sanitize-html/issues/89.
THOMAS BOUTELL, DEV & OPS P'UNK AVENUE | (215) 755-1330 | punkave.com
Think about this though... in your desire to take a permissive approach would you want to loosen up about allowedAttributes too? Because there are many, many event handler attributes people could stuff javascript into. I think it's probably worth being specific about everything. The list of valid HTML5 tags isn't that long.
On Mon, Feb 15, 2016 at 11:33 AM, Tom Boutell tom@punkave.com wrote:
Not at the moment. An interesting idea. The downside would be that you'd be accepting tags that are definitely not valid HTML5.
On Mon, Feb 15, 2016 at 11:25 AM, Callum Locke notifications@github.com wrote:
I can't see an easy way to strip out only script elements, for example, without making a huge array of every other element name in the world. Is there a way to do this?
— Reply to this email directly or view it on GitHub https://github.com/punkave/sanitize-html/issues/89.
THOMAS BOUTELL, DEV & OPS P'UNK AVENUE | (215) 755-1330 | punkave.com
THOMAS BOUTELL, DEV & OPS P'UNK AVENUE | (215) 755-1330 | punkave.com
Just for future reference, here is a link to list of tags. I'm not sure it's perfect, use at your own risk : )
https://gist.github.com/olscore/2c6a176c6f78a6dde9eb#file-all-html-tags-list-txt
great project!
Hi @boutell,
I would appreciate something like a blacklisted tags option. AMP (Accelerated Mobile Pages) is growing and becoming more popular, but AMP HTML prohibits the usage of certain tags only. So instead of writing down all the allowed tags, it would be totally fine to just disallow a few. I think, sanitize-html
would be a great option to work with to make HTML ready for AMP!
Cheers,
Aileen
Couldn't you use the filter function for this?
var sanitizeHtml = require('sanitize-html')
var blacklist = {
'script': true,
'style': true
};
var clean = sanitizeHtml(
'<p>This is <script>alert("woot")</script>Linux</p>', {
allowedTags: false, //allow ALL tags..
exclusiveFilter: function(frame) {
//...except those in the blacklist
return blacklist[frame.tag];
}
});
console.log( clean ) // <p>This is Linux</p>
If you allow all but <script>
and <style>
, what happens when an attacker finds a new way to attack with an existing obscure but widely supported element?
Off the top of my head:
<plaintext>
to vandalize the page<base>
with *[href] to make relative links point to an attacker controlled originAnd elements occasionally get new capabilities
<meta>
tends to get new capabilities as new HTTP headers are handled specially by the browser<link>
tends to get new capabilities. <link rel=import>
and <link rel=script>
may soon load HTML and JavaScript respectively into the current document's originThe draft webcomponents spec means that the set of elements that have special semantics is potentially infinite. If a page that embeds sanitized HTML registers, for example, an unhardened markdown element for its own use, an attacker now has an easy way to get a payload to that soft target.
Just don't blacklist.
I agree, it's too dangerous, not a good practice. Learn what you want to accept and accept that.
fwiw this feature would be helpful for our use-case
we are not concerned about security, but more about sanitizing invalid html and this project has helped us with that. We use
sanitizeHtml(html, {
allowedAttributes: false,
allowedTags: false
})
however now we're sending html to feed services and need to strip out certain tags which the feeds don't support - for instance facebook instant articles don't support <svg>. It'd be nice to then blacklist this tag.
To future-proof yourself against security risks you should create a comprehensive whitelist of the tags you want to allow. The total set of HTML tags isn't that large, and the risks of tags you're unfamiliar with may be higher than expected.
On Mon, Jan 25, 2021 at 12:52 PM olsonpm notifications@github.com wrote:
fwiw this feature would be helpful for our use-case
we are not concerned about security, but more about sanitizing invalid html and this project has helped us with that. We use
sanitizeHtml(html, { allowedAttributes: false, allowedTags: false})
however now we're sending html to feed services and need to strip out certain tags which the feeds don't support - for instance facebook instant articles don't support . It'd be nice to then blacklist this tag.
— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/apostrophecms/sanitize-html/issues/89#issuecomment-766997879, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAH27PZJ775RDINKPHGZG3S3WVVHANCNFSM4B3J45UA .
--
THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his
As I said, we are not concerned about security. However it seems as though this library prioritizes it, which is fine. We can achieve the blacklisting we want differently.
I can't see an easy way to strip out only
script
elements, for example, without making a huge array of every other element name in the world. Is there a way to do this?