apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance
MIT License
3.68k stars 349 forks source link

using sanitize-html outside of npm #112

Closed mikeblum closed 8 years ago

mikeblum commented 8 years ago

I'm working on a branch of sanitize-html for loading this module without npm.

https://github.com/mikeblum/sanitize-html/tree/npm-run-build

Running npm run build creates a dist directory and addes index.js and a minified copy of index.js named sanitize-html.js and sanitize-html.min.js respectively.

From what I see in index.js we need to load the following dependencies into the built up Js file:

var htmlparser = require('htmlparser2');
var extend = require('xtend');
var quoteRegexp = require('regexp-quote');

Are there any other npm dependencies that we need to load into the built-up JS file for use outside of npm?

boutell commented 8 years ago

Please check out package.json and make sure you have what's there.

Using this module on the browser side normally doe snot make sense, FYI. You can't trust anything that comes from the browser anyway, which is normally the goal of using the sanitizer.

On Wed, Jun 15, 2016 at 1:57 PM, Michael Blum notifications@github.com wrote:

I'm working on a branch of sanitize-html for loading this module without npm.

https://github.com/mikeblum/sanitize-html/tree/npm-run-build

Running npm run build creates a dist directory and addes index.js and a minified copy of index.js named sanitize-html.js and sanitize-html.min.js respectively.

Are there any other npm dependencies that we need to load into the built-up JS file for use outside of npm?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/punkave/sanitize-html/issues/112, or mute the thread https://github.com/notifications/unsubscribe/AAB9fZhEFhDUqIsUol__xTXCmGncwM5Pks5qMDzxgaJpZM4I2oFC .

THOMAS BOUTELL, SUPPORT LEAD P'UNK AVENUE | (215) 755-1330 | punkave.com

mikeblum commented 8 years ago

Agreed. My use case is for selectively escaping HTML that gets passed to Handlebars - all on the client side.

Given the three cases:

  1. <strong>hello world</strong>
  2. <script>alert('hello world')</script>
  3. <img src=x onerror=alert('error') />

I want to use sanitize-html to still render case # 1 but escape cases # 2 and 3.

Unfortunately Handlebars don't support selective escaping:

Handlebars.SafeString();

and

Handlebars.Utils.escapeExpression();

escape all html strings, including case # 1.

Is this a valid use case?

boutell commented 8 years ago

Is this HTML really being born on the client side? There's no point where it gets stored to the server first and could be sanitized once rather than doing slow crunching on every page render in the browser?

On Wed, Jun 15, 2016 at 2:14 PM, Michael Blum notifications@github.com wrote:

Agreed. My use case is for selectively escaping HTML that gets passed to Handlebars - all on the client side.

Given the three cases:

  1. hello world
  2. <img src=x onerror=alert('error') />

I want to use sanitize-html to still render case # 1 but escape cases # 2 and 3.

Unfortunately Handlebars don't support selective escaping:

Handlebars.SafeString();

and

Handlebars.Utils.escapeExpression();

escape all html strings, including case # 1.

Is this a valid use case?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/punkave/sanitize-html/issues/112#issuecomment-226273266, or mute the thread https://github.com/notifications/unsubscribe/AAB9fZyH1J2sIXVoRtHt8RBNyDwZWdApks5qMEEUgaJpZM4I2oFC .

THOMAS BOUTELL, SUPPORT LEAD P'UNK AVENUE | (215) 755-1330 | punkave.com

mikeblum commented 8 years ago

That's a good point. I have a backing Java server that acts as the backend. On there we use org.apache.commons.lang3.StringEscapeUtils but this does the same cudgel escaping that Handlebars does in the frontend. I imagine there's a Java || {{ insert backend lang here }} library that does it. Is there a compelling reason to keep sanitize-html as an npm-only module?

boutell commented 8 years ago

npm works browser side, via browserify; why maintain two things?

However sanitize-html has a bunch of dependencies which, while fast and good at their jobs, apparently bloat it when people try to make browser-side builds of it and size starts to matter.

On Wed, Jun 15, 2016 at 2:53 PM, Michael Blum notifications@github.com wrote:

That's a good point. I have a backing Java server that acts as the backend. On there we use org.apache.commons.lang3.StringEscapeUtils but this does the same cudgel escaping that Handlebars does in the frontend. I imagine there's a Java || {{ insert backend lang here }} library that does it. Is there a compelling reason to keep sanitize-html as an npm-only module?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/punkave/sanitize-html/issues/112#issuecomment-226284786, or mute the thread https://github.com/notifications/unsubscribe/AAB9fapmoHEB67Uk0nfWCdA0VUgNgdSFks5qMEpHgaJpZM4I2oFC .

THOMAS BOUTELL, SUPPORT LEAD P'UNK AVENUE | (215) 755-1330 | punkave.com

mikeblum commented 8 years ago

I use browserify to create the DIST files:

"scripts": {
    "build": "browserify index.js > dist/sanitize-html.js --standalone 'sanitizeHtml'",
    "minify": "npm run build && uglifyjs dist/sanitize-html.js > dist/sanitize-html.min.js",
    "test": "mocha test/test.js",
    "prebuild": "npm run test && rm -rf dist && mkdir dist"
  },

That builds the two files that weight in at:

258K Jun 15 14:04 sanitize-html.js 170K Jun 15 14:04 sanitize-html.min.js

respectively.

mikeblum commented 8 years ago

Here's a demo of sanitize-html working in the front end:

demo.html

<html>
    <body>
        <script type="text/javascript"  src="dist/sanitize-html.js"></script>
        <script type="text/javascript" src="demo.js"></script>
    </body>
</html>

demo.js

var html = "<strong>hello world</strong>";
console.log(sanitizeHtml(html));
console.log(sanitizeHtml("<img src=x onerror=alert('img') />"));
console.log(sanitizeHtml("console.log('hello world')"));
console.log(sanitizeHtml("<script>alert('hello world')</script>"));

Using browserify seems to have worked out nicely with no code changes to index.js.

boutell commented 8 years ago

Neat! Glad it works in a second environment.

On Wed, Jun 15, 2016 at 4:33 PM, Michael Blum notifications@github.com wrote:

Here's a demo of sanitize-html working in the front end:

demo.html

demo.js

var html = "hello world";console.log(sanitizeHtml(html));console.log(sanitizeHtml("<img src=x onerror=alert('img') />"));console.log(sanitizeHtml("console.log('hello world')"));console.log(sanitizeHtml(""));

Using browserify seems to have worked out nicely with no code changes to index.js.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/punkave/sanitize-html/issues/112#issuecomment-226311161, or mute the thread https://github.com/notifications/unsubscribe/AAB9fTR5dirv-mAT8qsvsVMwlOgsLhRlks5qMGGBgaJpZM4I2oFC .

THOMAS BOUTELL, SUPPORT LEAD P'UNK AVENUE | (215) 755-1330 | punkave.com

mikeblum commented 8 years ago

Thank you for the quick feedback. Since this is merged I'll go ahead and close.