apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance
MIT License
3.69k stars 351 forks source link

when pass `{..., parser: {decodeEntities: false}}`, the `allowedStyles`'s setting isn't work #526

Closed zhuxindaba closed 2 years ago

zhuxindaba commented 2 years ago
const sanitizeHtml = require('sanitize-html');

const config = {
    allowedTags: [
        'span',
    ],
    allowedAttributes: {
        '*': [
            'style'
        ],
    },
    elfClosing: [ 'img', 'br', 'hr', 'area', 'base', 'basefont', 'input', 'link', 'meta' ],
    allowedStyles: {
        '*': {
            'background-color': [/^#(0x)?[0-9a-f]+$/i, /^rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)$/],
            'color': [/^#(0x)?[0-9a-f]+$/i, /^rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)$/],
        }
    },
    parser: {
        decodeEntities: false
    }
};
const input = `<span style="color: rgb(243, 21, 15); font-family: &quot;Microsoft YaHei&quot;, &quot;PingFang SC&quot;, &quot;Helvetica Neue&quot;, Helvetica, sans-serif; background-color: rgb(248, 228, 208);">Simple Text</span>`;
const str = sanitizeHtml(input, config);
// received:  <span>Simple Text</span>
// expected: <span style="color: rgb(243, 21, 15);background-color: rgb(248, 228, 208)">Simple Text</span>
boutell commented 2 years ago

From the docs:

"Security note: changing the parser settings can be risky. In particular, decodeEntities: false has known security concerns and a complete test suite does not exist for every possible combination of settings when used with sanitize-html. If security is your goal we recommend you use the defaults rather than changing parser, except for the lowerCaseTags option."

So there are bigger problems with using this option and that fact is already documented. This doesn't mean making progress on those issues through PRs would not be welcome, but right now it is not a recommended (or safe) practice. We would need to see a fix for the security concerns first though before it would make sense to tackle other issues with the setting.