apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance
MIT License
3.79k stars 353 forks source link

styles are always removed in browser #547

Closed Julian-B90 closed 1 year ago

Julian-B90 commented 2 years ago

To Reproduce

It always happens in browserland. See this simple codesandbox:

https://codesandbox.io/s/festive-kirch-d1geou?file=/src/index.js

Original message is below but note this is not React-specific. -Tom

Original message

Step by step instructions to reproduce the behavior:

  1. Create React App React 17
  2. call this in first page
    console.log(
    'html sanitizeHtml',
    sanitizeHtml('<p style="color:#c0392b">test</p>', {
    allowedTags: sanitizeHtml.defaults.allowedTags.concat([
    'iframe',
    'img',
    'del',
    'ins',
    ]),
    allowedStyles: {
    p: {
    // Match HEX and RGB
    color: [
    /^.*$/i,
    /^rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)$/,
    ],
    },
    },
    allowedAttributes: {
    '*': ['style'],
    },
    })
    );

Expected behavior

<p style="color:#c0392b">test</p>

What i get

<p>test</p>

Describe the bug

the style attribute was removed

Details

"dependencies": {
    "@emotion/react": "^11.4.1",
    "@emotion/styled": "^11.6.0",
    "@mui/icons-material": "^5.4.1",
    "@mui/lab": "^5.0.0-alpha.71",
    "@mui/material": "^5.4.1",
    "@mui/styled-engine-sc": "^5.4.1",
    "@mui/x-data-grid": "^5.5.0",
    "date-fns": "^2.28.0",
    "froala-editor": "^4.0.9",
    "i18next": "^21.6.11",
    "react": "^17.0.2",
    "react-beautiful-dnd": "^13.1.0",
    "react-device-detect": "^2.1.2",
    "react-dom": "^17.0.2",
    "react-fast-compare": "^3.2.0",
    "react-froala-wysiwyg": "^4.0.9",
    "react-hook-form": "^7.26.1",
    "react-i18next": "^11.15.4",
    "react-router-dom": "^5.3.0",
    "react-router-hash-link": "^2.4.3",
    "sanitize-html": "^2.7.0",
    "sort-json": "^2.0.0"
  },
boutell commented 2 years ago

Interesting issue. We don't use sanitize-html in the browser ourselves and there's no indication of this problem server-side, so it could be a while before we look at it in-house.

So what I'd suggest is that you set up the simplest possible webpack project that imports sanitize-html and tests this out in vanilla JS browser-side, as a starting point and to make sure you have the latest of everything.

And before that, I'd sanity-check that your code doesn't also fail when added to our regular unit tests and run with node on the command line, as if that fails it has nothing specifically to do with the browser.

Julian-B90 commented 2 years ago

Hey @boutell i have create a simple codesanbox https://codesandbox.io/s/festive-kirch-d1geou?file=/src/index.js

When i made a simpe js file an run it directly with node it work correctly

boutell commented 2 years ago

OK thanks for clarifying the nature of the issue. So it is not related to React, it happens all the time in the browser. That may be helpful to other interested developers who use sanitize-html on the front end and wish to help debug this problem.

kyoncy commented 2 years ago

I also experienced the same behavior. It seems to have occurred when I raised the react-scripts version from 4 to 5. I am investigating the cause as react-scripts depends on various libraries.

yauluntang commented 2 years ago

In the code, stringifyStyleAttributes would fail and go to catch block so the attribute got deleted

if (a === 'style') { try { const abstractSyntaxTree = postcssParse(name + ' {' + value + '}'); const filteredAST = filterCss(abstractSyntaxTree, options.allowedStyles);

            value = stringifyStyleAttributes(filteredAST);

            if (value.length === 0) {
              delete frame.attribs[a];
              return;
            }
          } catch (e) {
            delete frame.attribs[a];
            return;
          }
        }
yauluntang commented 2 years ago

In the code, stringifyStyleAttributes would fail and go to catch block so the attribute got deleted

if (a === 'style') { try { const abstractSyntaxTree = postcssParse(name + ' {' + value + '}'); const filteredAST = filterCss(abstractSyntaxTree, options.allowedStyles);

            value = stringifyStyleAttributes(filteredAST);

            if (value.length === 0) {
              delete frame.attribs[a];
              return;
            }
          } catch (e) {
            delete frame.attribs[a];
            return;
          }
        }

nanoid is not a function in browser

yauluntang commented 2 years ago

I proposed to add an option to patch this issue,

if (a === 'style' && options.processStyle) and for options we set processStyle = false to bypass this issue. I don't have access to create branch and pull request though

boutell commented 2 years ago

Are you saying there are features that cannot work in the browser right now? What would processStyle: false actually do? I'm a little unclear on that.

Anyone can submit a PR. The procedure is to "fork" the project to your own github (see the github page for the repo, there's a button for that). Then github will invite you to create a PR from your repo to ours when you have changes.

On Mon, May 2, 2022 at 11:42 PM Yau Lun Tang @.***> wrote:

I proposed to add an option to patch this issue,

if (a === 'style' && options.processStyle) and for options we set processStyle = false to bypass this issue. I don't have access to create branch and pull request though

— Reply to this email directly, view it on GitHub https://github.com/apostrophecms/sanitize-html/issues/547#issuecomment-1115685290, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAH27LGH7IDIUTX5A5CZBTVICODVANCNFSM5TQNA7IQ . You are receiving this because you were mentioned.Message ID: @.***>

--

THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

yauluntang commented 2 years ago

Are you saying there are features that cannot work in the browser right now? What would processStyle: false actually do? I'm a little unclear on that. Anyone can submit a PR. The procedure is to "fork" the project to your own github (see the github page for the repo, there's a button for that). Then github will invite you to create a PR from your repo to ours when you have changes. On Mon, May 2, 2022 at 11:42 PM Yau Lun Tang @.> wrote: I proposed to add an option to patch this issue, if (a === 'style' && options.processStyle) and for options we set processStyle = false to bypass this issue. I don't have access to create branch and pull request though — Reply to this email directly, view it on GitHub <#547 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAH27LGH7IDIUTX5A5CZBTVICODVANCNFSM5TQNA7IQ . You are receiving this because you were mentioned.Message ID: @.> -- THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

Looks like there are some dependencies that rely on node only. Browser won't work with those modules. processStyle: false means that it will not try to sanitize or remove the style attribute. I downloaded and modify the code locally and tried and it can workaround the problem.

boutell commented 2 years ago

I see. So right now if the style attribute is permitted we immediately begin using the problematic feature?

Wouldn't it make more sense if the problematic feature was not even part of the build, if it can't work outside of Node?

We don't use sanitize-html in the browser here (it's usually a mistake since you can't ever trust the browser anyway; yes there are some edge cases like using it for output rather than input). So that is work is unlikely to happen on our end, but just asking what you think would be most ideal if it were possible.

On Tue, May 3, 2022 at 10:42 AM Yau Lun Tang @.***> wrote:

Are you saying there are features that cannot work in the browser right now? What would processStyle: false actually do? I'm a little unclear on that. Anyone can submit a PR. The procedure is to "fork" the project to your own github (see the github page for the repo, there's a button for that). Then github will invite you to create a PR from your repo to ours when you have changes. … <#m5323872795942038585> On Mon, May 2, 2022 at 11:42 PM Yau Lun Tang @.> wrote: I proposed to add an option to patch this issue, if (a === 'style' && options.processStyle) and for options we set processStyle = false to bypass this issue. I don't have access to create branch and pull request though — Reply to this email directly, view it on GitHub <#547 (comment) https://github.com/apostrophecms/sanitize-html/issues/547#issuecomment-1115685290>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAH27LGH7IDIUTX5A5CZBTVICODVANCNFSM5TQNA7IQ https://github.com/notifications/unsubscribe-auth/AAAH27LGH7IDIUTX5A5CZBTVICODVANCNFSM5TQNA7IQ . You are receiving this because you were mentioned.Message ID: @.> -- THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

Looks like there are some dependencies that rely on node only. Browser won't work with those modules. processStyle: false means that it will not try to sanitize or remove the style attribute. I downloaded and modify the code locally and tried and it can workaround the problem.

— Reply to this email directly, view it on GitHub https://github.com/apostrophecms/sanitize-html/issues/547#issuecomment-1116182717, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAH27P4XUBJ4HPOWRUVGTTVIE3OPANCNFSM5TQNA7IQ . You are receiving this because you were mentioned.Message ID: @.***>

--

THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

yauluntang commented 2 years ago

I see. So right now if the style attribute is permitted we immediately begin using the problematic feature? Wouldn't it make more sense if the problematic feature was not even part of the build, if it can't work outside of Node? We don't use sanitize-html in the browser here (it's usually a mistake since you can't ever trust the browser anyway; yes there are some edge cases like using it for output rather than input). So that is work is unlikely to happen on our end, but just asking what you think would be most ideal if it were possible. On Tue, May 3, 2022 at 10:42 AM Yau Lun Tang @.> wrote: Are you saying there are features that cannot work in the browser right now? What would processStyle: false actually do? I'm a little unclear on that. Anyone can submit a PR. The procedure is to "fork" the project to your own github (see the github page for the repo, there's a button for that). Then github will invite you to create a PR from your repo to ours when you have changes. … <#m5323872795942038585> On Mon, May 2, 2022 at 11:42 PM Yau Lun Tang @.> wrote: I proposed to add an option to patch this issue, if (a === 'style' && options.processStyle) and for options we set processStyle = false to bypass this issue. I don't have access to create branch and pull request though — Reply to this email directly, view it on GitHub <#547 (comment) <#547 (comment)>>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAH27LGH7IDIUTX5A5CZBTVICODVANCNFSM5TQNA7IQ https://github.com/notifications/unsubscribe-auth/AAAH27LGH7IDIUTX5A5CZBTVICODVANCNFSM5TQNA7IQ . You are receiving this because you were mentioned.Message ID: @.> -- THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his Looks like there are some dependencies that rely on node only. Browser won't work with those modules. processStyle: false means that it will not try to sanitize or remove the style attribute. I downloaded and modify the code locally and tried and it can workaround the problem. — Reply to this email directly, view it on GitHub <#547 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAH27P4XUBJ4HPOWRUVGTTVIE3OPANCNFSM5TQNA7IQ . You are receiving this because you were mentioned.Message ID: @.> -- THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

I think it is better to use the feature (sanitize style attribute) by default and have an optional parameter to turn it off.

I am trying to make a website that can preview html tags and output it solely in the client side and this sanitize-html module is a perfect fit for me to use in the client side.

boutell commented 2 years ago

My concern is that it sounds like the build of this module that you wind up with in the browser probably has a lot of weight for a feature that doesn't even work in the browser, no? Not that this is a use case I'm personally concerned with, but I might be if I were in your position. I do understand that you're using sanitize html for output rather than input in the browser, which can be secure.

AlexSCorey commented 2 years ago

I too am experiencing this and would like to see it fixed if we can!

edolix commented 2 years ago

I'm getting the same issue trying to sanitize HTML for output. I'll see if i have some time to dig and file a PR

shinxi commented 2 years ago

For anyone who is facing the issue in a CRA repo, one workaround is to eject from CRA and add the cjs support, e.g., https://github.com/facebook/create-react-app/pull/12021/files#diff-8e25c4f6f592c1fcfc38f0d43d62cbd68399f44f494c1b60f0cf9ccd7344d697R467, another workaround is to customize the webpack configuration via react-app-wired, e.g., https://github.com/facebook/create-react-app/pull/12021#issuecomment-1108426483. See https://github.com/facebook/create-react-app/pull/12021 for detail.

mattweberio commented 1 year ago

We have an issue when we use styles with "placeholders." The previous version only checked the "style" tag and did not validate the style tag's contents.

Example:

<p style="color: {{peachColor}};">Test</p>

Please also consider this use case for adding a disable flag.

Can you please recommend which version style values are validated? We want to return to the latest version without validation until there is a way to skip validating the style values.

slavco86 commented 1 year ago

Same here. Please, can someone look into this - kind of a blocker for using this package in the browser

NishantDesai1306 commented 1 year ago

yup, nanoid used inside postcssParse is causing issue when used in browser

image image
bertyhell commented 1 year ago

There is a thread on postcss where they recommend not using postcss in the browser: https://github.com/postcss/postcss/issues/1727

For now, i'll just switch to the https://github.com/cure53/DOMPurify package

Here is a jest test that works in jest/node but not in the browser

import sanitize from 'sanitize-html';

describe('sanitize', () => {
    it('Should not remove p styles', () => {
        const originalHtml =
            '"<p style="text-align:center">some text</p>"';
        const sanitizedHtml = sanitize(originalHtml, {
            allowedTags: [
                'p',
            ],
            allowedAttributes: {
                p: ['style'],
            },
        });
        expect(sanitizedHtml).toEqual(originalHtml);
    });
});
boutell commented 1 year ago

They're not wrong (: We have always felt that the use of sanitize-html on the browser side is a little bit crazy, since you can never trust a browser anyway, although I accept there's an edge case where you use it for output of otherwise untrusted input... even though that is unnecessary overhead... on every single render! Sanitization and servers go together like bread and jam.

bertyhell commented 1 year ago

PR to add an option to disable style parsing has been opened: https://github.com/apostrophecms/sanitize-html/pull/596

mattweberio commented 1 year ago

Hi everyone (@boutell and @bertyhell) - Thank you for the PR!!! I wanted to note that this solves the browser use case, but we also have a use case in nodeJS because our application uses "placeholders" in the styles. So the addition of the new flag makes sense for a few valid use cases.

Our example:

<p style="color: {{peachColor}};">Test</p>

Is this PR expected to be included in the next release? We'd love to upgrade.

boutell commented 1 year ago

At last count this was nearly through, I had asked for a couple documentation revisions on the PR.

On Mon, Jan 16, 2023 at 10:03 PM mattweberio @.***> wrote:

Hi everyone @.*** https://github.com/boutell and @bertyhell https://github.com/bertyhell) - Thank you for the PR!!! I wanted to note that this solves the browser use case, but we also have a use case in nodeJS because our application uses "placeholders" in the styles. So the addition of the new flag makes sense for a few valid use cases.

Our example:

Test

Is this PR expected to be included in the next release? We'd love to upgrade.

— Reply to this email directly, view it on GitHub https://github.com/apostrophecms/sanitize-html/issues/547#issuecomment-1384775475, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAH27OBP7JB36W6OQWV2NDWSYDXLANCNFSM5TQNA7IQ . You are receiving this because you were mentioned.Message ID: @.***>

--

THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

bertyhell commented 1 year ago

yes, i need to tweak the docs a little, i'll put a reminder for this evening to take a look

bertyhell commented 1 year ago

I fixed the readme issues

krassowski commented 7 months ago

They're not wrong (: We have always felt that the use of sanitize-html on the browser side is a little bit crazy, since you can never trust a browser anyway, although I accept there's an edge case where you use it for output of otherwise untrusted input... even though that is unnecessary overhead... on every single render! Sanitization and servers go together like bread and jam.

@boutell just as FYI there are applications which do not have a Node server at all but use sanitize-html on the browser side, for example JupyterLab and Notebook use it because a user might open a notebook with an untrusted HTML. Thank you for your work here!