apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance
MIT License
3.68k stars 349 forks source link

sanitize-html will not allow <br> tag from react-quill #665

Closed VasileiosZisis closed 1 week ago

VasileiosZisis commented 1 week ago

In my react app I am using react-quill for the text editor and on the backend sanitize-html with JOI for the validation. In the text editor leaving empty lines results to this: <p><br></p>. The module is not allowing the br tag.

I have tried

const clean = sanitizeHtml(value, {
  allowedTags: sanitizeHtml.defaults.allowedTags.concat([ 'br' ])
});

and even

allowedTags: false,
allowedAttributes: false

My code is

const extension = (joi) => ({
  type: 'string',
  base: joi.string(),
  messages: {
    'string.escapeHTML': '{{#label}} must not include HTML tags',
  },
  rules: {
    escapeHTML: {
      validate(value, helpers) {
        const clean = sanitizeHtml(value, {
          allowedClasses: {
            '*': [
              'ql-align-right',
              'ql-align-center',
              'ql-align-justify',
              'ql-code-block',
              'ql-code-block-container',
              'ql-syntax',
              'ql-direction-rtl',
              'ql-font-serif',
              'ql-font-monospace',
              'ql-formula',
              'ql-indent-1',
              'ql-indent-2',
              'ql-indent-3',
              'ql-indent-4',
              'ql-indent-5',
              'ql-indent-6',
              'ql-indent-7',
              'ql-indent-8',
              'ql-size-small',
              'ql-size-large',
              'ql-size-huge',
            ],
          },
        });
        if (clean != value)
          return helpers.error('string.escapeHTML', { value });
        return clean;
      },
    },
  },
});
BoDonkey commented 1 week ago

If I get your question correctly, you believe that sanitize-html is stripping the break tag from <p><br></p>? Is it leaving behind <p></p> or...? I'm thinking something to do with either nesting or "empty" tags. The br tag is already allowed by default.

VasileiosZisis commented 1 week ago

It returns an error, the value is never passed. Quill places everything inside <p></p>. I can for example have <p><em></em></p> with a space inside, no strings and the value will pass and will be in the html. I only get the error with the br tag. The reason I know the value is <p><br></p>, is because I can see it printed in the console.

BoDonkey commented 1 week ago

What is the error?

VasileiosZisis commented 1 week ago

failed custom validation because \n

boutell commented 1 week ago

That's not something sanitize-html ever prints, so I suspect it is coming from elsewhere in your code.

On Sun, Jun 23, 2024 at 2:32 PM Vasilis @.***> wrote:

failed custom validation because \n

— Reply to this email directly, view it on GitHub https://github.com/apostrophecms/sanitize-html/issues/665#issuecomment-2185263799, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAH27NM6RJQHBM7PXXEEWTZI4IDXAVCNFSM6AAAAABJYFC32CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOBVGI3DGNZZHE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

--

THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

VasileiosZisis commented 1 week ago

It is coming from JOI and the error appears only when I include sanitize-html. Without it everything is working fine.

boutell commented 1 week ago

The best way to move this forward would be for you to create a failing test case in sanitize-html. See test.js for examples.

VasileiosZisis commented 1 week ago

So I figured out what is happening and I get the error. From the request I get the value <p><br></p> and after the sanitization the value changes to <p><br /></p> and that causes the error, since value !== clean. Without the sanitization the value does not change. So seems it is still a sanitize-html issue. The only fix I can think of now regarding my code, is do not return with the error and let the user input print.

boutell commented 1 week ago

Both representations are valid. Since htmlparser2 doesn't provide sanitize-html with a complete description of whether the source tag used / or not, we can't really guarantee the output is identical to the input, only that it is sanitized according to the given rules. Also, even if that particular thing was preserved, sanitize-html might not preserve exactly the way whitespace was or wasn't used, etc. Doing so is out of scope for this module, so this issue will be closed.

However you might be able to compare representations for "equivalence" successfully by passing both the original code and the output of sanitize-html into cheerio and asking cheerio to output markup for each, then comparing thqat markup.