User controlled markup vs programmatic markup generation - Githubissues

cure53 / DOMPurify

DOMPurify - a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. DOMPurify works with a secure default, but offers a lot of configurability and hooks. Demo:

https://cure53.de/purify

User controlled markup vs programmatic markup generation #875

Closed Dorfeuheinz closed 10 months ago

Dorfeuheinz commented 10 months ago

I apologise for posting this here but I had a doubt that needed clarification.

My question is: instead of giving the user the power to edit markup, is it better to just have them use a UI to preview what they're trying to accomplish and, if they're cool with that, add the markup programmatically in the backend for storage once they hit enter?

As in, instead of letting users wrap a string in two asterisks for it to be parsed as bold (which can lead to ambiguity), is it better to just let them use an editor's UI for those things? Once they're happy with the way it looks in the editor, we purify the HTML and then parse it so we can replace tags with markup for storage, as I've been advised against storing HTML (obviously). That plaintext is then used to render the associated HTML in the frontend.

What are the pros & cons of both approaches? I've been told there are security issues with the 2nd approach. Any advice?
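To make the render step of the second approach concrete, here is a minimal sketch of turning stored plaintext markup back into HTML. The function names `escapeHtml` and `renderMarkup` and the choice of `**` and `~~` as markers are assumptions for illustration only; they are not part of DOMPurify. The idea is to escape everything the user typed first and only then introduce tags the renderer itself controls.

```typescript
// Hypothetical helpers for illustration; not part of DOMPurify or any library.

/** Escape the characters that are significant in HTML text and attributes. */
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

/** Render stored plaintext markup. Only **bold** and ~~strikethrough~~ are recognised. */
function renderMarkup(stored: string): string {
  // Escape first, so any HTML the user typed is shown as text, not parsed.
  const escaped = escapeHtml(stored);
  // Then add only the tags this renderer controls.
  return escaped
    .replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>")
    .replace(/~~(.+?)~~/g, "<s>$1</s>");
}
```

With this sketch, `renderMarkup("**hi** <img src=x onerror=alert(1)>")` yields `<strong>hi</strong> &lt;img src=x onerror=alert(1)&gt;`. It does not try to resolve overlapping markers, so input like `**a ~~b** c~~` produces mis-nested tags, which is one form of the ambiguity mentioned above.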

cure53 commented 10 months ago

Sure, but this approach works for some applications and not for others. Think, for example, of web mailers and HTML mails.

Dorfeuheinz commented 10 months ago

> Sure, but this approach works for some applications and not for others. Think, for example, of web mailers and HTML mails.

Any personal suggestions? And are there any security concerns with these approaches? Which do you personally prefer?

cure53 commented 10 months ago

It really depends, there is no general answer 😅 The question is similar to "should I eat meat or only veg??", where no clear answer can be given either because it depends on so many factors.

Dorfeuheinz commented 10 months ago

> It really depends, there is no general answer 😅 The question is similar to "should I eat meat or only veg??", where no clear answer can be given either because it depends on so many factors.

I understand. I just needed a 2nd opinion, since some people wanted me to give users markup editing capabilities, whereas I thought it'd be cleaner to not do that and just process and replace HTML tags with markup for storage, so the parser remains minimal too. But I feel I'm going to go with the 1st option, since most users with a coding background would want to be able to change markup without relying too much on the UI and mouse. I can change it later anyway. Let's see how this pans out. Let's hope for the best.

Thank you for your help. I appreciate it.

cure53 commented 10 months ago

At the end of the day, if you start with offering a UI to users where only certain kinds of HTML are editable or where templates and markdown are being used, you might be starting off with a secure default at first, but with a growing set of features eventually end up in a rabbit hole where it all becomes insecure.

Also, this approach increases complexity, and XSS issues stemming from improper Markdown handling are numerous.

So, it might make sense to say "Hey, why not go with HTML right away and build it securely from the get-go".

If you develop a reasonably secure strategy from the get-go, assuming that user controlled HTML might be happening anyway, you might be better off. And then, an initializer helps. Keep in mind though that you always need to sanitize the output, while you can likely store the unsanitized HTML. You can also store sanitized HTML and then sanitize again when the HTML is being echoed. It depends, of course, on what the application does and what features are needed.
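As a rough sketch of "store as submitted, always sanitize the output", the store below keeps the raw HTML and exposes only a read path that runs through a sanitizer. The names `Sanitize`, `CommentStore` and `createCommentStore` are hypothetical, and the array stands in for a real database. The sanitizer is passed in as a plain function so the pattern does not depend on any particular library.

```typescript
// Any function from untrusted HTML to safe HTML.
type Sanitize = (dirtyHtml: string) => string;

// Hypothetical storage interface for illustration.
interface CommentStore {
  save(rawHtml: string): number;
  renderForOutput(id: number): string;
}

function createCommentStore(sanitize: Sanitize): CommentStore {
  const rows: string[] = []; // stands in for a database table
  return {
    // Store the HTML exactly as submitted.
    save(rawHtml: string): number {
      rows.push(rawHtml);
      return rows.length - 1;
    },
    // The only read path: nothing leaves the store without being sanitized.
    renderForOutput(id: number): string {
      const raw = rows[id];
      if (raw === undefined) throw new Error("no comment with id " + id);
      return sanitize(raw);
    },
  };
}
```

In a browser, one plausible wiring would be `createCommentStore((html) => DOMPurify.sanitize(html))`, since `DOMPurify.sanitize` returns a string by default. Sanitizing at read time means stored content benefits from sanitizer updates without needing a data migration.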

Also, check out sandboxed iframes in addition to sanitization. That might give you another tier of hardening, even in case the sanitizer gets bypassed.
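A minimal sketch of the sandboxed iframe idea, assuming the HTML has already been sanitized and is delivered through the standard `srcdoc` attribute. The helper names `escapeAttribute` and `sandboxedFrame` are made up for this example. An empty `sandbox` attribute applies every restriction, so scripts inside the frame do not run even if something slips past the sanitizer.

```typescript
/** Escape a string for use inside a double-quoted HTML attribute value. */
function escapeAttribute(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

/**
 * Wrap already-sanitized HTML in an iframe with an empty sandbox attribute.
 * With no allow-* tokens, scripts are disabled and the framed content is
 * treated as coming from a unique origin.
 */
function sandboxedFrame(sanitizedHtml: string): string {
  return '<iframe sandbox="" srcdoc="' + escapeAttribute(sanitizedHtml) + '"></iframe>';
}
```

Adding tokens such as `allow-scripts` together with `allow-same-origin` would largely undo the protection, so this sketch deliberately leaves the attribute empty.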

cure53 commented 10 months ago

So, long story short - consider building it assuming, and being prepared for, the worst case, which would be fully user controlled HTML. If it's secure enough for that, you don't have to worry too much about adding new features.

Dorfeuheinz commented 10 months ago

> So, long story short - consider building it assuming, and being prepared for, the worst case, which would be fully user controlled HTML. If it's secure enough for that, you don't have to worry too much about adding new features.

I shall. Thank you for your input. It's been incredibly helpful. I appreciate it.

cure53 commented 10 months ago

Glad to hear :)

Dorfeuheinz commented 10 months ago

> Glad to hear :)

Sorry to butt in like this, just a small doubt. I decided to go with markup generation for text styling and wanted to know something. In the end, the data that's being saved to the backend is textContent. But at the same time, I need the styling and tags users apply, like bold, strikethrough or any other formatting, to be displayed in real time as users click the corresponding buttons. So I need those styles to be displayed, but in the end I only need textContent for storage, so I can render that markup into the corresponding HTML for display later. Do I need sanitization APIs for this particular use case? Consider something similar to Discord's chat editor UI. To implement something similar to that, is it required?

cure53 commented 10 months ago

I am not sure if this is the right place to ask for free architectural security advice 😅

Generally, you need to sanitize HTML whenever it is user controlled and you want to limit the attack surface for XSS attacks and the like. Intermediate languages like Markdown might make things safer but are also known to facilitate XSS attacks if the stars are aligned the right way.
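One well-known way an intermediate language can facilitate XSS is link syntax: a naive renderer that turns `[click](javascript:alert(1))` into an anchor produces a script-bearing `href`. The sketch below shows a scheme allowlist check for link targets. The name `isSafeLinkTarget` and the allowlist contents are assumptions for illustration, and this check is only one layer: the `href` value still needs attribute escaping and the final HTML still needs sanitizing.

```typescript
// Schemes this hypothetical renderer is willing to link to.
const ALLOWED_SCHEMES = ["http", "https", "mailto"];

function isSafeLinkTarget(url: string): boolean {
  // Browsers ignore tabs and newlines inside URLs and trim leading control
  // characters and spaces, so drop all of them before looking for a scheme.
  // This over-strips on purpose; it can only make the check stricter.
  const compact = url.replace(/[\u0000-\u0020]/g, "");
  const match = /^([a-zA-Z][a-zA-Z0-9+.\-]*):/.exec(compact);
  if (match === null) return true; // no scheme, so a relative URL
  return ALLOWED_SCHEMES.indexOf(match[1].toLowerCase()) !== -1;
}
```

An allowlist is used rather than a blocklist of `javascript:` and friends, so unknown schemes such as `data:` or `vbscript:` are rejected without having to be enumerated.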

Dorfeuheinz commented 10 months ago

> I am not sure if this is the right place to ask for free architectural security advice 😅
>
> Generally, you need to sanitize HTML whenever it is user controlled and you want to limit the attack surface for XSS attacks and the like. Intermediate languages like Markdown might make things safer but are also known to facilitate XSS attacks if the stars are aligned the right way.

I see. Thanks again. My apologies for asking this stuff here. I had no clue where to seek help for XSS related issues and don't know of any specific place where these questions are answered. But now I get it. Thank you for the help, and apologies again.