OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.

How to skip sanitization of the emojis #177

Open PriyankPurwar opened 5 years ago

PriyankPurwar commented 5 years ago

Is there a way to skip the sanitization of emojis?

This was raised in an earlier issue (https://github.com/OWASP/java-html-sanitizer/issues/143), but I don't see any reasonable conclusion there.

mikesamuel commented 5 years ago

Same question then as https://github.com/OWASP/java-html-sanitizer/issues/143#issuecomment-392858011

How is this a problem?

The HTML numeric character reference for 😞 (for example &#x1f61e;) should be equivalent to the literal character 😞.
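To make the equivalence concrete, here is a minimal sketch using only the Java standard library. It does not call the sanitizer, and the class and method names are made up for illustration. It decodes a hexadecimal numeric character reference for U+1F61E and compares the result with the literal emoji.

```java
public class EntityEquivalence {
    // Hypothetical helper: decodes one hexadecimal numeric character
    // reference such as "&#x1f61e;" into the character it denotes.
    public static String decodeHexReference(String ref) {
        if (!ref.startsWith("&#x") || !ref.endsWith(";")) {
            throw new IllegalArgumentException("not a hex character reference: " + ref);
        }
        int codePoint = Integer.parseInt(ref.substring(3, ref.length() - 1), 16);
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // U+1F61E written as its UTF-16 surrogate pair.
        String literal = "\uD83D\uDE1E";
        // An HTML parser resolves the reference to the same code point.
        System.out.println(decodeHexReference("&#x1f61e;").equals(literal));
    }
}
```

An HTML consumer sees the same text either way. The two forms differ only for consumers that read the string without parsing it as HTML.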

alecl commented 5 years ago

I'll second this issue. Some apps save data that is used across multiple output channels (an HTML website and a mobile app, for example). Unicode characters would work fine in both, but HTML entities would not work in an app that uses native controls rather than a webview.
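One possible workaround for the multi-channel case is a post-processing step over the sanitizer's output. The sketch below is an assumption of mine, not a feature of the library. The class and method names are hypothetical, and it uses only the Java standard library. It turns numeric character references for supplementary code points (U+10000 and above, where most emoji live) back into literal characters and leaves every other reference encoded.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SupplementaryEntityDecoder {
    // Matches hexadecimal (&#x1f61e;) and decimal (&#128542;) numeric character references.
    private static final Pattern NUMERIC_REF =
        Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Hypothetical helper: expects already-sanitized HTML as input.
    public static String decodeSupplementary(String sanitizedHtml) {
        Matcher m = NUMERIC_REF.matcher(sanitizedHtml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = (m.group(1) != null)
                ? Integer.parseInt(m.group(1), 16)
                : Integer.parseInt(m.group(2), 10);
            // Only code points above U+FFFF are decoded; references such as
            // &#60; or &#38; stay encoded, so no markup-significant character
            // is introduced by this step.
            String replacement = Character.isSupplementaryCodePoint(codePoint)
                ? new String(Character.toChars(codePoint))
                : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeSupplementary("<b>hi &#x1f61e;</b> &amp; &#60;"));
    }
}
```

The result is still HTML, with tags and named entities such as `&amp;` intact, so a native control would still need its own handling for anything beyond plain text.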

mikesamuel commented 5 years ago

@alecl, this library outputs HTML. How are its output conventions relevant to apps that use native controls?

alecl commented 5 years ago

I think of it as a gold standard for sanitizing HTML, not necessarily for transforming existing data, even if the result is an HTML-compatible format.

It's not that esoteric a use case to have one database entry for source data for display in multiple channels (web, mobile app, e-mail even).

Another option for us would be to use an HTML stripping tool, but those are often naive, removing brackets indiscriminately or doing other odd things. This tool is a much smarter implementation.