OWASP / owasp-java-encoder

The OWASP Java Encoder is a Java 1.5+ simple-to-use drop-in high-performance encoder class with no dependencies and little baggage. This project will help Java web developers defend against Cross Site Scripting!
https://owasp.org/www-project-java-encoder/
BSD 3-Clause "New" or "Revised" License

Combining OWASP Sanitizer and Encoder #69

Closed: bmscodespace closed this issue 1 week ago.

bmscodespace commented 6 months ago

Hi,

Is it possible to combine the OWASP Sanitizer and the OWASP Encoder so that, instead of removing malicious code, the problematic parts of a given string are encoded? That way, e.g., a script tag would do no harm and would simply be displayed as text. I am asking because I would like to handle texts where it is not certain whether they will be displayed as inner HTML or as "normal text".

Thank you very much for any answer ;)

melloware commented 6 months ago

I think this would be a great idea. Neither library is that large, so combining them would make sense. +1

jmanico commented 6 months ago

If the content is data that you want to display safely, exactly as the user typed it, then I would use the encoder. If the content is user-authored HTML that you actually want to render, then you want to use the HTML sanitizer. Does that make sense to you?

--Jim, Manicode Secure Coding Education

(Sent by email in reply to bmscodespace's message of Jan 19, 2024, 5:34 AM.)
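The rule of thumb above can be sketched as follows. This is a minimal illustration under stated assumptions, not an API of either library: `ContentKind`, `render` and `encodeForHtmlSketch` are hypothetical names, the hand-rolled encoder only stands in for the OWASP Java Encoder's `Encode.forHtml`, and the sanitizer is passed in as a function (in real code it might be something like `Sanitizers.FORMATTING.and(Sanitizers.BLOCKS)::sanitize` from the OWASP Java HTML Sanitizer). It needs Java 14+ for the switch expressions.

```java
import java.util.function.UnaryOperator;

/**
 * Sketch of the rule of thumb: encode plain-text data, sanitize user-authored HTML.
 * All names here are hypothetical and for illustration only.
 */
public final class RenderRule {

    /** Hypothetical marker recording how a stored string is meant to be displayed. */
    enum ContentKind { PLAIN_TEXT, USER_HTML }

    /**
     * Minimal HTML-body encoder. A hand-rolled stand-in for Encode.forHtml(String)
     * from the OWASP Java Encoder; use the library in real code.
     */
    static String encodeForHtmlSketch(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&' -> out.append("&amp;");
                case '<' -> out.append("&lt;");
                case '>' -> out.append("&gt;");
                case '"' -> out.append("&#34;");
                case '\'' -> out.append("&#39;");
                default -> out.append(c);
            }
        }
        return out.toString();
    }

    /**
     * Picks the defense from the content kind. The sanitizer is injected; in practice
     * it could be an OWASP Java HTML Sanitizer policy (assumption).
     */
    static String render(String text, ContentKind kind, UnaryOperator<String> sanitizer) {
        return switch (kind) {
            case PLAIN_TEXT -> encodeForHtmlSketch(text); // displayed exactly as typed
            case USER_HTML -> sanitizer.apply(text);      // rendered, unsafe parts removed
        };
    }

    public static void main(String[] args) {
        String typed = "<script>alert(1)</script>";
        // Prints: &lt;script&gt;alert(1)&lt;/script&gt;
        System.out.println(render(typed, ContentKind.PLAIN_TEXT, UnaryOperator.identity()));
    }
}
```

The design deliberately forces the caller to supply a `ContentKind`: if it is not known how a text will be displayed, neither defense can be chosen correctly.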


melloware commented 6 months ago

@jmanico It totally does. I just find that a lot of people use both of these libraries: one for sanitizing HTML input and the other for encoding output before it is sent back to the browser, like JSON data, etc. I know we use both libraries in PrimeFaces.

bmscodespace commented 6 months ago

Hi,

Thank you for your comments. My question assumed a scenario where we don't know whether a text will be displayed as inner HTML (e.g. formatted text with lots of p or b tags in it) or as ordinary data text (e.g. text that should be displayed exactly as the user typed it). If I sanitize the text, this might destroy a text like this one:

A script in HTML starts with <script> and ends with </script>.

On the other hand, if I encode every string, an HTML string that we might want to display as formatted text will instead be displayed as a literal HTML string, with any code from an attacker visible in it ;).

jeremylong commented 1 week ago

Encoding must be done at the point of output, because only there is the output context (HTML body, HTML attribute, JavaScript, URL, CSS) known. Otherwise you run into the problem of using the wrong encoding.
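To illustrate why the output context matters, the sketch below encodes the same stored string for an HTML body and for a JavaScript string literal. `forHtml` and `forJsString` are simplified, hypothetical stand-ins for the encoder's `Encode.forHtml` and `Encode.forJavaScript`, not their actual implementations. The last line shows what goes wrong when a value is HTML-encoded early and later emitted into JavaScript.

```java
/**
 * Sketch: one raw value, two output contexts, two different encodings.
 * forHtml() and forJsString() are hypothetical, simplified stand-ins for
 * the OWASP Java Encoder's Encode.forHtml and Encode.forJavaScript.
 */
public final class EncodeAtOutput {

    /** Minimal HTML-body encoding. */
    static String forHtml(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&' -> out.append("&amp;");
                case '<' -> out.append("&lt;");
                case '>' -> out.append("&gt;");
                case '"' -> out.append("&#34;");
                case '\'' -> out.append("&#39;");
                default -> out.append(c);
            }
        }
        return out.toString();
    }

    /**
     * Conservative JavaScript string-literal encoding: every char except ASCII
     * letters, digits and space becomes a backslash-u escape with four hex digits.
     */
    static String forJsString(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == ' ';
            if (safe) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "Tom & Jerry <3"; // stored unencoded

        // Encoded where it is written into an HTML body.
        System.out.println("<p>" + forHtml(raw) + "</p>");

        // Encoded where it is written into a JavaScript string literal.
        System.out.println("var name = '" + forJsString(raw) + "';");

        // Encoded too early: the JS value now contains "&amp;" and "&lt;",
        // so a user would see those entities instead of "&" and "<".
        System.out.println("var name = '" + forJsString(forHtml(raw)) + "';");
    }
}
```

Keeping the raw value in storage and encoding it only where it is written out avoids both double encoding and encoding for the wrong context.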