OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.

Question: How to not escape characters in plain text #269

Closed: kennytv closed this issue 1 year ago

kennytv commented 2 years ago

Hi, I run the sanitizer on input before passing it to a Markdown renderer (flexmark), so that certain HTML elements are still allowed. This breaks the Markdown-rendered parts, because the sanitizer escapes = and backticks in text nodes.

I'm mostly using the prebuilt policies and was wondering how I could disable escaping of such characters via the policy builders, leaving that task to the Markdown renderer after sanitizing.

    public static final PolicyFactory SANITIZER = Sanitizers.FORMATTING.and(Sanitizers.BLOCKS).and(IMAGES).and(Sanitizers.TABLES).and(Sanitizers.STYLES)
        .and(new HtmlPolicyBuilder().allowElements("details", "summary").toFactory());

With this policy, the following plain text is escaped from

Text with backticks `as code in markdown`
![alt text](imageurl =30x30)

to

Text with backticks &#96;as code in markdown&#96;
![alt text](imageurl &#61;30x30)

Because we have special Markdown tags that convert to iframes and the like in a controlled manner, we can't easily run the HTML sanitization after the Markdown rendering without explicitly allowing all of those fine-grained cases.
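
One possible workaround, sketched here under assumptions rather than as a feature of the sanitizer, is to post-process the sanitized string and turn the two affected entities back into literal characters before handing it to the Markdown renderer. The class and method names below are hypothetical, and the sketch assumes the sanitizer emits the decimal entities `&#96;` and `&#61;` for backticks and equals signs. Note that a plain string replacement also touches attribute values, not only text nodes, so it may weaken whatever protection that escaping was meant to provide.

```java
// Hypothetical helper, not part of java-html-sanitizer.
public final class MarkdownEntityRestorer {

    private MarkdownEntityRestorer() {
    }

    /**
     * Restores backticks and equals signs that were escaped as decimal
     * entities. Assumption: the sanitizer output uses exactly "&#96;" and
     * "&#61;" for these characters. All other entities are left untouched.
     */
    public static String restoreMarkdownChars(String sanitized) {
        return sanitized
            .replace("&#96;", "`")
            .replace("&#61;", "=");
    }
}
```

It would be called on the result of `SANITIZER.sanitize(input)`, with the returned string then passed to flexmark.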