OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.

font-family sanitization issue #232

Open jurajvalkucak opened 3 years ago

jurajvalkucak commented 3 years ago

Hi,

It looks like there is an issue with CSS font-family sanitization. When the input is first sanitized, the sanitizer adds quotes around the font families. When the sanitized content is sanitized again, it removes some font families and leaves blank entries separated by commas, which makes the CSS font-family value invalid.

Input to sanitize: <span style="font-family:WordVisi_MSFontService, Algerian, Algerian_EmbeddedFont, Algerian_MSFontService, sans-serif;">TEXT</span>

Sanitized output (font families quoted and lowercased): <span style="font-family:&#39;wordvisi_msfontservice&#39; , &#39;algerian&#39; , &#39;algerian_embeddedfont&#39; , &#39;algerian_msfontservice&#39; , sans-serif">TEXT</span>

Sanitized again (font families removed, leaving empty comma-separated entries and an invalid font-family value): <span style="font-family:, &#39;algerian&#39; , , , sans-serif">TEXT</span>
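
To make the failure easy to detect in a test, here is a minimal sketch that flags empty entries in a font-family value. It uses only the Java standard library; the class and method names (`FontFamilyCheck`, `entries`, `hasEmptyEntry`) are hypothetical and not part of java-html-sanitizer. It assumes a naive split on commas, so it would misreport a quoted family name that itself contains a comma.

```java
import java.util.ArrayList;
import java.util.List;

public class FontFamilyCheck {
    /** Splits a font-family value on commas and trims each entry (naive: ignores quoting). */
    static List<String> entries(String fontFamilyValue) {
        List<String> out = new ArrayList<>();
        // A limit of -1 keeps trailing empty strings, so "a," yields ["a", ""].
        for (String part : fontFamilyValue.split(",", -1)) {
            out.add(part.trim());
        }
        return out;
    }

    /** True if any comma-separated entry is empty, as in the second sanitization pass above. */
    static boolean hasEmptyEntry(String fontFamilyValue) {
        return entries(fontFamilyValue).contains("");
    }

    public static void main(String[] args) {
        // Values taken from the report, with the &#39; entities decoded to single quotes.
        String first = "'wordvisi_msfontservice' , 'algerian' , 'algerian_embeddedfont' , "
                + "'algerian_msfontservice' , sans-serif";
        String second = ", 'algerian' , , , sans-serif";
        System.out.println(hasEmptyEntry(first));  // prints false
        System.out.println(hasEmptyEntry(second)); // prints true
    }
}
```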

The issue occurs when the policy is configured as follows: new HtmlPolicyBuilder().allowStyling(CssSchema.DEFAULT)

Thanks, Juraj

mikesamuel commented 2 years ago

I don't think we guarantee that sanitization is idempotent, but this looks like a bug. The problem is probably somewhere in StylingPolicy and probably has to do with the underscores in the names that get removed.
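
For anyone who wants to check idempotence in their own setup, a minimal sketch follows. It is standard-library Java only, and `IdempotenceCheck` and `isIdempotent` are hypothetical names, not library API. The two lambdas in `main` are toy stand-ins for a sanitizer; with the real library, a method reference such as `policy::sanitize` on a `PolicyFactory` could presumably be passed instead.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

public class IdempotenceCheck {
    /** True if applying the sanitizer a second time leaves its first output unchanged. */
    static boolean isIdempotent(UnaryOperator<String> sanitizer, String input) {
        String once = sanitizer.apply(input);
        String twice = sanitizer.apply(once);
        return once.equals(twice);
    }

    public static void main(String[] args) {
        // Toy stand-in: lowercasing is idempotent.
        UnaryOperator<String> lower = s -> s.toLowerCase(Locale.ROOT);
        // Toy stand-in: removing one underscore per pass is not idempotent.
        UnaryOperator<String> stripOne = s -> s.replaceFirst("_", "");

        System.out.println(isIdempotent(lower, "Algerian_MSFontService")); // prints true
        System.out.println(isIdempotent(stripOne, "a_b_c"));               // prints false
    }
}
```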