OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.
Other
843 stars 213 forks source link

Sanitizer wrongly quotes generic-named font-family CSS values #229

Open voborl00 opened 3 years ago

voborl00 commented 3 years ago

When sanitizing element with inline font-family css attribute, generic-named font family names with hyphens/dashes are wrongly quoted with single quotes (Generic family names are keywords and must not be quoted, source: https://developer.mozilla.org/en-US/docs/Web/CSS/font-family#generic-name). It seems that sanitizer's source of truth for this behaviour is CssSchema fontLiterals3 which has only some of generic-name font families enumerated.

Please provide a way to extend/configure/override this fontLiterals3 set as some non-generic-name values (like '-apple-system') should not be quoted as well.

String text = "<p style=\"font-family:sans-serif,system-ui,-apple-system;\">text</p>";
String output = new HtmlPolicyBuilder().allowStyling().allowElements("p").allowAttributes("style").onElements("p").toFactory().sanitize(text);
System.out.println(output);

<p style="font-family:sans-serif , &#39;system-ui&#39; , &#39;-apple-system&#39;">text</p>

mikesamuel commented 3 years ago

Do you happen to know which more modern CSS defines that now?

voborl00 commented 3 years ago

Thank you for your quick reply. As I can see it, this topic seems like a developing matter, so probably letting this system font (keyword) collection configurable/overridable by a user would be the best approach.

Anyway, there is some agreement valid for few last years which uses these keywords in a sans serif font definition: -apple-system BlinkMacSystemFont system-ui sans-serif

For example Bootstrap is using this in font-family definition: font-family: -apple-system, BlinkMacSystemFont, "Segoe UI",Roboto, "Helvetica Neue", Arial, "Noto Sans",sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji";

When we combine it with CSS spec (I also found out, that sanitizer library puts incoming font-family (or other identification) to lower case), this should be a sufficient list of fontLiterals3: "serif", "sans-serif", "monospace", "cursive", "fantasy", "system-ui", "ui-serif", "ui-sans-serif", "ui-monospace", "ui-rounded", "math", "emoji", "fangsong", "-apple-system", "blinkmacsystemfont",

My sources are:

https://github.com/necolas/normalize.css/issues/665

https://fontsarena.com/blog/operating-systems-default-sans-serif-fonts/

https://developer.mozilla.org/en-US/docs/Web/CSS/font-family#values

mikesamuel commented 3 years ago

It's been a while since I read CSS specifications as part of my job, but starting at html.spec's reference list links to [CSSFONTS] which leads to https://drafts.csswg.org/css-fonts/#generic-font-families

Maybe we could start with the list under "2.1.3. Generic font families" which seems to agree largely with your list and extend with some of the vendor specific strings that you found.

I think the relevant code is in

https://github.com/OWASP/java-html-sanitizer/blob/020d5d0d7b8e985be32d3608612a9889135ef060/src/main/java/org/owasp/html/CssSchema.java#L390-L391

voborl00 commented 3 years ago

Hi, thank you for looking into it!

Yes, combining Generic font families from CSS specs with the vendor-specific strings would solve the issue for now and some period in future :)

larsduvaas commented 1 year ago

Some of the test shows the problem, look at

https://github.com/OWASP/java-html-sanitizer/blob/382c251b2aacae03fd2cfa76dffe01034f32b6bd/src/test/java/org/owasp/html/SanitizersTest.java#L255

I think this is the same issue as #232