OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.

Do not lcase element or attribute names that match SVG or MathML names exactly #206

Closed mikesamuel closed 4 years ago

mikesamuel commented 4 years ago

Currently, all names are converted to lowercase. That is fine when the sanitizer is used for HTML only, but it breaks when an SVG image is nested inside the HTML. For example, when the viewBox attribute is converted to viewbox, the image is not displayed correctly.

This commit splits HtmlLexer.canonicalName into variants which preserve items on whitelists derived from the SVG and MathML specifications, and adjusts callers of canonicalName to use the appropriate variant.
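To illustrate the idea, here is a minimal sketch of case-preserving canonicalization. It is not the library's actual code: the class name `CaseAwareNames`, its method names, and the tiny name lists are hypothetical stand-ins, and the real lists would be derived from the SVG and MathML specifications as described above. A name is kept as written only when it matches a listed mixed-case name exactly; everything else is lowercased as before.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not the library's HtmlLexer. */
final class CaseAwareNames {
  // Small illustrative subsets of mixed-case SVG/MathML names (assumption:
  // a real implementation would carry the full lists from the specs).
  private static final Set<String> MIXED_CASE_ELEMENTS = new HashSet<>(
      Arrays.asList("foreignObject", "linearGradient", "clipPath", "textPath"));
  private static final Set<String> MIXED_CASE_ATTRIBUTES = new HashSet<>(
      Arrays.asList("viewBox", "preserveAspectRatio", "gradientUnits", "definitionURL"));

  private CaseAwareNames() {}

  /** Canonicalizes an element name, preserving exact mixed-case matches. */
  static String canonicalElementName(String name) {
    return canonicalize(name, MIXED_CASE_ELEMENTS);
  }

  /** Canonicalizes an attribute name, preserving exact mixed-case matches. */
  static String canonicalAttributeName(String name) {
    return canonicalize(name, MIXED_CASE_ATTRIBUTES);
  }

  private static String canonicalize(String name, Set<String> preserved) {
    // Exact match against the list: keep the name untouched.
    if (preserved.contains(name)) {
      return name;
    }
    // Otherwise fall back to the old behavior and lowercase the name.
    return name.toLowerCase(Locale.ROOT);
  }
}
```

With this sketch, `canonicalAttributeName("viewBox")` returns the name unchanged, while a name that is not an exact match, such as `VIEWBOX` or `onClick`, is still lowercased.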

Fixes #182

@wookie41 @zeeneir do you have an example that I can turn into a unit test?

zeeneir commented 4 years ago

Simple test for viewBox:

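    // Assumes imports of org.owasp.html.HtmlPolicyBuilder and
    // org.owasp.html.PolicyFactory, plus a static import of JUnit's assertEquals.
    // Allow <svg> together with its mixed-case viewBox attribute.
    PolicyFactory policyFactory = new HtmlPolicyBuilder()
            .allowElements("svg")
            .allowAttributes("viewBox").onElements("svg")
            .toFactory();
    String svg = "<svg viewBox=\"0 0 0 0\"></svg>";
    // The sanitized output should keep viewBox exactly as written.
    assertEquals(svg, policyFactory.sanitize(svg));

mikesamuel commented 4 years ago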

@zeeneir Added tests. Let me know if this PR solves your problem.

zeeneir commented 4 years ago

Looks good to me. Thanks a lot!