OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.
Other
843 stars 213 forks source link

Font tag stripped when allowing old HTML formatting tags #208

Closed mcdobr closed 3 years ago

mcdobr commented 3 years ago

Hi. I'm sanitizing my users' HTML and they are using deprecated HTML4 tags to style text. In the meantime I've suggested they use inline styling.

Here's a small test showcasing my problem:

@Test
void shouldAllowFormatting() {
    // given
    String html = "<p><font face=\"arial\" color=\"#8ebf42\">Green <b>text</b>, <em>typeface</em>" +
            " changed.</font></p>";

    String expectedHtml = "<p><font>Green <b>text</b>, <em>typeface</em>" +
            " changed.</font></p>";

    // when
    String actualSanitizedHtml = Sanitizers.BLOCKS.and(Sanitizers.FORMATTING).sanitize(html);

    // then
    assertThat(actualSanitizedHtml).isEqualTo(expectedHtml);
}

Since i'm using Sanitizers.FORMATTING which allows \<font> and \<em> and \<b> and so on, I was expecting the tag to not be stripped away. I was expecting that the attributes be stripped but that I could fix by chaining my custom policy. Here's the output of the test:

org.opentest4j.AssertionFailedError: 
Expecting:
 <"<p>Green <b>text</b>, <em>typeface</em> changed.</p>">
to be equal to:
 <"<p><font>Green <b>text</b>, <em>typeface</em> changed.</font></p>">
but was not.
Expected :<p><font>Green <b>text</b>, <em>typeface</em> changed.</font></p>
Actual   :<p>Green <b>text</b>, <em>typeface</em> changed.</p>

I've tested on both 20191001.1 and on 20200713.1. Is there something I am missing?

By the way, thanks for your effort in supporting this library. It's really a pleasure to use.

jmanico commented 3 years ago

Thank you for the bug report I politely assigned this to @mikesamuel

mikesamuel commented 3 years ago

The AntiSamy test suite, which was inherited from a precursor project, shows that we should be correctly supporting <font color>.

https://github.com/OWASP/java-html-sanitizer/blob/ca406970b74abd03ec045713b3c9f53963b716ff/src/test/java/org/owasp/html/AntiSamyTest.java#L436-L484

The relevant policy code seems to be

https://github.com/OWASP/java-html-sanitizer/blob/ca406970b74abd03ec045713b3c9f53963b716ff/src/test/java/org/owasp/html/AntiSamyTest.java#L71-L84

The reason <font> is being dropped when there's nothing there is

https://github.com/OWASP/java-html-sanitizer/blob/ca406970b74abd03ec045713b3c9f53963b716ff/src/main/java/org/owasp/html/HtmlPolicyBuilder.java#L161-L169

You should be able to correct that with

https://github.com/OWASP/java-html-sanitizer/blob/ca406970b74abd03ec045713b3c9f53963b716ff/src/main/java/org/owasp/html/HtmlPolicyBuilder.java#L314-L321

mcdobr commented 3 years ago

Yeah, my issue seems to be that i was using prepackaged policies and combining them with .and(). Given the behavior you mentioned (DEFAULT_SKIP_TAG_MAP_IF_EMPTY_ATTR ) and the fact that Sanitizers.FORMATTING whitelists <font> tag but no attribute is whitelisted, the font tag isn't really whitelisted on the Sanitizer. Also given the way .and() works I would have to chain with a policy and specifically allow font again, then allow the attributes.

Thanks for the help.

jmanico commented 3 years ago

May we close this issue out?

-- Jim Manico @Manicode

On Sep 22, 2020, at 10:47 PM, Mircea Dobreanu notifications@github.com wrote:

 Yeah, my issue seems to be that i was using prepackaged policies and combining them with .and(). Given the behavior you mentioned (DEFAULT_SKIP_TAG_MAP_IF_EMPTY_ATTR ) and the fact that Sanitizers.FORMATTING whitelists tag but no attribute is whitelisted, the font tag isn't really whitelisted on the Sanitizer. Also given the way .and() works I would have to chain with a policy and specifically allow font again, then allow the attributes.

Thanks for the help.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mikesamuel commented 3 years ago

@mcdobr Hmm. I could have sworn I implemented a PolicyFactory -> Builder conversion at some point, but maybe that was planned but never implemented.

Now that we distinguish between explicit and implicit cases

policyA.and(policyB).and(
   new HTMLPolicyBuilder().allowWithoutAttributes("font").toFactory()
)

should do something sensible though you may have to allowElements("font") in the built policy for the allowWithoutAttributes to take.

mcdobr commented 3 years ago

Thanks for all the help.