OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.
Other
843 stars 213 forks source link

Union policy skipIfEmpty value issue #197

Closed yangbongsoo closed 4 years ago

yangbongsoo commented 4 years ago

Hello First of all, thank you for making a good open source project. I found union policy skipIfEmpty value issue.

the code is below.

@Test
public void testSkipIfEmptyUnions1() {
    PolicyFactory beforePolicy = new HtmlPolicyBuilder()
            .allowElements("span")
            .allowWithoutAttributes("span")
            .toFactory();

    String spanTagString = "<span>Hi</span>";
    String resultString = beforePolicy.sanitize(spanTagString);
    assertEquals("<span>Hi</span>", resultString);

    PolicyFactory afterPolicy = beforePolicy.and(new HtmlPolicyBuilder()
            .allowElements("span")
            .disallowWithoutAttributes("span")
            .toFactory());

    resultString = afterPolicy.sanitize(spanTagString);
    // todo I think this result has problem
    assertEquals("<span>Hi</span>", resultString);
}

I think resultString value made by afterPolicy is Hi(only string). but <span>Hi</span>. Because in afterPolicy I add disallowWithoutAttributes span tag.

opposite situation

@Test
public void testSkipIfEmptyUnions2() {
    PolicyFactory beforePolicy = new HtmlPolicyBuilder()
            .allowElements("span")
            .toFactory();

    String spanTagString = "<span>Hi</span>";
    String resultString = beforePolicy.sanitize(spanTagString);
    assertEquals("Hi", resultString);

    PolicyFactory afterPolicy = beforePolicy.and(new HtmlPolicyBuilder()
            .allowElements("span")
            .allowWithoutAttributes("span")
            .toFactory());

    resultString = afterPolicy.sanitize(spanTagString);
    assertEquals("<span>Hi</span>", resultString);
}

this result has no problem.

in ElementAndAttributePolicies.java(Line 88),

before

boolean combinedSkipIfEmpty;
if (HtmlPolicyBuilder.DEFAULT_SKIP_IF_EMPTY.contains(elementName)) {
  // Either policy explicitly opted out of skip if empty.
  combinedSkipIfEmpty = skipIfEmpty && p.skipIfEmpty;
} else {
  // Either policy explicitly specified skip if empty.
  combinedSkipIfEmpty = skipIfEmpty || p.skipIfEmpty;
}

How about below code?

after

if (HtmlPolicyBuilder.DEFAULT_SKIP_IF_EMPTY.contains(elementName)) {
  if (true == skipIfEmpty && false == p.skipIfEmpty) {
      combinedSkipIfEmpty = false;
  } else if (false == skipIfEmpty && true == p.skipIfEmpty) {
      combinedSkipIfEmpty = true;
  } else {
      combinedSkipIfEmpty = skipIfEmpty && p.skipIfEmpty;
  }
} else {
  // Either policy explicitly specified skip if empty.
  combinedSkipIfEmpty = skipIfEmpty || p.skipIfEmpty;
}

or

if (HtmlPolicyBuilder.DEFAULT_SKIP_IF_EMPTY.contains(elementName)) {
  combinedSkipIfEmpty = p.skipIfEmpty;
} else {
  // Either policy explicitly specified skip if empty.
  combinedSkipIfEmpty = skipIfEmpty || p.skipIfEmpty;
}

If It has any problem, I'd appreciate it if you could let me know. thanks

yangbongsoo commented 4 years ago

I thought about it again and the problem is not simple. at first, "a", "font", "img", "input", "span" tags (DEFAULT_SKIP_IF_EMPTY). in table, X means not allow/disallowWithoutAttributes setting.

beforePolicy afterPolicy result
X X beforePolicy tag remove, afterPolicy tag remove no problem
X allow beforePolicy tag remove, afterPolicy tag alive no problem
X disallow beforePolicy tag remove, afterPolicy tag remove no problem
allow X beforePolicy tag alive, afterPolicy tag alive no problem
allow allow beforePolicy tag alive, afterPolicy tag alive no problem
allow disallow beforePolicy tag alive, afterPolicy tag alive I think afterPolicy tag remove is right
disallow X beforePolicy tag remove, afterPolicy tag remove no problem
disallow allow beforePolicy tag remove, afterPolicy tag alive no problem
disallow disallow beforePolicy tag remove, afterPolicy tag remove no problem

second, other tags

beforePolicy afterPolicy result
X X beforePolicy tag alive, afterPolicy tag alive no problem
X allow beforePolicy tag alive, afterPolicy tag alive no problem
X disallow beforePolicy tag alive, afterPolicy tag remove no problem
allow X beforePolicy tag alive, afterPolicy tag alive no problem
allow allow beforePolicy tag alive, afterPolicy tag alive no problem
allow disallow beforePolicy tag alive, afterPolicy tag remove no problem
disallow X beforePolicy tag remove, afterPolicy tag remove no problem
disallow allow beforePolicy tag remove, afterPolicy tag remove I think afterPolicy tag alive is right
disallow disallow beforePolicy tag remove, afterPolicy tag remove no problem

example code is below

  @Test
  public void testSkipIfEmptyUnionSpanTag() {
    PolicyFactory beforePolicy = new HtmlPolicyBuilder()
            .allowElements("span")
            .allowWithoutAttributes("span")
            .toFactory();

    String spanTagString = "<span>Hi</span>";
    String resultString = beforePolicy.sanitize(spanTagString);
    assertEquals("<span>Hi</span>", resultString);

    PolicyFactory afterPolicy = beforePolicy.and(new HtmlPolicyBuilder()
            .allowElements("span")
            .toFactory());

    resultString = afterPolicy.sanitize(spanTagString);
    assertEquals("<span>Hi</span>", resultString);
  }
mikesamuel commented 4 years ago

How about we change Set<String> skipIfEmpty into a Map<String, Skip> where

enum Skip {
    SKIP_BY_DEFAULT,
    SKIP,
    DO_NOT_SKIP,
    ;

    Skip and(Skip s) {
        return (this == s || s == SkipByDefault)
            ? this
            : (this == SkipByDefault)
            ? s
            : SKIP; // Bias towards emitting fewer tags.
    }
}

Then we can and the two maps so that an explicit decision to not skip is respected if the other map is only skipping by default.

yangbongsoo commented 4 years ago

@mikesamuel I asked for PR after I worked on it. https://github.com/OWASP/java-html-sanitizer/pull/199

It was difficult to change the naming as I changed Set to Map. The name tends to get longer because I want to know it by just looking at naming. Please point out if you don't understand naming.

I tested all 18 cases above. And they all passed. I made HtmlTagSkipType enum. and You may not understand why NONE(false) is needed. However, it is necessary to distinguish between not skip in afterPolicy and following the beforePolicy setting because there is no skip setting.