OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.
Other
843 stars 213 forks source link

about PolicyFactory.and(PolicyFactory f) job #204

Open yangbongsoo opened 4 years ago

yangbongsoo commented 4 years ago

I organized the results made through PolicyFactory.and(PolicyFactory f)

What I want to know exactly : with afterPolicy = beforePolicy.and(newPolicy) how has the afterPolicy changed from beforePolicy?

1. allowWithoutAttributes disallowWithoutAttributes

we've investigated in #197

2. allowElements disallowElements

beforePolicy afterPolicy result
X X nothing allowed no problem
X allow beforePolicy tag all remove, afterPolicy tag alive(only added tags) no problem
X disallow beforePolicy tag all remove, afterPolicy tag all remove no problem
allow X beforePolicy tag alive(only added tags), afterPolicy tag alive(only added tags) no problem
allow allow only allowed elements are allowed. no problem
allow disallow If I allow p tag in beforePolicy and disallow p tag in newPolicy, result afterPolicy allow p tag I think that disallowing p tag in afterPolicy is right
  // beforePolicy : allow
  // afterPolicy : disallow
  @Test
  public void testAllowElements6() {
    PolicyFactory beforePolicy = new HtmlPolicyBuilder()
            .allowElements("p")
            .toFactory();

    String input = "<p>pText</p>";
    assertEquals("<p>pText</p>", beforePolicy.sanitize(input));

    PolicyFactory newPolicy = new HtmlPolicyBuilder()
            .disallowElements("p")
            .toFactory();

    PolicyFactory afterPolicy = beforePolicy.and(newPolicy);
    assertEquals("<p>pText</p>", afterPolicy.sanitize(input));
  }
beforePolicy afterPolicy result
disallow X beforePolicy tag remove, afterPolicy tag remove no problem
disallow allow beforePolicy tag remove, afterPolicy tag alive(only added tags) no problem
disallow disallow beforePolicy tag remove, afterPolicy tag remove no problem

3. allowUrlProtocols disallowUrlProtocols

beforePolicy afterPolicy result
X X nothing allowed protocols no problem
X allow nothing allowed protocols I think that afterPolicy allowing http protocol is right
  // before : X
  // after : allow
  @Test
  public void testAllowUrlProtocol2() {
    PolicyFactory beforePolicy = new HtmlPolicyBuilder()
            .allowElements("a")
            .allowAttributes("href")
            .onElements("a")
            .toFactory();

    String input = "<a href='http://example.com'>Hi</a>";
    assertEquals("Hi", beforePolicy.sanitize(input));

    PolicyFactory newPolicy = new HtmlPolicyBuilder()
            .allowElements("a")
            .allowAttributes("href")
            .onElements("a")
            .allowUrlProtocols("http")
            .toFactory();

    PolicyFactory afterPolicy = beforePolicy.and(newPolicy);

    assertEquals("Hi", afterPolicy.sanitize(input));
  }
beforePolicy afterPolicy result
X disallow beforePolicy tag all remove, afterPolicy tag all remove no problem
allow X If I allow http protocol in beforePolicy and do nothing in newPolicy, afterPolicy result disallow http protocol In order to be consistent with the previous strategy, I think that afterPolicy allow http protocol is right
  // before : allow
  // after : X
  @Test
  public void testAllowUrlProtocol4() {
    PolicyFactory beforePolicy = new HtmlPolicyBuilder()
            .allowElements("a")
            .allowAttributes("href")
            .onElements("a")
            .allowUrlProtocols("http")
            .toFactory();

    String input = "<a href='http://example.com'>Hi</a>";
    assertEquals("<a href=\"http://example.com\">Hi</a>", beforePolicy.sanitize(input));

    PolicyFactory newPolicy = new HtmlPolicyBuilder()
            .allowElements("a")
            .allowAttributes("href")
            .onElements("a")
            .toFactory();

    PolicyFactory afterPolicy = beforePolicy.and(newPolicy);

    assertEquals("Hi", afterPolicy.sanitize(input));
}
beforePolicy afterPolicy result
allow allow If I allow http protocol in beforePolicy and allow https protocol in newPolicy, result afterPolicy is not allow http In order to be consistent with the previous strategy, I think that afterPolicy allow http protocol is right
  // before : allow
  // after : allow
  @Test
  public void testAllowUrlProtocol5_1() {
    PolicyFactory beforePolicy = new HtmlPolicyBuilder()
            .allowElements("a")
            .allowAttributes("href")
            .onElements("a")
            .allowUrlProtocols("http")
            .toFactory();

    String input = "<a href='http://example.com'>Hi</a>";
    assertEquals("<a href=\"http://example.com\">Hi</a>", beforePolicy.sanitize(input));

    PolicyFactory newPolicy = new HtmlPolicyBuilder()
            .allowElements("a")
            .allowAttributes("href")
            .onElements("a")
            .allowUrlProtocols("https")
            .toFactory();

    PolicyFactory afterPolicy = beforePolicy.and(newPolicy);

    assertEquals("Hi", afterPolicy.sanitize(input));
}
beforePolicy afterPolicy result
allow disallow beforePolicy(allow) and afterPolicy(disallow) no problem
disallow X beforePolicy(disallow) and afterPolicy(disallow) no problem
disallow allow If i disallow http protocol in beforePolicy and allow http protocol in newPolicy, result afterPolicy is not allow http I think that afterPolicy allow http protocol is right
  // before : disallow
  // after : allow
  @Test
  public void testAllowUrlProtocol8() {
    PolicyFactory beforePolicy = new HtmlPolicyBuilder()
            .allowElements("a")
            .allowAttributes("href")
            .onElements("a")
            .disallowUrlProtocols("http")
            .toFactory();

    String input = "<a href='https://example.com'>Hi</a>";
    assertEquals("Hi", beforePolicy.sanitize(input));

    PolicyFactory newPolicy = new HtmlPolicyBuilder()
            .allowElements("a")
            .allowAttributes("href")
            .onElements("a")
            .allowUrlProtocols("http")
            .toFactory();

    PolicyFactory afterPolicy = beforePolicy.and(newPolicy);

    assertEquals("Hi", afterPolicy.sanitize(input));
}
beforePolicy afterPolicy result
disallow disallow all disallow no problem

4. allowTextIn disallowTextIn

at first, I tried to organize it all at once, but it got complicated. I'll organize it separately in the next issue.

5. allowAttributes disallowAttributes

at first, I tried to organize it all at once, but it got complicated. I'll organize it separately in the next issue.

Etc

and I have an opinion(it's tiny). I want your feedback.

ElementAndAttributePolicies p = e.getValue();
ElementAndAttributePolicies q = f.policies.get(elName);
if (q != null) {
    p = p.and(q);
}

above code, we separate each policy using p, q variables. then we send q variable in ElementAndAttributePolicies and method. but in ElementAndAttributePolicies and(ElementAndAttributePolicies p) parameter name is p. It keeps p and q confused.

ElementAndAttributePolicies and(ElementAndAttributePolicies p) {
    assert elementName.equals(p.elementName):
    elementName + " != " + p.elementName;
    ...
}

How about we change the naming ElementAndAttributePolicies p -> ElementAndAttributePolicies q?

this method only use in PolicyFactory.and() if you have better idea, I'd love to hear.

yangbongsoo commented 4 years ago

2. allowElements disallowElements case

in PolicyFactory and method,

... 

ElementAndAttributePolicies p = e.getValue();
ElementAndAttributePolicies q = f.policies.get(elName);
if (q != null) {
  p = p.and(q);
} else {
  // Mix in any globals that are not already taken into account in this.
  p = p.andGlobals(f.globalAttrPolicies);
}

...

if q is null, that's mean two cases. one is newPolicy disallow tag, the other is newPolicy doesn't set allow/disallow elements. therefore we need to separate them.

if (q != null) { // beforePolicy and newPolicy has same tag
  p = p.and(q);
  b.put(elName, p);
} else { 

  if (f.policies.isEmpty()) { // disallow or X(set nothing)
    // todo we need to separate disallow case and X(set nothing)

  } else {
    // Mix in any globals that are not already taken into account in this.
    p = p.andGlobals(f.globalAttrPolicies);
    b.put(elName, p);
  }
}

at first I made below code(but I think that using f.textContainers is not good way).

if (q != null) { // beforePolicy and newPolicy has same tag
  p = p.and(q);
  b.put(elName, p);
} else { 

  if (f.policies.isEmpty()) { // => disallow or X

    if (f.textContainers.isEmpty()) { // that means newPolicy set nothing
      p = p.andGlobals(f.globalAttrPolicies);
      b.put(elName, p);
    } else {
    }

  } else {
    // Mix in any globals that are not already taken into account in this.
    p = p.andGlobals(f.globalAttrPolicies);
    b.put(elName, p);
  }
}

any way, above code has problem. look below test code. I think only b tag is allowed is right. but I couldn't make it.

  // beforePolicy : allow
  // afterPolicy : disallow
  @Test
  public void testAllowElements6_1() {
    PolicyFactory beforePolicy = new HtmlPolicyBuilder()
            .allowElements("p", "b")
            .toFactory();

    String input = "<p>pText</p>\n<b>bText</b>\n<i>iText</i>\n<s>sText</s>";
    assertEquals("<p>pText</p>\n<b>bText</b>\niText\nsText", beforePolicy.sanitize(input));

    PolicyFactory newPolicy = new HtmlPolicyBuilder()
            .disallowElements("p")
            .toFactory();

    PolicyFactory afterPolicy = beforePolicy.and(newPolicy);

    assertEquals("pText\n<b>bText</b>\niText\nsText", afterPolicy.sanitize(input));
  }

@mikesamuel After all, should this also change the underlying structure? I tried to work with minimal change, but it was blocked.

yangbongsoo commented 4 years ago

2. allowElements disallowElements case

I think again, The idea of changing the existing structure is in the wrong order. I tried to solve the problem by adding new fields without modifying the existing structure.

I make two fields(isNotMatchingDisallowElement and isNotCalledAllowElementsMethod). As I explained in above comment we need to separate disallow case and X(set nothing). so I use that fields.

code : https://github.com/yangbongsoo/java-html-sanitizer/commit/db03df04eb87e335b50fbeaa927cb5be37918750

      if (q != null) {
        p = p.and(q);
        b.put(elName, p);
      } else {

        if (f.policies.isEmpty()) {
          boolean isNotMatchingDisallowElement = true;
          for (String eachDisallowElement : f.disallowElementsName) {
            if (elName.equals(eachDisallowElement)) {
              isNotMatchingDisallowElement = false;
              break;
            }
          }

          if (isNotMatchingDisallowElement || f.isNotCalledAllowElementsMethod) {
            p = p.andGlobals(f.globalAttrPolicies);
            b.put(elName, p);
          }

        } else {
          // Mix in any globals that are not already taken into account in this.
          p = p.andGlobals(f.globalAttrPolicies);
          b.put(elName, p);
        }
      }
yangbongsoo commented 4 years ago

3. allowUrlProtocols disallowUrlProtocols case

in below test code, I set allowUrlprotocols(allow) to newPolicy only.

  // beforePolicy : X
  // newPolicy : allow
  @Test
  public void testAllowUrlProtocol2() {
    PolicyFactory beforePolicy = new HtmlPolicyBuilder()
            .allowElements("a")
            .allowAttributes("href")
            .onElements("a")
            .toFactory();

    String input = "<a href='http://example.com'>Hi</a>";
    assertEquals("Hi", beforePolicy.sanitize(input));

    PolicyFactory newPolicy = new HtmlPolicyBuilder()
            .allowElements("a")
            .allowAttributes("href")
            .onElements("a")
            .allowUrlProtocols("http")
            .toFactory();

    PolicyFactory afterPolicy = beforePolicy.and(newPolicy);

    assertEquals("Hi", afterPolicy.sanitize(input));
  }

so in JoinedAttributePolicy, we had two polices(FilterUrlByProtocolAttributePolicy).

href-protocol-test1

the problem is here. in for loop, first AttributePolicy doesn't have allowed protocols. so value is null. and because value is null we break loop(even if next AttributePolicy has allowed protocols).

  public @Nullable String apply(
      String elementName, String attributeName, @Nullable String rawValue) {
    String value = rawValue;
    for (AttributePolicy p : policies) {
      if (value == null) { break; }
      value = p.apply(elementName, attributeName, value);
    }
    return value;
  }

@mikesamuel I think that in a = AttributePolicy.Util.join(a, b) logic, we should combine protocols. but I am not sure we have to change that codes. I need your opinion.

      // a is FilterUrlByProtocolAttributePolicy(protocols 0)
      AttributePolicy a = e.getValue();
      // b is FilterUrlByProtocolAttributePolicy(protocols 1)
      AttributePolicy b = q.attrPolicies.get(attrName);
      if (b != null) {
        a = AttributePolicy.Util.join(a, b);
      }
      joinedAttrPolicies.put(attrName, a);
mikesamuel commented 4 years ago

Sorry for the delay:

I think that in a = AttributePolicy.Util.join(a, b) logic, we should combine protocols.

Hmm. I wonder whether we could define

interface JoinableAttributePolicy {
  /** Null if this does not know how to join with other; otherwise the joined policy. */
  AttributePolicy tryJoinWith(JoinableAttributePolicy other);
}

so that AttributePolicy.Util.Join could try to let policies join themselves, and policies that wrap other policies could try to join the policy they wrap.

yangbongsoo commented 4 years ago

If I add tryJoinWith method likes below interface in AttributePolicy, StylingPolicy will affect.

  /** An attribute policy that is joinable. */
  static interface JoinableAttributePolicy extends AttributePolicy, Joinable<JoinableAttributePolicy> {
    // Parameterized Appropriately.

    /** Null if this does not know how to join with other; otherwise the joined policy. */
    AttributePolicy tryJoinWith(JoinableAttributePolicy other);
  }

give me a time. I should analyze StylingPolicy at first.