OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.

Issue while disallowing attributes matching pattern #292

Open subbudvk opened 8 months ago

subbudvk commented 8 months ago

I am trying to disallow attributes matching a specific pattern.

    ```
    import java.util.regex.Pattern;
    import org.owasp.html.HtmlPolicyBuilder;
    import org.owasp.html.PolicyFactory;

    PolicyFactory factory = new HtmlPolicyBuilder()
        .allowUrlProtocols("http", "https")
        .allowElements("img", "a", "div", "span")
        .allowAttributes("alt", "src").onElements("img")
        .allowAttributes("border", "height", "width").onElements("img")
        // Works: href is kept only when its value matches the pattern.
        .allowAttributes("href").matching(Pattern.compile(".*google.*")).onElements("a")
        // Intended: drop src only when its value matches. Observed: src is always dropped.
        .disallowAttributes("src").matching(Pattern.compile(".*google.*")).onElements("img")
        .toFactory();
    System.out.println("ALLOW ATTRIBUTES :: " + factory.sanitize("<a href='http://google.com'>"));
    System.out.println("DISALLOW ATTRIBUTES :: " + factory.sanitize("<img src='http://yahoo.com'>"));
    ```

Allowing attributes that match a pattern works as expected. Disallowing attributes that match the pattern `.*google.*` does not: the `src` value http://yahoo.com is discarded even though it does not match the pattern.

If I am not wrong, `disallowAttributes()` does an `allowAttributes()` matching a _REJECTALL policy, so any further `matching()` call on the returned AttributeBuilder has no effect. Is my understanding correct? I understand that the library is whitelist based and that everything not explicitly allowed is rejected by default. But in our case we ship a minimal policy, and the consumer may still want to restrict a few more entities. If my understanding of why this doesn't work is right, is there a way to achieve it?
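
One possible workaround, sketched here rather than confirmed by the maintainers, is to express the restriction as an allow rule with an inverted pattern: allow `src` only when the value does not contain "google". The sketch below uses only `java.util.regex` and assumes that `matching(Pattern)` requires the whole attribute value to match. The class name `NegatedPatternDemo` and the helper `isAllowed` are hypothetical names used for illustration.

```java
import java.util.regex.Pattern;

public class NegatedPatternDemo {
    // Matches any value that does NOT contain "google".
    // The negative lookahead (?!.*google) fails at the start of the
    // input if "google" occurs anywhere after it.
    static final Pattern NOT_GOOGLE =
        Pattern.compile("(?!.*google).*", Pattern.DOTALL);

    // Hypothetical helper: a whole-value match, which is how this sketch
    // assumes the sanitizer applies a Pattern passed to matching(...).
    static boolean isAllowed(String value) {
        return NOT_GOOGLE.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("http://yahoo.com"));            // true
        System.out.println(isAllowed("http://google.com/logo.png"));  // false
    }
}
```

Under that assumption, the pattern could be used in the policy in place of the unconditional `src` allowance, for example `.allowAttributes("src").matching(NegatedPatternDemo.NOT_GOOGLE).onElements("img")`, so that the rule stays within the library's whitelist model.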