OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.

Discrepancies of results when sanitizing allowed tags #260

Open Pamplemousse opened 2 years ago

Pamplemousse commented 2 years ago

While trying to use .allowElements(), I noticed some surprising results: different tags yield different kinds of output.

For example:

  1. A policy with new HtmlPolicyBuilder().allowElements("a").toFactory().sanitize("<a>") returns nothing;
  2. A policy with new HtmlPolicyBuilder().allowElements("div").toFactory().sanitize("<div>") returns <div></div>.

I expected result 1 not to be empty, and I also expected the two results to be consistent.

Am I missing something, or am I doing something wrong?


With the following (trimmed) code, I tested many elements to expose their behaviour.

import java.util.Arrays;
import javax.swing.text.html.HTML;
import org.owasp.html.PolicyFactory;
import org.owasp.html.HtmlPolicyBuilder;

// [...]

String[] HTML_ELEMENTS = Arrays
    .stream(HTML.getAllTags())
    .map(Object::toString)
    .toArray(String[]::new);

// [...]

// Sanitize a bare opening tag for each element, with a policy allowing only that element.
for (String element : HTML_ELEMENTS) {
    String sanitized = new HtmlPolicyBuilder()
        .allowElements(element)
        .toFactory()
        .sanitize("<" + element + ">");
    System.out.println(element + ": " + sanitized);
}

I got the following output, in which font, img, input, and span behave like a:

    a:
    address: <address></address>
    applet: <applet></applet>
    area: <area />
    b: <b></b>
    base: <base />
    basefont: <basefont />
    big: <big></big>
    blockquote: <blockquote></blockquote>
    body: <body></body>
    br: <br />
    caption: <caption></caption>
    center: <center></center>
    cite: <cite></cite>
    code: <code></code>
    dd: <dd></dd>
    dfn: <dfn></dfn>
    dir: <dir></dir>
    div: <div></div>
    dl: <dl></dl>
    dt: <dt></dt>
    em: <em></em>
    font:
    form: <form></form>
    frame: <frame></frame>
    frameset: <frameset></frameset>
    h1: <h1></h1>
    h2: <h2></h2>
    h3: <h3></h3>
    h4: <h4></h4>
    h5: <h5></h5>
    h6: <h6></h6>
    head: <head></head>
    hr: <hr />
    html: <html></html>
    i: <i></i>
    img:
    input:
    isindex: <isindex />
    kbd: <kbd></kbd>
    li: <li></li>
    link: <link />
    map: <map></map>
    menu: <menu></menu>
    meta: <meta />
    nobr: <nobr></nobr>
    noframes: <noframes></noframes>
    object: <object></object>
    ol: <ol></ol>
    option: <option></option>
    p: <p></p>
    param: <param />
    pre: <pre></pre>
    samp: <samp></samp>
    script: <script></script>
    select: <select></select>
    small: <small></small>
    span:
    strike: <strike></strike>
    s: <s></s>
    strong: <strong></strong>
    style: <style></style>
    sub: <sub></sub>
    sup: <sup></sup>
    table: <table></table>
    td: <td></td>
    textarea: <textarea></textarea>
    th: <th></th>
    title: <title></title>
    tr: <tr></tr>
    tt: <tt></tt>
    u: <u></u>
    ul: <ul></ul>
    var: <var></var>

anttinym commented 1 year ago

I noticed the same thing with the span element; it won't get sanitized away if it is given a style attribute 🤔

subbudvk commented 7 months ago

This is expected: by default, the sanitizer drops certain elements when they have no attributes. See:

https://github.com/OWASP/java-html-sanitizer/blob/032d11b8931442a026d12a3b44176652e631a8a1/src/main/java/org/owasp/html/HtmlPolicyBuilder.java#L165
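
To make the behaviour reported above concrete, here is a minimal, self-contained sketch of the "skip if empty" rule. It is a model only, not the library's implementation: the class SkipIfEmptyModel and the method renderBareTag are hypothetical names, and the set contents are assumed from the five elements that came back empty in the listing (a, font, img, input, span).

```java
import java.util.Set;

// Hypothetical model of the skip-if-empty rule; NOT the sanitizer's real code.
final class SkipIfEmptyModel {
    // Assumption: mirrors the sanitizer's default skip-if-empty set. It matches
    // the five elements that produced empty output in the listing above.
    static final Set<String> SKIP_IF_EMPTY =
        Set.of("a", "font", "img", "input", "span");

    // Elements rendered self-closed in this model (taken from the listing,
    // plus img and input, which are void elements in HTML).
    static final Set<String> VOID_ELEMENTS = Set.of(
        "area", "base", "basefont", "br", "hr", "img", "input",
        "isindex", "link", "meta", "param");

    // Renders an allowed element that has no attributes. The element is dropped
    // when it is in SKIP_IF_EMPTY and was not explicitly opted in.
    static String renderBareTag(String element, Set<String> allowedWithoutAttributes) {
        boolean dropped = SKIP_IF_EMPTY.contains(element)
            && !allowedWithoutAttributes.contains(element);
        if (dropped) {
            return "";
        }
        return VOID_ELEMENTS.contains(element)
            ? "<" + element + " />"
            : "<" + element + "></" + element + ">";
    }

    public static void main(String[] args) {
        for (String element : new String[] {"a", "div", "br", "span"}) {
            System.out.println(element + ": " + renderBareTag(element, Set.of()));
        }
        // Opting "a" in keeps the bare tag in this model.
        System.out.println("a (opted in): " + renderBareTag("a", Set.of("a")));
    }
}
```

In the library itself, the corresponding opt-in should be HtmlPolicyBuilder.allowWithoutAttributes, for example new HtmlPolicyBuilder().allowElements("a").allowWithoutAttributes("a"). I have not run this against the commit linked above, so treat it as a pointer to check rather than a verified fix.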