Open Pamplemousse opened 2 years ago
While trying to use .allowElements(), I noticed some surprising results: different tags yield different kinds of output.
For example:
1. new HtmlPolicyBuilder().allowElements("a").toFactory().sanitize("<a>") returns nothing;
2. new HtmlPolicyBuilder().allowElements("div").toFactory().sanitize("<div>") returns <div></div>.
I expected result 1 not to be empty, and I also expected the two results to be consistent.
Am I missing something, or doing something wrong?
With the following (trimmed) code, I tested many elements to expose their behaviour.
    import java.util.Arrays;
    import javax.swing.text.html.HTML;
    import org.owasp.html.PolicyFactory;
    import org.owasp.html.HtmlPolicyBuilder;

    // [...]

    // Names of every tag known to Swing's HTML support, e.g. "a", "div".
    String[] HTML_ELEMENTS = Arrays
        .stream(HTML.getAllTags())
        .map(Object::toString)
        .toArray(String[]::new);

    // [...]

    // Allow exactly one element per policy and sanitize a bare opening tag.
    for (String element : HTML_ELEMENTS) {
        String sanitized = new HtmlPolicyBuilder()
            .allowElements(element)
            .toFactory()
            .sanitize("<" + element + ">");
        System.out.println(element + ": " + sanitized);
    }
And I got the following list, where several other tags behave like a:
    a:
    address: <address></address>
    applet: <applet></applet>
    area: <area />
    b: <b></b>
    base: <base />
    basefont: <basefont />
    big: <big></big>
    blockquote: <blockquote></blockquote>
    body: <body></body>
    br: <br />
    caption: <caption></caption>
    center: <center></center>
    cite: <cite></cite>
    code: <code></code>
    dd: <dd></dd>
    dfn: <dfn></dfn>
    dir: <dir></dir>
    div: <div></div>
    dl: <dl></dl>
    dt: <dt></dt>
    em: <em></em>
    font:
    form: <form></form>
    frame: <frame></frame>
    frameset: <frameset></frameset>
    h1: <h1></h1>
    h2: <h2></h2>
    h3: <h3></h3>
    h4: <h4></h4>
    h5: <h5></h5>
    h6: <h6></h6>
    head: <head></head>
    hr: <hr />
    html: <html></html>
    i: <i></i>
    img:
    input:
    isindex: <isindex />
    kbd: <kbd></kbd>
    li: <li></li>
    link: <link />
    map: <map></map>
    menu: <menu></menu>
    meta: <meta />
    nobr: <nobr></nobr>
    noframes: <noframes></noframes>
    object: <object></object>
    ol: <ol></ol>
    option: <option></option>
    p: <p></p>
    param: <param />
    pre: <pre></pre>
    samp: <samp></samp>
    script: <script></script>
    select: <select></select>
    small: <small></small>
    span:
    strike: <strike></strike>
    s: <s></s>
    strong: <strong></strong>
    style: <style></style>
    sub: <sub></sub>
    sup: <sup></sup>
    table: <table></table>
    td: <td></td>
    textarea: <textarea></textarea>
    th: <th></th>
    title: <title></title>
    tr: <tr></tr>
    tt: <tt></tt>
    u: <u></u>
    ul: <ul></ul>
    var: <var></var>
I noticed the same thing with the span element: it won't get sanitized if it's given a style attribute 🤔
This is expected behaviour:
https://github.com/OWASP/java-html-sanitizer/blob/032d11b8931442a026d12a3b44176652e631a8a1/src/main/java/org/owasp/html/HtmlPolicyBuilder.java#L165
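For anyone who doesn't want to follow the link: the results in this thread are consistent with the sanitizer keeping a default set of elements that are dropped when they end up with no attributes. The sketch below is only a toy model of that rule, not the library's code. The class name, the method name and the contents of the set are assumptions; the set is simply the five elements that came out empty in the list above.

```java
import java.util.Set;

/** Toy model of a "skip if empty" rule; NOT the sanitizer's actual code. */
class SkipIfEmptyDemo {

    // Assumption: the five elements that produced empty output in the list above.
    static final Set<String> SKIP_IF_EMPTY =
        Set.of("a", "font", "img", "input", "span");

    /**
     * Models what happens to an allowed element that carries no attributes:
     * it is dropped if it is in the skip set, otherwise emitted as an empty pair.
     */
    static String renderBare(String element, Set<String> skipIfEmpty) {
        if (skipIfEmpty.contains(element)) {
            return "";
        }
        return "<" + element + "></" + element + ">";
    }

    public static void main(String[] args) {
        // Brackets make the empty result visible.
        System.out.println("a:   [" + renderBare("a", SKIP_IF_EMPTY) + "]");   // a:   []
        System.out.println("div: [" + renderBare("div", SKIP_IF_EMPTY) + "]"); // div: [<div></div>]

        // Opting "a" out of the skip set makes the two cases consistent.
        Set<String> withoutA = Set.of("font", "img", "input", "span");
        System.out.println("a:   [" + renderBare("a", withoutA) + "]");        // a:   [<a></a>]
    }
}
```

If you do want a bare element such as a to survive, HtmlPolicyBuilder has an allowWithoutAttributes(...) method that looks like the intended opt-out; I have not verified its output for the inputs in this thread.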