OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.
Other
843 stars 213 forks source link

Not able to sanitize html when plaintext tag is included #219

Open manangurjar opened 3 years ago

manangurjar commented 3 years ago

I am using below policy -

PolicyFactory policy = new HtmlPolicyBuilder()
                .allowElements("a", "pre")
                .allowUrlProtocols("https", "http")
                .allowAttributes("href").onElements("a")
                .requireRelNofollowOnLinks().toFactory();

And I am trying to filter out below html string -

String text = "<plaintext><img src=\"/\" onerror=\"confirm(document.domain)\";>";

But if I try to sanitize the text and filter the string out it leaves the img tag as is -

String text = "<plaintext><img src=\"/\" onerror=\"confirm(document.domain)\";>";

PolicyFactory policy = new HtmlPolicyBuilder()
                .allowElements("a", "pre")
                .allowUrlProtocols("https", "http")
                .allowAttributes("href").onElements("a")
                .requireRelNofollowOnLinks().toFactory();

String sanitizedText = policy.sanitize(StringEscapeUtils.unescapeHtml4(text));
System.out.println(StringEscapeUtils.unescapeHtml4(sanitizedText));

Output - <img src="/" onerror="confirm(document.domain)";>

May be the policy might need some correction, but could you please take a look and confirm

jmanico commented 3 years ago

Your policy is simple and this looks like a legit bypass. More from us soon and thank you for the report.

-- Jim Manico @Manicode

On Dec 14, 2020, at 7:33 AM, Manan Gurjar notifications@github.com wrote:

 I am using below policy -

PolicyFactory policy = new HtmlPolicyBuilder() .allowElements("a", "pre") .allowUrlProtocols("https", "http") .allowAttributes("href").onElements("a") .requireRelNofollowOnLinks().toFactory(); And I am trying to filter out below html string -

String text = "

&lt;img src=\&quot;/\&quot; onerror=\&quot;confirm(document.domain)\&quot;;&gt;&quot;;</p> <p>But if I try to sanitize the text and filter the string out it leaves the img tag as is -</p> <p>String text = &quot;<plaintext>&lt;img src=\&quot;/\&quot; onerror=\&quot;confirm(document.domain)\&quot;;&gt;&quot;;</p> <p>PolicyFactory msLinkablePolicyFactory = new HtmlPolicyBuilder() .allowElements(&quot;a&quot;, &quot;pre&quot;) .allowUrlProtocols(&quot;https&quot;, &quot;http&quot;) .allowAttributes(&quot;href&quot;).onElements(&quot;a&quot;) .requireRelNofollowOnLinks().toFactory();</p> <p>String sanitizedText = msLinkablePolicyFactory.sanitize(StringEscapeUtils.unescapeHtml4(text)); System.out.println(StringEscapeUtils.unescapeHtml4(sanitizedText)); Output - &lt;img src=&quot;/&quot; onerror=&quot;confirm(document.domain)&quot;;&gt;</p> <p>May be the policy might need some correction, but could you please take a look and confirm</p> <p>— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.</p> </blockquote> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/mikesamuel"><img src="https://avatars.githubusercontent.com/u/368886?v=4" />mikesamuel</a> commented <strong> 3 years ago</strong> </div> <div class="markdown-body"> <p>What are the <code>StringEscapeUtils.unescapeHtml4</code> calls for?</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/mikesamuel"><img src="https://avatars.githubusercontent.com/u/368886?v=4" />mikesamuel</a> commented <strong> 3 years ago</strong> </div> <div class="markdown-body"> <p>The following test without those calls passes for me. What am I missing?</p> <pre><code class="language-java"> @Test public static final void testIssue219() { String text = "&lt;plaintext&gt;&lt;img src=\"/\" onerror=\"confirm(document.domain)\";&gt;"; PolicyFactory policy = new HtmlPolicyBuilder() .allowElements("a", "pre") .allowUrlProtocols("https", "http") .allowAttributes("href").onElements("a") .requireRelNofollowOnLinks() .toFactory(); String sanitizedText = policy.sanitize(text); assertEquals( "&amp;lt;img src&amp;#61;&amp;#34;/&amp;#34; onerror&amp;#61;&amp;#34;confirm(document.domain)&amp;#34;;&amp;gt;", sanitizedText ); }</code></pre> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/mikesamuel"><img src="https://avatars.githubusercontent.com/u/368886?v=4" />mikesamuel</a> commented <strong> 3 years ago</strong> </div> <div class="markdown-body"> <p>The policy does not </p> <blockquote> <p>leave the img tag as is</p> </blockquote> <p>because there is no <code>img</code> tag in <code>"&lt;plaintext&gt;&lt;img src=\"/\" onerror=\"confirm(document.domain)\";&gt;"</code>, only text characters.</p> <p>The <a href="https://html.spec.whatwg.org/#parsing-main-inbody:plaintext-state-2">HTML spec</a> explains:</p> <blockquote> <p>Once a start tag with the tag name &quot;plaintext&quot; has been seen, that will be the last token ever seen other than character tokens (and the end-of-file token), because there is no way to switch out of the PLAINTEXT state.</p> </blockquote> <p>The sanitizer correctly escapes those text characters which this line then unescapes before printing:</p> <pre><code class="language-java">System.out.println(StringEscapeUtils.unescapeHtml4(sanitizedText));</code></pre> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/manangurjar"><img src="https://avatars.githubusercontent.com/u/11473727?v=4" />manangurjar</a> commented <strong> 3 years ago</strong> </div> <div class="markdown-body"> <p>ok understood. I was using the policy to sanitze the blog comments and to allow only specific tags. But then it was figured that using plaintext tag, user was able to bypass the policy and was able to execute any JS, which I thought is a possible attack. </p> <p>So, I think I will have to deal with plaintext tag separately.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/mikesamuel"><img src="https://avatars.githubusercontent.com/u/368886?v=4" />mikesamuel</a> commented <strong> 3 years ago</strong> </div> <div class="markdown-body"> <p>What plaintext bypass? Just don't unescape the html. If you unescape, there's a much simpler bypass: <code>&amp;lt;script&amp;gt;...&amp;lt;/script&amp;gt;</code>.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/jmanico"><img src="https://avatars.githubusercontent.com/u/531465?v=4" />jmanico</a> commented <strong> 3 years ago</strong> </div> <div class="markdown-body"> <p>Thank you Mike and I’m sorry for the false alarm. </p> <p>-- Jim Manico @Manicode</p> <blockquote> <p>On Dec 14, 2020, at 6:23 PM, Mike Samuel <a href="mailto:notifications@github.com">notifications@github.com</a> wrote:</p> <p> What plaintext bypass? Just don't unescape the html. If you unescape, there's a much simpler bypass: &lt;script&gt;...&lt;/script&gt;.</p> <p>— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.</p> </blockquote> </div> </div> <div class="page-bar-simple"> </div> <div class="footer"> <ul class="body"> <li>© <script> document.write(new Date().getFullYear()) </script> Githubissues.</li> <li>Githubissues is a development platform for aggregating issues.</li> </ul> </div> <script src="https://cdn.jsdelivr.net/npm/jquery@3.5.1/dist/jquery.min.js"></script> <script src="/githubissues/assets/js.js"></script> <script src="/githubissues/assets/markdown.js"></script> <script src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.4.0/build/highlight.min.js"></script> <script src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.4.0/build/languages/go.min.js"></script> <script> hljs.highlightAll(); </script> </body> </html>