OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.
Other
843 stars 213 forks

is <plaintext> element required ?? #220

Open Sam2243 opened 3 years ago

Sam2243 commented 3 years ago

Hi,

So I have a text as "<img src=x onerror=window.open('http://evil.test.com/');/>"

and the following is the policy

        PolicyFactory policy = new HtmlPolicyBuilder()
            .allowElements("a")
            .allowUrlProtocols("https", "http")
            .allowAttributes("href").onElements("a")
            .toFactory();

When I sanitize it, I don't get any output. Although when I add <plaintext>
to the text, it does get sanitized. Like the following:

        String text = "<plaintext><img src=x onerror=window.open('http://evil.test.com/');/>";

Why do I need to put the plaintext element here? Any alternatives to it?

Appreciate your help.

Thanks

li-a commented 3 years ago

In your code snippet, you are not whitelisting the img element or the attributes src and onerror, so yes, it is expected that all markup gets removed from your input. That's exactly the goal of sanitizing.

> Although when I add <plaintext> to the text, it does get sanitized.

What do you mean when you say "get sanitized" here? What is the output? I would expect the input to be returned essentially unmodified.

Adding <plaintext> will mark everything after that start tag (including things like </body>) as non-markup text (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/plaintext). The sanitizer will ignore everything after that start tag, and if you tried to place such a string into a document, you would probably break the document. I doubt <plaintext> is something you want.
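
On the question of alternatives: if the goal is to show the input as literal text rather than strip it, one common option is to escape the HTML metacharacters instead of prepending <plaintext>. The following is a minimal sketch in plain Java under that assumption. `EscapeDemo` and `escapeHtml` are hypothetical names introduced for illustration; they are not part of java-html-sanitizer, and nothing in this thread confirms that this is what the original poster needed.

```java
// Hypothetical helper, NOT part of java-html-sanitizer.
// Escapes the characters that let text be parsed as markup, so a browser
// renders the string literally inside element content or a quoted attribute.
final class EscapeDemo {

    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same input string as in the question above.
        String text = "<img src=x onerror=window.open('http://evil.test.com/');/>";
        // Prints: &lt;img src=x onerror=window.open(&#39;http://evil.test.com/&#39;);/&gt;
        System.out.println(escapeHtml(text));
    }
}
```

Escaping keeps the whole input visible as text, whereas the policy in the question removes the disallowed img element entirely. Which behavior is appropriate depends on whether the application wants to display untrusted input or to accept a limited subset of HTML.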