OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.

<a href='https://javascript:void(0)' target='_new' > not sanitized #167

Closed otolabqu closed 5 years ago

otolabqu commented 5 years ago

While sanitizing HTML, I noticed some JavaScript-like code getting through when it is part of an href attribute prefixed with https://, such as <a href='https://javascript:void(0)' target='_new' >click</a>

Not sure if this is a risk. Chrome does not execute it as JS when clicked on. Still, I thought this would be removed by the sanitizer.

If we remove the https:// part, the sanitizer removes it, as in <a href='javascript:void(0)' target='_new' >click</a> 

The following Java test reproduces this:


import org.junit.Test;
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;

public class SanitizerUnitTest {

  @Test
  public void sanitizeJavascriptHref() {
    String linkWithJs = "<html><a href='https://javascript:void(0)' target='_new' >click</a> </html>";
    String sanitized = Sanitizers.LINKS.sanitize(linkWithJs);
    System.out.println(sanitized);
  }

  @Test
  public void sanitizeJavascriptHref2() {
    PolicyFactory policy = new HtmlPolicyBuilder()
        .allowElements("a")
        .allowUrlProtocols("https")
        .allowAttributes("href").onElements("a")
        .requireRelNofollowOnLinks()
        .toFactory();
    String linkWithJs = "<html><a href=\"https://javascript:void%280%29\" rel=\"nofollow\">click</a></html>";
    String safeHTML = policy.sanitize(linkWithJs);
    System.out.println(safeHTML);
  }
}

The output of this test is:

<a href="https://javascript:void%280%29" rel="nofollow">click</a>
<a href="https://javascript:void%280%29" rel="nofollow">click</a> 
mikesamuel commented 5 years ago

Thanks for the test case.

I'm unaware of any URL parser that doesn't treat this as an HTTPS URL with a malformed port.

Have you seen a different behavior?
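A minimal sketch of that reading, using the JDK's java.net.URI. This is an assumption-laden illustration: java.net.URI is an RFC 2396-based parser, not the WHATWG parser browsers use, and the class name HrefSchemeDemo and its schemeOf helper are hypothetical. It shows that the scheme is https and that "javascript:void(0)" lands in the authority, where "void(0)" can only be read as a (non-numeric) port.

```java
import java.net.URI;
import java.util.Locale;

public class HrefSchemeDemo {
  // Hypothetical helper: the scheme java.net.URI assigns to an href, lower-cased,
  // or null for a relative reference. URI.create throws IllegalArgumentException
  // if the string is not a syntactically valid URI.
  static String schemeOf(String href) {
    String scheme = URI.create(href).getScheme();
    return scheme == null ? null : scheme.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    URI tricky = URI.create("https://javascript:void(0)");
    // The scheme is https; "javascript" only appears inside the authority.
    System.out.println(tricky.getScheme());       // https
    System.out.println(tricky.getRawAuthority()); // javascript:void(0)
    // "void(0)" is not a numeric port, so java.net.URI cannot split the
    // authority into host and port; it falls back to a registry-based
    // authority and reports no host.
    System.out.println(tricky.getHost());         // null
    System.out.println(tricky.getPort());         // -1

    // Without the https:// prefix, the scheme really is javascript.
    System.out.println(schemeOf("javascript:void(0)")); // javascript
  }
}
```

For comparison, the WHATWG URL parser used by browsers treats a non-numeric port as a parse failure, so this href never resolves to a javascript: URL in either model.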

jmanico commented 5 years ago

This does not seem like a risk at all. Only javascript: schemes execute.
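Since only the scheme decides whether a link executes, a filter keyed on the parsed scheme (rather than a substring search for "javascript") lets https://javascript:void(0) through while blocking javascript:void(0). The sketch below assumes a hypothetical allowlist (SchemeAllowlist, isAllowedHref, and the ALLOWED set are illustrative names, not this library's API), uses java.net.URI for parsing, and fails closed on anything it cannot parse. Allowing relative references assumes the page has no <base href> pointing at a javascript: URL.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

public class SchemeAllowlist {
  // Hypothetical allowlist, similar in spirit to allowUrlProtocols(...).
  static final Set<String> ALLOWED = Set.of("http", "https", "mailto");

  // True if href is a relative reference or its scheme is allowlisted.
  // Anything java.net.URI rejects (e.g. a leading space) is refused.
  static boolean isAllowedHref(String href) {
    final URI uri;
    try {
      uri = new URI(href);
    } catch (URISyntaxException e) {
      return false; // fail closed on unparseable input
    }
    String scheme = uri.getScheme();
    if (scheme == null) {
      return true; // relative reference: resolved against the page's base URL
    }
    return ALLOWED.contains(scheme.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(isAllowedHref("https://javascript:void(0)")); // true
    System.out.println(isAllowedHref("javascript:void(0)"));         // false
    System.out.println(isAllowedHref("JavaScript:alert(1)"));        // false
    System.out.println(isAllowedHref(" javascript:alert(1)"));       // false
  }
}
```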

-- Jim Manico Manicode Security https://www.manicode.com

otolabqu commented 5 years ago

Thanks for replying. No, I haven't seen a different behavior.

mikesamuel commented 5 years ago

Closing as non-actionable.


Per https://url.spec.whatwg.org/#concept-basic-url-parser there are two cases where the parsed URL's scheme can be "javascript":

  1. The scheme state is reached while the buffer contains exactly "javascript" and the next character is ':'.
  2. The base URL's scheme is javascript and the input specifies no scheme.

Re the first case: by inspection of the spec, the state machine never transitions back to the scheme start state or the scheme state once it has left them, so this happens only when the input starts with zero or more ASCII whitespace characters followed by "javascript:", matched case-insensitively.
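The first case can be sketched as a simplified, hypothetical Java rendering of the spec's input cleanup plus its scheme start and scheme states (WhatwgScheme and schemeOf are illustrative names; state overrides and everything after the ':' are ignored). Note that the spec strips leading and trailing C0 controls or spaces and also removes ASCII tab and newline characters anywhere in the input, so an embedded tab inside "javascript" still yields the javascript scheme.

```java
public class WhatwgScheme {
  // Returns the lower-cased scheme, or null when parsing would fall through
  // to the "no scheme" state.
  static String schemeOf(String input) {
    // Strip leading and trailing C0 control or space (U+0000..U+0020).
    int start = 0, end = input.length();
    while (start < end && input.charAt(start) <= 0x20) start++;
    while (end > start && input.charAt(end - 1) <= 0x20) end--;
    // Remove all ASCII tab or newline (U+0009, U+000A, U+000D).
    StringBuilder cleaned = new StringBuilder();
    for (int i = start; i < end; i++) {
      char c = input.charAt(i);
      if (c != '\t' && c != '\n' && c != '\r') cleaned.append(c);
    }

    // Scheme start state: the first code point must be an ASCII alpha.
    if (cleaned.length() == 0 || !isAsciiAlpha(cleaned.charAt(0))) return null;

    // Scheme state: accumulate until ':' or a disallowed character.
    StringBuilder buffer = new StringBuilder();
    for (int i = 0; i < cleaned.length(); i++) {
      char c = cleaned.charAt(i);
      if (isAsciiAlpha(c) || (c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.') {
        buffer.append(Character.toLowerCase(c));
      } else if (c == ':') {
        return buffer.toString(); // scheme complete
      } else {
        return null; // "no scheme" state; the scheme state is never re-entered
      }
    }
    return null; // end of input without ':'
  }

  static boolean isAsciiAlpha(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
  }

  public static void main(String[] args) {
    System.out.println(schemeOf("https://javascript:void(0)")); // https
    System.out.println(schemeOf("  JavaScript:void(0)"));       // javascript
    System.out.println(schemeOf("java\tscript:void(0)"));       // javascript
    System.out.println(schemeOf("/javascript:void(0)"));        // null
  }
}
```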


In the second case, I believe this can only happen when a document does something odd like <base href="javascript:..."> and possibly not even then. This library filters out "javascript:" URLs even if a policy is foolish enough to allow <base href>.

When a document is created as a result of a javascript: URL, browsers reuse the origin of the document that loaded it, so its default base URL does not have the scheme javascript.

https://html.spec.whatwg.org/multipage/origin.html

The Document was created as part of the processing for javascript: URLs
The origin of the active document of the browsing context being navigated when the navigate algorithm was invoked.

Embedders would be wise not to do <base href="javascript:...">.