OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.
Other
855 stars 214 forks source link

CSS URLs longer than 1024 characters are not allowed #187

Open sapio-dwelch opened 5 years ago

sapio-dwelch commented 5 years ago

When trying to sanitize large data URIs used as background images in CSS properties, there is a hard-coded URL limit of 1024 characters (this is in StylingPolicy.sanitizeAndAppendUrl). Any value larger than 1024 characters is removed.

public class HtmlSanitizer {
    public static String TOO_LONG = "<table style=\"width:100%;\">\r\n" + 
            "    <tr>\r\n" + 
            "        <td style=\"background-image:url('data:image/svg+xml;base64,PHN2ZyB2aWV3" + 
            "Qm94PSIwIDAgMjAwLjAgMjAwvoidIGZpbGwtb3BhY2l0eT0iMSIgeG1sbnM6eGxpbms9Imh0dHA6Ly93" + 
            "d3cudzMub3JnLzE5OTkveGxpbmsiIGNvbG9yLXJlbmRlcmluZz0iYXV0byIgY29sb3ItaW50ZXJwb2xh" + 
            "dGlvbj0iYXV0byIgdGV4dC1yZW5kZXJpbmc9ImF1dG8iIHN0cm9rZT0iYmxhY2siIHN0cm9rZS1saW5l" + 
            "Y2FwPSJzcXVhcmUiIHdpZHRoPSIyMDAiIHN0cm9rZS1taXRlcmxpbWl0PSIxMCIgc2hhcGUtcmVuZGVy" + 
            "aW5nPSJhdXRvIiBzdHJva2voidBhY2l0eT0iMSIgZmlsbD0iYmxhY2siIHN0cm9rZS1kYXNoYXJyYXk9" + 
            "Im5vbmUiIGZvbnQtd2VpZ2h0PSJub3JtYWwiIHN0cm9rZS13aWR0aD0iMSIgaGVpZ2h0PSIyMDAiIHht" + 
            "bG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgZm9udC1mYW1pbHk9IidEaWFsb2cnIiBmb250" + 
            "LXN0eWxlPSJub3JtYWwiIHN0cm9rZS1saW5lam9pbj0ibWl0ZXIiIGZvbnQtc2l6ZT0iMTJweCIgc3Ry" + 
            "b2tlLWRhc2hvZmZzZXQ9IjAiIGltYWdlLXJlbmRlcmluZz0iYXV0byINCj48IS0tR2VuZXJhdGVkIGJ5" + 
            "IE1hcnZpbiB3aXRoIEJhdGlrIFNWRyBHZW5lcmF0b3ItLT48ZGVmcyBpZD0iZ2VuZXJpY0RlZnMiDQog" + 
            "IC8+PGcNCiAgPjxkZWZzIGlkPSIxNDQ4MjQ5ODEwMjEtZGVmczEiDQogICAgPjxjbGlwUGF0aCBjbGlw" + 
            "UGF0aFVuaXRzPSJ1c2VyU3BhY2VPblVzZSIgaWQ9voidNDgyNDk4MTA0OTgtY2xpcFBhdGgxIg0KICAg" + 
            "ICAgPjxwYXRoIGQ9Ik0wIDAgTDIwMCAwIEwyMDAgMjAwIEwwIDIwMCBMMCAwIFoiDQogICAgICAvPjwv" + 
            "Y2xpcFBhdGgNCiAgICAgID48Y2xpcFBhdGggY2xpcFBhdGhVbml0cz0idXNlclNwYWNlT25Vc2UiIGlk" + 
            "PSIxNDQ4MjQ5OD=='); background-size: contain; background-repeat: no-repeat; back" + 
            "ground-position:center; height:100px; margin:auto;\"/>\r\n" + 
            "    </tr>\r\n" + 
            "</table>";

    public static String NOT_TOO_LONG = "<table style=\"width:100%;\">\r\n" + 
            "    <tr>\r\n" + 
            "        <td style=\"background-image:url('data:image/svg+xml;base64,PHN2ZyB2aWV3" + 
            "Qm94PSIwIDAgMjAwLjAgMjAwvoidIGZpbGwtb3BhY2l0eT0iMSIgeG1sbnM6eGxpbms9Imh0dHA6Ly93" + 
            "d3cudzMub3JnLzE5OTkveGxpbmsiIGNvbG9yLXJlbmRlcmluZz0iYXV0byIgY29sb3ItaW50ZXJwb2xh" + 
            "dGlvbj0iYXV0byIgdGV4dC1yZW5kZXJpbmc9ImF1dG8ivoidcm9rZT0iYmxhY2siIHN0cm9rZS1saW5l" + 
            "Y2FwPSJzcXVhcmUiIHdpZHRoPSIyMDAiIHN0cm9rZS1taXRlcmxpbWl0PSIxMCIgc2hhcGUtcmVuZGVy" + 
            "IE1hcnZpbiB3aXRoIEJhdGlrIFNWRyBHZW5lcmF0b3ItLT48ZGVmcyBpZD0iZ2VuZXJpY0RlZnMiDQog" + 
            "IC8+PGcNCiAgPjxkZWZzIGlkPSIxNDQ4MjQ5ODEwMjEtZGVmczEiDQogICAgPjxjbGlwUGF0aCBjbGlw" + 
            "UGF0aFVuaXRzPSJ1c2VyU3BhY2VPblVzZSIgaWQ9IjE0NDgyNDk4MTA0OTgtY2xpcFBhdGgxIg0KICAg" + 
            "ICAgPjxwYXRoIGQ9Ik0wIDAgTDIwMCAwIEwyMDAgMjAwIEwwIDIwMCBMMCAwIFoiDQogICAgICAvPjwv" + 
            "Y2xpcFBhdGgNCiAgvoidID48Y2xpcFBhdGggY2xpcFBhdGhVbml0cz0idXNlclNwYWNlT25Vc2UiIGlk" + 
            "PSIxNDQ4MjQ5OD=='); background-size: contain; background-repeat: no-repeat; back" + 
            "ground-position:center; height:100px; margin:auto;\"/>\r\n" + 
            "    </tr>\r\n" + 
            "</table>";

    public static final PolicyFactory HTML_POLICY = new HtmlPolicyBuilder()
            // allow all elements to be styled
            .allowStyling()
            // allow urls in bg images.
            .allowUrlsInStyles(AttributePolicy.IDENTITY_ATTRIBUTE_POLICY)
            // and also allow "data" URLs
            .allowUrlProtocols("data")
            // allow some more "global" elements
            .allowElements("table", "tbody", "th", "tr", "td").toFactory();

    public static void main(String args[]) {
        System.out.println(HTML_POLICY.sanitize(TOO_LONG));
        System.out.println(HTML_POLICY.sanitize(NOT_TOO_LONG));
    }
}

The output for TOO_LONG will not include the background-image property, but the output for NOT_TOO_LONG will. Is there a reason for the limit?

mikesamuel commented 5 years ago

I believe the thought was that long URLs can be a denial-of-service vector.

That decision was made before data: URLs were widely used for images, so could be reconsidered.

Do you know of a good cutoff?

https://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers seems to conflate the address bar length and length limits to fetch.spec.whatwg.org.

sapio-dwelch commented 5 years ago

Thanks for the reply.

I am not aware of what a good cutoff would be.

I thought that maybe the justification for a limit of 1024 was long URL == attack, but, if so, why is it only being enforced for URLs found in style sheets and not for URLs found elsewhere? (I have tested an image with a data URL greater than 4000 bytes and there seems to be no such limit). If there is no enforcement of this for all URLs, then I would argue that it be removed for the URLs found in style sheets, as well.

Those who are concerned about the max lengths could still enforce a 1024 character limit (or any limit) using a custom AttributePolicy.

ioleo commented 1 year ago

Is there a workaround to allow any size data:image blocks? I have a size check on the whole payload of HTTP request, so I don't need to check individual properties length.

mikesamuel commented 10 months ago

Length limits avoid a lot of boundary problems in downstream code so seem in scope for sanitizers.

https://stackoverflow.com/a/417184/20394 suggests that maybe the limit should be 2000 instead of 1024