OWASP / owasp-java-encoder

The OWASP Java Encoder is a Java 1.5+ simple-to-use drop-in high-performance encoder class with no dependencies and little baggage. This project will help Java web developers defend against Cross Site Scripting!
https://owasp.org/www-project-java-encoder/
BSD 3-Clause "New" or "Revised" License
483 stars 112 forks source link

Create encode for URL function #42

Closed ropnop closed 3 years ago

ropnop commented 3 years ago

Hello! Wanted to start a discussion around adding a function that makes it safe to append strings to URLs in Java. I haven't found a great way to do this, and if we can agree on a good approach I'd be happy to submit a PR.

Imagine the following code:

URL u = new URL("http://example.com/api/v2/" + untrustedInput + "/data");
HttpURLConnection con = (HttpURLConnection) u.openConnection();
...etc...

If unrtustedInput comes in as foo../../../bar what method can make it safe and not allow for path traversal? encode.forUriComponent does not work here because Java URL will just decode the value.

The only good approach I've found is to just strip out all the dots and slashes, as seen here: https://github.com/linkedin/URL-Detector/blob/173c6dbb5d3453d8327b59f254905cdf0338a7de/url-detector/src/main/java/com/linkedin/urls/PathNormalizer.java#L40

Does this sound like something that should belong in owasp-java-encoder? Any other ideas for a sanitization/encoding function that would work?

jmanico commented 3 years ago

A better approach is to URL encode parameters which is a decades old encoding method.

We give examples of this in detail in our project docs.

Hit the "how to use" link https://owasp.org/www-project-java-encoder/

Encode URL parameter values    

Encode REST URL parameters    

With respect, no PR needed. This is the way of the encoder.

Aloha. Jim

On 11/23/20 1:48 PM, Ronnie Flathers wrote:

Hello! Wanted to start a discussion around adding a function that makes it safe to append strings to URLs in Java. I haven't found a great way to do this, and if we can agree on a good approach I'd be happy to submit a PR.

Imagine the following code:

URL u= new URL("http://example.com/api/v2/" + untrustedInput+ "/data"); HttpURLConnection con= (HttpURLConnection) u.openConnection(); ...etc...

If |unrtustedInput| comes in as |foo../../../bar| what method can make it safe and not allow for path traversal? |encode.forUriComponent| does not work here because Java URL will just decode the value.

The only good approach I've found is to just strip out all the dots and slashes, as seen here: https://github.com/linkedin/URL-Detector/blob/173c6dbb5d3453d8327b59f254905cdf0338a7de/url-detector/src/main/java/com/linkedin/urls/PathNormalizer.java#L40 https://github.com/linkedin/URL-Detector/blob/173c6dbb5d3453d8327b59f254905cdf0338a7de/url-detector/src/main/java/com/linkedin/urls/PathNormalizer.java#L40

Does this sound like something that should belong in owasp-java-encoder? Any other ideas for a sanitization/encoding function that would work?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/OWASP/owasp-java-encoder/issues/42, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEBYCMXGYB2JSWIOWMPBH3SRLYDXANCNFSM4UAEYCMA.

-- Jim Manico Manicode Security https://www.manicode.com

ropnop commented 3 years ago

Hey Jim! Thanks for the quick response. I agree - encoded parameters are the best way to go, but in the case I'm thinking of, the untrusted input has to be part of the URL itself, not a parameter. I don't think URL encoding can be relied on since servers will just decode it and canonicalize the path. Here's a quick test:

public static void main(String [] args) throws Exception {

        String bad = "okay.txt/../../../../key.txt";
        URL badURL = new URL("http://127.0.0.1:8000/a/b/c/" + Encode.forUriComponent(bad));
        System.out.println("Bad URL: " + badURL.toString());
        HttpURLConnection con = (HttpURLConnection) badURL.openConnection();
        con.setRequestMethod("GET");
        int responseCode = con.getResponseCode();
        System.out.println("GET Response Code :: " + responseCode);
        if (responseCode == HttpURLConnection.HTTP_OK) { // success
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    con.getInputStream()));
            String inputLine;
            StringBuffer response = new StringBuffer();

            while ((inputLine = in.readLine()) != null) {
                response.append(inputLine);
            }
            in.close();

            // print result
            System.out.println(response.toString());
        } else {
            System.out.println("GET request not worked");
        }

    }

against a Python server, the file called "key.txt" in the root directory is accessed instead of files in /a/b/c as intended:

Bad URL: http://127.0.0.1:8000/a/b/c/okay.txt%2F..%2F..%2F..%2F..%2Fkey.txt
GET Response Code :: 200
sensitive

Is there a different encoder that I should be using?

jmanico commented 3 years ago

Let me take a careful look at this with Jeff Ichnowski (the original author). This might be a worthy bug report. I'll get back to you asap.

On 11/23/20 2:06 PM, Ronnie Flathers wrote:

Hey Jim! Thanks for the quick response. I agree - encoded parameters are the best way to go, but in the case I'm thinking of, the untrusted input has to be part of the URL itself, not a parameter. I don't think URL encoding can be relied on since servers will just decode it and canonicalize the path. Here's a quick test:

public static void main(String [] args) throwsException {

     String  bad=  "okay.txt/../../../../key.txt";
     URL  badURL=  new  URL("http://127.0.0.1:8000/a/b/c/"  +  Encode.forUriComponent(bad));
     System.out.println("Bad URL: "  +  badURL.toString());
     HttpURLConnection  con=  (HttpURLConnection) badURL.openConnection();
     con.setRequestMethod("GET");
     int  responseCode=  con.getResponseCode();
     System.out.println("GET Response Code :: "  +  responseCode);
     if  (responseCode==  HttpURLConnection.HTTP_OK) {// success
         BufferedReader  in=  new  BufferedReader(new  InputStreamReader(
                 con.getInputStream()));
         String  inputLine;
         StringBuffer  response=  new  StringBuffer();

         while  ((inputLine=  in.readLine())!=  null) {
             response.append(inputLine);
         }
         in.close();

         // print result
         System.out.println(response.toString());
     }else  {
         System.out.println("GET request not worked");
     }

 }

against a Python server, the file called "key.txt" in the root directory is accessed instead of files in |/a/b/c| as intended:

|Bad URL: http://127.0.0.1:8000/a/b/c/okay.txt%2F..%2F..%2F..%2F..%2Fkey.txt GET Response Code :: 200 sensitive |

Is there a different encoder that I should be using?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/OWASP/owasp-java-encoder/issues/42#issuecomment-732496942, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEBYCLOIRZGILLGGUVM32LSRL2GXANCNFSM4UAEYCMA.

-- Jim Manico Manicode Security https://www.manicode.com

jmanico commented 3 years ago

Ronnie,

.NET has two functions for URL encoding.

https://docs.microsoft.com/en-us/dotnet/api/system.web.httputility.urlpathencode?view=net-5.0 for a path fragment like your example below.

And the typical

https://docs.microsoft.com/en-us/dotnet/api/system.web.httputility.urlencode?view=net-5.0 for traditional URL encoding

The docs I'm reading say (maybe) this level of encoding nuance is what is really needed to stop problems like you described below.

Ideas...

1) If you have access to a .NET dev setup, can you see if you run into the same problem using UrlPathEncode?

and

2) Check out the research at https://www.baeldung.com/java-url-encoding-decoding section 5 "5. Encode a Path Segment" and see if that helps.

I think we need a new URL encoding function called "Encoder.forUriPath(Stirng x)" perhaps but we need to research this more.

I still think proper encoding is the answer...

(++ Ronnie!)

On 11/23/20 2:06 PM, Ronnie Flathers wrote:

Hey Jim! Thanks for the quick response. I agree - encoded parameters are the best way to go, but in the case I'm thinking of, the untrusted input has to be part of the URL itself, not a parameter. I don't think URL encoding can be relied on since servers will just decode it and canonicalize the path. Here's a quick test:

public static void main(String [] args) throwsException {

     String  bad=  "okay.txt/../../../../key.txt";
     URL  badURL=  new  URL("http://127.0.0.1:8000/a/b/c/"  +  Encode.forUriComponent(bad));
     System.out.println("Bad URL: "  +  badURL.toString());
     HttpURLConnection  con=  (HttpURLConnection) badURL.openConnection();
     con.setRequestMethod("GET");
     int  responseCode=  con.getResponseCode();
     System.out.println("GET Response Code :: "  +  responseCode);
     if  (responseCode==  HttpURLConnection.HTTP_OK) {// success
         BufferedReader  in=  new  BufferedReader(new  InputStreamReader(
                 con.getInputStream()));
         String  inputLine;
         StringBuffer  response=  new  StringBuffer();

         while  ((inputLine=  in.readLine())!=  null) {
             response.append(inputLine);
         }
         in.close();

         // print result
         System.out.println(response.toString());
     }else  {
         System.out.println("GET request not worked");
     }

 }

against a Python server, the file called "key.txt" in the root directory is accessed instead of files in |/a/b/c| as intended:

|Bad URL: http://127.0.0.1:8000/a/b/c/okay.txt%2F..%2F..%2F..%2F..%2Fkey.txt GET Response Code :: 200 sensitive |

Is there a different encoder that I should be using?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/OWASP/owasp-java-encoder/issues/42#issuecomment-732496942, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEBYCLOIRZGILLGGUVM32LSRL2GXANCNFSM4UAEYCMA.

-- Jim Manico Manicode Security https://www.manicode.com

kwwall commented 3 years ago

It seems to me--in cases where it's possible--that validation, rather than output encoding is the more appropriate defense here. That might not be feasible if the URL that you are creating a path for is not your site (i.e., that of the ORIGIN server), but even then you probably can decide to at least use a block-list and deny "..". If it's your site, then use cannonicalization and then check to see if it corresponds to a legitimate resource. And for JavaEE, put your configuration data like "key.txt" either outside of Document Root or if you want it bundled in your .war or .ear file, make sure it is under WEB-INF/ so the JavaEE container itself will prevent direct browsing attempts to those resources.

ropnop commented 3 years ago

Thanks @jmanico - I don't have a .NET environment but I will check out the other resource.

The more I think about it, I'm inclined to agree with @kwwall - I'm just not sure if encoding will ever be enough in this case (without breaking things) - whatever custom encoding scheme you take you'd have to assume the server will be able to decode it, which would be impossible when you're constructing requests to a different site you don't control.

It's crazy how flexible URL paths can be. All of these curl requests are identical and retrieve key from the root of my directory:

curl --path-as-is --url "http://localhost:8000/a/b/c/../../../key"
curl --path-as-is --url "http://localhost:8000/a/b/c/okay%2F..%2F..%2F..%2F..%2Fkey"
curl --path-as-is --url "http://localhost:8000/a/b/c/%6f%6b%61%79%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%6b%65%79"

As for sanitization, I think two good approaches are:

I've implemented both as an example:

Bad Input: okay/../../../../key
Just filename: key
StripDotsAndSlashes: okaykey

Even with sanitization it's tricky since the user could supply input that is already URL encoded. In this case the sanitization function needs to properly URL Decode any input before stripping dots and slashes or extracting just a filename

Does it make sense to add any sanitization functions to this library for these cases?

ropnop commented 3 years ago

Here's what I was thinking to "sanitize" the inputs. It depends whether you want to honor paths (slashes) or not:

public static String normalizePath(String path) {
        path = URLDecoder.decode(path, Charset.defaultCharset());

        // make it an absolute path then normalize
        String normalized = Paths.get("/" + path).normalize().toString();

        // return a leading slash if it came in with one, otherwise strip the leading slash
        return path.charAt(0) == '/' ? normalized : normalized.substring(1);
    }

public static String stripPath(String path) {
        String decodedPath = URLDecoder.decode(path, Charset.defaultCharset());
        return Paths.get(decodedPath).getFileName().toString();
    }
jmanico commented 3 years ago

Our library is for XSS defense via encoding. These url validators seems like a case for a new project, like a safe url construction class....

I’ll clean up the docs as well to discourage anything other than parameter encoding.

But I’m with you and this opened my eyes to the challenges of path encoding and this is now a legit open issue for the library.

Let’s keep talking and researching this and figure out the best way to go.

-- Jim Manico @Manicode

On Nov 24, 2020, at 5:42 AM, Ronnie Flathers notifications@github.com wrote:

 Thanks @jmanico - I don't have a .NET environment but I will check out the other resource.

The more I think about it, I'm inclined to agree with @kwwall - I'm just not sure if encoding will ever be enough in this case (without breaking things) - whatever custom encoding scheme you take you'd have to assume the server will be able to decode it, which would be impossible when you're constructing requests to a different site you don't control.

It's crazy how flexible URL paths can be. All of these curl requests are identical and retrieve key from the root of my directory:

curl --path-as-is --url "http://localhost:8000/a/b/c/../../../key" curl --path-as-is --url "http://localhost:8000/a/b/c/okay%2F..%2F..%2F..%2F..%2Fkey" curl --path-as-is --url "http://localhost:8000/a/b/c/%6f%6b%61%79%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%6b%65%79" As for sanitization, I think two good approaches are:

just strip all .. and / from the string construct a path from the input and only return the filename portion I've implemented both as an example:

Bad Input: okay/../../../../key Just filename: key StripDotsAndSlashes: okaykey Even with sanitization it's tricky since the user could supply input that is already URL encoded. In this case the sanitization function needs to properly URL Decode any input before stripping dots and slashes or extracting just a filename

Does it make sense to add any sanitization functions to this library for these cases?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

jmanico commented 3 years ago

I suggest a new project for specialized URL validation. Since we are just an encoding library I am closing this out.