Closed ropnop closed 3 years ago
A better approach is to URL encode parameters which is a decades old encoding method.
We give examples of this in detail in our project docs.
Hit the "how to use" link https://owasp.org/www-project-java-encoder/
With respect, no PR needed. This is the way of the encoder.
Aloha. Jim
On 11/23/20 1:48 PM, Ronnie Flathers wrote:
Hello! Wanted to start a discussion around adding a function that makes it safe to append strings to URLs in Java. I haven't found a great way to do this, and if we can agree on a good approach I'd be happy to submit a PR.
Imagine the following code:
URL u= new URL("http://example.com/api/v2/" + untrustedInput+ "/data"); HttpURLConnection con= (HttpURLConnection) u.openConnection(); ...etc...
If |unrtustedInput| comes in as |foo../../../bar| what method can make it safe and not allow for path traversal? |encode.forUriComponent| does not work here because Java URL will just decode the value.
The only good approach I've found is to just strip out all the dots and slashes, as seen here: https://github.com/linkedin/URL-Detector/blob/173c6dbb5d3453d8327b59f254905cdf0338a7de/url-detector/src/main/java/com/linkedin/urls/PathNormalizer.java#L40 https://github.com/linkedin/URL-Detector/blob/173c6dbb5d3453d8327b59f254905cdf0338a7de/url-detector/src/main/java/com/linkedin/urls/PathNormalizer.java#L40
Does this sound like something that should belong in owasp-java-encoder? Any other ideas for a sanitization/encoding function that would work?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/OWASP/owasp-java-encoder/issues/42, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEBYCMXGYB2JSWIOWMPBH3SRLYDXANCNFSM4UAEYCMA.
-- Jim Manico Manicode Security https://www.manicode.com
Hey Jim! Thanks for the quick response. I agree - encoded parameters are the best way to go, but in the case I'm thinking of, the untrusted input has to be part of the URL itself, not a parameter. I don't think URL encoding can be relied on since servers will just decode it and canonicalize the path. Here's a quick test:
public static void main(String [] args) throws Exception {
String bad = "okay.txt/../../../../key.txt";
URL badURL = new URL("http://127.0.0.1:8000/a/b/c/" + Encode.forUriComponent(bad));
System.out.println("Bad URL: " + badURL.toString());
HttpURLConnection con = (HttpURLConnection) badURL.openConnection();
con.setRequestMethod("GET");
int responseCode = con.getResponseCode();
System.out.println("GET Response Code :: " + responseCode);
if (responseCode == HttpURLConnection.HTTP_OK) { // success
BufferedReader in = new BufferedReader(new InputStreamReader(
con.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
// print result
System.out.println(response.toString());
} else {
System.out.println("GET request not worked");
}
}
against a Python server, the file called "key.txt" in the root directory is accessed instead of files in /a/b/c
as intended:
Bad URL: http://127.0.0.1:8000/a/b/c/okay.txt%2F..%2F..%2F..%2F..%2Fkey.txt
GET Response Code :: 200
sensitive
Is there a different encoder that I should be using?
Let me take a careful look at this with Jeff Ichnowski (the original author). This might be a worthy bug report. I'll get back to you asap.
On 11/23/20 2:06 PM, Ronnie Flathers wrote:
Hey Jim! Thanks for the quick response. I agree - encoded parameters are the best way to go, but in the case I'm thinking of, the untrusted input has to be part of the URL itself, not a parameter. I don't think URL encoding can be relied on since servers will just decode it and canonicalize the path. Here's a quick test:
public static void main(String [] args) throwsException {
String bad= "okay.txt/../../../../key.txt"; URL badURL= new URL("http://127.0.0.1:8000/a/b/c/" + Encode.forUriComponent(bad)); System.out.println("Bad URL: " + badURL.toString()); HttpURLConnection con= (HttpURLConnection) badURL.openConnection(); con.setRequestMethod("GET"); int responseCode= con.getResponseCode(); System.out.println("GET Response Code :: " + responseCode); if (responseCode== HttpURLConnection.HTTP_OK) {// success BufferedReader in= new BufferedReader(new InputStreamReader( con.getInputStream())); String inputLine; StringBuffer response= new StringBuffer(); while ((inputLine= in.readLine())!= null) { response.append(inputLine); } in.close(); // print result System.out.println(response.toString()); }else { System.out.println("GET request not worked"); } }
against a Python server, the file called "key.txt" in the root directory is accessed instead of files in |/a/b/c| as intended:
|Bad URL: http://127.0.0.1:8000/a/b/c/okay.txt%2F..%2F..%2F..%2F..%2Fkey.txt GET Response Code :: 200 sensitive |
Is there a different encoder that I should be using?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/OWASP/owasp-java-encoder/issues/42#issuecomment-732496942, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEBYCLOIRZGILLGGUVM32LSRL2GXANCNFSM4UAEYCMA.
-- Jim Manico Manicode Security https://www.manicode.com
Ronnie,
.NET has two functions for URL encoding.
https://docs.microsoft.com/en-us/dotnet/api/system.web.httputility.urlpathencode?view=net-5.0 for a path fragment like your example below.
And the typical
https://docs.microsoft.com/en-us/dotnet/api/system.web.httputility.urlencode?view=net-5.0 for traditional URL encoding
The docs I'm reading say (maybe) this level of encoding nuance is what is really needed to stop problems like you described below.
Ideas...
1) If you have access to a .NET dev setup, can you see if you run into the same problem using UrlPathEncode?
and
2) Check out the research at https://www.baeldung.com/java-url-encoding-decoding section 5 "5. Encode a Path Segment" and see if that helps.
I think we need a new URL encoding function called "Encoder.forUriPath(Stirng x)" perhaps but we need to research this more.
I still think proper encoding is the answer...
(++ Ronnie!)
On 11/23/20 2:06 PM, Ronnie Flathers wrote:
Hey Jim! Thanks for the quick response. I agree - encoded parameters are the best way to go, but in the case I'm thinking of, the untrusted input has to be part of the URL itself, not a parameter. I don't think URL encoding can be relied on since servers will just decode it and canonicalize the path. Here's a quick test:
public static void main(String [] args) throwsException {
String bad= "okay.txt/../../../../key.txt"; URL badURL= new URL("http://127.0.0.1:8000/a/b/c/" + Encode.forUriComponent(bad)); System.out.println("Bad URL: " + badURL.toString()); HttpURLConnection con= (HttpURLConnection) badURL.openConnection(); con.setRequestMethod("GET"); int responseCode= con.getResponseCode(); System.out.println("GET Response Code :: " + responseCode); if (responseCode== HttpURLConnection.HTTP_OK) {// success BufferedReader in= new BufferedReader(new InputStreamReader( con.getInputStream())); String inputLine; StringBuffer response= new StringBuffer(); while ((inputLine= in.readLine())!= null) { response.append(inputLine); } in.close(); // print result System.out.println(response.toString()); }else { System.out.println("GET request not worked"); } }
against a Python server, the file called "key.txt" in the root directory is accessed instead of files in |/a/b/c| as intended:
|Bad URL: http://127.0.0.1:8000/a/b/c/okay.txt%2F..%2F..%2F..%2F..%2Fkey.txt GET Response Code :: 200 sensitive |
Is there a different encoder that I should be using?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/OWASP/owasp-java-encoder/issues/42#issuecomment-732496942, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEBYCLOIRZGILLGGUVM32LSRL2GXANCNFSM4UAEYCMA.
-- Jim Manico Manicode Security https://www.manicode.com
It seems to me--in cases where it's possible--that validation, rather than output encoding is the more appropriate defense here. That might not be feasible if the URL that you are creating a path for is not your site (i.e., that of the ORIGIN server), but even then you probably can decide to at least use a block-list and deny "..". If it's your site, then use cannonicalization and then check to see if it corresponds to a legitimate resource. And for JavaEE, put your configuration data like "key.txt" either outside of Document Root or if you want it bundled in your .war or .ear file, make sure it is under WEB-INF/ so the JavaEE container itself will prevent direct browsing attempts to those resources.
Thanks @jmanico - I don't have a .NET environment but I will check out the other resource.
The more I think about it, I'm inclined to agree with @kwwall - I'm just not sure if encoding will ever be enough in this case (without breaking things) - whatever custom encoding scheme you take you'd have to assume the server will be able to decode it, which would be impossible when you're constructing requests to a different site you don't control.
It's crazy how flexible URL paths can be. All of these curl requests are identical and retrieve key
from the root of my directory:
curl --path-as-is --url "http://localhost:8000/a/b/c/../../../key"
curl --path-as-is --url "http://localhost:8000/a/b/c/okay%2F..%2F..%2F..%2F..%2Fkey"
curl --path-as-is --url "http://localhost:8000/a/b/c/%6f%6b%61%79%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%6b%65%79"
As for sanitization, I think two good approaches are:
..
and /
from the stringI've implemented both as an example:
Bad Input: okay/../../../../key
Just filename: key
StripDotsAndSlashes: okaykey
Even with sanitization it's tricky since the user could supply input that is already URL encoded. In this case the sanitization function needs to properly URL Decode any input before stripping dots and slashes or extracting just a filename
Does it make sense to add any sanitization functions to this library for these cases?
Here's what I was thinking to "sanitize" the inputs. It depends whether you want to honor paths (slashes) or not:
public static String normalizePath(String path) {
path = URLDecoder.decode(path, Charset.defaultCharset());
// make it an absolute path then normalize
String normalized = Paths.get("/" + path).normalize().toString();
// return a leading slash if it came in with one, otherwise strip the leading slash
return path.charAt(0) == '/' ? normalized : normalized.substring(1);
}
public static String stripPath(String path) {
String decodedPath = URLDecoder.decode(path, Charset.defaultCharset());
return Paths.get(decodedPath).getFileName().toString();
}
Our library is for XSS defense via encoding. These url validators seems like a case for a new project, like a safe url construction class....
I’ll clean up the docs as well to discourage anything other than parameter encoding.
But I’m with you and this opened my eyes to the challenges of path encoding and this is now a legit open issue for the library.
Let’s keep talking and researching this and figure out the best way to go.
-- Jim Manico @Manicode
On Nov 24, 2020, at 5:42 AM, Ronnie Flathers notifications@github.com wrote:
Thanks @jmanico - I don't have a .NET environment but I will check out the other resource.
The more I think about it, I'm inclined to agree with @kwwall - I'm just not sure if encoding will ever be enough in this case (without breaking things) - whatever custom encoding scheme you take you'd have to assume the server will be able to decode it, which would be impossible when you're constructing requests to a different site you don't control.
It's crazy how flexible URL paths can be. All of these curl requests are identical and retrieve key from the root of my directory:
curl --path-as-is --url "http://localhost:8000/a/b/c/../../../key" curl --path-as-is --url "http://localhost:8000/a/b/c/okay%2F..%2F..%2F..%2F..%2Fkey" curl --path-as-is --url "http://localhost:8000/a/b/c/%6f%6b%61%79%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%6b%65%79" As for sanitization, I think two good approaches are:
just strip all .. and / from the string construct a path from the input and only return the filename portion I've implemented both as an example:
Bad Input: okay/../../../../key Just filename: key StripDotsAndSlashes: okaykey Even with sanitization it's tricky since the user could supply input that is already URL encoded. In this case the sanitization function needs to properly URL Decode any input before stripping dots and slashes or extracting just a filename
Does it make sense to add any sanitization functions to this library for these cases?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.
I suggest a new project for specialized URL validation. Since we are just an encoding library I am closing this out.
Hello! Wanted to start a discussion around adding a function that makes it safe to append strings to URLs in Java. I haven't found a great way to do this, and if we can agree on a good approach I'd be happy to submit a PR.
Imagine the following code:
If
unrtustedInput
comes in asfoo../../../bar
what method can make it safe and not allow for path traversal?encode.forUriComponent
does not work here because Java URL will just decode the value.The only good approach I've found is to just strip out all the dots and slashes, as seen here: https://github.com/linkedin/URL-Detector/blob/173c6dbb5d3453d8327b59f254905cdf0338a7de/url-detector/src/main/java/com/linkedin/urls/PathNormalizer.java#L40
Does this sound like something that should belong in owasp-java-encoder? Any other ideas for a sanitization/encoding function that would work?