Closed by GoogleCodeExporter 8 years ago
Original comment by manico.james@gmail.com
on 1 Nov 2010 at 12:55
We should tackle this in two ways:
1) Validate URLs with the java.net.URL class
2) Consider limiting the amount of time that regexes are allowed to execute
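A minimal sketch of option 1, assuming the goal is to accept only http, https and ftp URLs (the schemes the regex in this issue allows). The class name UrlCheck, the method isValidUrl and the allowed-scheme list are hypothetical, not ESAPI code. Note that the new URL(String) constructor is deprecated in recent JDKs but still available.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class UrlCheck {
    // Assumption: only these schemes are acceptable, mirroring (ht|f)tp(s?) in the regex.
    private static final List<String> ALLOWED_SCHEMES = Arrays.asList("http", "https", "ftp");

    static boolean isValidUrl(String candidate) {
        if (candidate == null) {
            return false;
        }
        try {
            // java.net.URL rejects missing or unknown protocols.
            URL url = new URL(candidate);
            // toURI() applies stricter syntax checks (e.g. it rejects spaces).
            url.toURI();
            String scheme = url.getProtocol().toLowerCase(Locale.ROOT);
            String host = url.getHost();
            return ALLOWED_SCHEMES.contains(scheme) && host != null && !host.isEmpty();
        } catch (MalformedURLException | URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUrl("http://example.com/index.html"));
        System.out.println(isValidUrl("javascript:alert(1)"));
    }
}
```

This avoids backtracking entirely because no regex is involved, but it is more permissive than the original pattern in some respects, so it would need its own test cases.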
(from Sebastian)
I see,
so you are thinking along the lines of this I suppose:
http://gist.github.com/630969
There is no performance penalty if you execute it with the
SingleThreadExecutor.
If you use the default FixedThreadPool, it gets as good
as Brian Goetz got it.
-Sebastian Kübeck
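For readers without access to the gist, here is one way the time-limit idea could look. This is a sketch under assumptions and not the gist's content: the names TimedRegex, matchesWithin and RegexTimeoutException are hypothetical. Because java.util.regex does not check for thread interruption on its own, the input is wrapped in a CharSequence whose charAt() aborts once the worker thread has been interrupted.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

class TimedRegex {
    /** Hypothetical unchecked exception signalling that the time budget was exceeded. */
    static final class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String message) { super(message); }
    }

    /** Wrapper that makes the matcher fail fast after its thread is interrupted. */
    private static final class InterruptibleCharSequence implements CharSequence {
        private final CharSequence inner;
        InterruptibleCharSequence(CharSequence inner) { this.inner = inner; }
        @Override public char charAt(int index) {
            if (Thread.currentThread().isInterrupted()) {
                throw new IllegalStateException("regex match interrupted");
            }
            return inner.charAt(index);
        }
        @Override public int length() { return inner.length(); }
        @Override public CharSequence subSequence(int start, int end) {
            return new InterruptibleCharSequence(inner.subSequence(start, end));
        }
        @Override public String toString() { return inner.toString(); }
    }

    static boolean matchesWithin(Pattern pattern, String input, long timeoutMillis) {
        // One executor per call keeps the sketch simple; a shared one would avoid thread churn.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Callable<Boolean> task =
                () -> pattern.matcher(new InterruptibleCharSequence(input)).matches();
        Future<Boolean> result = executor.submit(task);
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new RegexTimeoutException("regex exceeded " + timeoutMillis + " ms");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for regex", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("regex evaluation failed", e.getCause());
        } finally {
            result.cancel(true);    // interrupts the worker if it is still matching
            executor.shutdownNow(); // releases the worker thread
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesWithin(Pattern.compile("[a-z]+"), "hello", 1000));
    }
}
```

A caller would treat RegexTimeoutException as a validation failure, so an input that triggers heavy backtracking is rejected instead of tying up a thread indefinitely.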
Original comment by manico.james@gmail.com
on 6 Nov 2010 at 7:45
Hi Jim.
The problematic part of the Regex is:
([-.\\w]*[0-9a-zA-Z])*
The problem is that “w” appears in both parts [-.\\w] and [0-9a-zA-Z], so
again we get an equivalent to (a*a)*.
About a solution, it’s less clear.
If I understand correctly, what you want is something that might(!) start with
“.”, “\”, or “-”, then has a series of letters/numbers, then again one of
“.”, “\”, “-”, then again letters/numbers and so on, and must
end with a number/letter (one or more). For example:
“b---w-ww-w-w.-w-w-w-d-w-ww\\\w-w-w.-w-w-w-w-w-w-5---www-wsasaS”. I don’t
understand why, but this looks like what it finds.
The following Regex SHOULD do the job. I’m not 100% sure about it, but it
looks good to me:
([-.\\w]?([0-9a-zA-Z]+[-.\\]+)*[0-9a-zA-Z]+)?
So you get a full Regex that looks like this:
^(ht|f)tp(s?)\\:\\/\\/[0-9a-zA-Z]([-.\\w]?([0-9a-zA-Z]+[-.\\]+)*[0-9a-zA-Z]+)?(:(0-9)*)*(\\/?)([a-zA-Z0-9\\-\\.\\?\\,\\:\\'\\/\\\\\\+=&%\\$#_]*)?$
You had better check it against as many examples as you have, because I might have
missed something, but it looks like a ReDoS-free equivalent regex.
Good luck!
Adar.
Original comment by manico.james@gmail.com
on 25 Nov 2010 at 12:04
Just a thought – what if you just remove the “w”:
([-.\\]*[0-9a-zA-Z])*
It’s not clear to me what the “w” was there for, actually, but maybe
that’s only because it is just over an hour past midnight.
Cheers,
Adar.
Original comment by manico.james@gmail.com
on 25 Nov 2010 at 12:17
I think perhaps there are two things going on here. The first is the \w inside the
character set, which seems to be a mistaken attempt to include alphanumeric
characters. The second is the double-escape \\ syntax that you have to
use with Java. The real regex (as seen by Java) is:
^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\:\'\/\\\+=&%\$#_]*)?$
I suspect that the tool was using the escaped version, which may have caused
misfires. Testing the regex against the provided attack does not seem to cause
a DoS problem on Java.
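The escaping point can be checked directly. The sketch below (EscapeDemo and classMatches are hypothetical names) shows that the Java source literal "[-.\\w]" is the six-character regex [-.\w], which matches a hyphen, a dot, or any word character, and does not match a literal backslash. It also means every character accepted by [0-9a-zA-Z] is accepted by [-.\w] as well, which is the overlap discussed earlier in this thread.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // In Java source, \\ is one backslash, so the regex engine receives [-.\w].
    static final String CHAR_CLASS = "[-.\\w]";

    // True if the whole string s is a single character accepted by the class.
    static boolean classMatches(String s) {
        return Pattern.matches(CHAR_CLASS, s);
    }

    public static void main(String[] args) {
        System.out.println(CHAR_CLASS);          // the regex as the engine sees it
        System.out.println(classMatches("a"));   // letter: covered by \w
        System.out.println(classMatches("-"));   // literal hyphen
        System.out.println(classMatches("\\"));  // a single backslash character
    }
}
```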
Original comment by planetlevel
on 26 Jan 2011 at 9:45
Don't know about you, but I consider DoS as security-related. Also added
component.
Original comment by kevin.w.wall@gmail.com
on 12 Feb 2011 at 8:38
This is not a real bug; multiple tools confirmed Jeff was right. Closing this
out.
Original comment by manico.james@gmail.com
on 17 Feb 2011 at 3:18
Original issue reported on code.google.com by
augu...@gmail.com
on 15 Oct 2010 at 4:59