PerlDancer / Dancer2

Perl Dancer Next Generation (rewrite of Perl Dancer)
http://perldancer.org/

Regex metachars have special meaning in string routes #1277

Open pdl opened 8 years ago

pdl commented 8 years ago

I am surprised to find that the following are equivalent:

get  q{/foo/(?<id>\d+)} => sub{...};
get qr{/foo/(?<id>\d+)} => sub{...};

Fortunately, it looks like "." does not have its regular expression meaning, so e.g. 'foo.json' won't match fooljson.

However, there may be other metacharacters that cause problems. Ordinary brackets are legitimate in URLs, so:

get q{/a(b)} => sub{...}; # does not match /a(b)
get q{/a\(b\)} => sub{...}; # matches /a(b)

While brackets are rare in URLs, this might be surprising for people who have arbitrary text in their routes, e.g. see http://advent.perldancer.org/2014/23 - which suggests at one point interpolating values from a database into the route.
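
To illustrate the underlying Perl behaviour outside of Dancer2, here is a minimal sketch in plain Perl. It does not call any Dancer2 code and does not reproduce how Dancer2 compiles routes; the `$path` value and the anchoring with `\A` and `\z` are assumptions made for the demonstration. It only shows what happens when arbitrary text is interpolated into a pattern with and without `\Q...\E` (quotemeta).

```perl
use strict;
use warnings;

# Assumed example value, standing in for text that came from a database.
my $path = '/a(b)';

# Interpolated as-is, the parentheses form a capture group, so the
# pattern describes '/ab' rather than the literal text '/a(b)'.
my $unquoted = qr/\A$path\z/;

# \Q...\E applies quotemeta, so every metacharacter matches literally.
my $quoted = qr/\A\Q$path\E\z/;

for my $request ('/a(b)', '/ab') {
    printf "%-6s unquoted=%-8s quoted=%s\n",
        $request,
        ($request =~ $unquoted ? 'match' : 'no match'),
        ($request =~ $quoted   ? 'match' : 'no match');
}
```

In this sketch the unquoted pattern matches only `/ab`, while the quoted pattern matches only `/a(b)`. Quoting values before they reach a route string is one way to sidestep the surprise, whatever the framework itself ends up doing.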

Is this an intentional feature?

If so, is it documented anywhere?

The closest thing I can find in the docs is the regex-in-string-form matching the user agent in the example at https://metacpan.org/pod/distribution/Dancer2/lib/Dancer2/Manual.pod#Conditional-Matching - and the rules for this are not explained.

xsawyerx commented 8 years ago

[...] Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".

All unsafe characters must always be encoded within a URL. [...]

-- RFC 1738 2.2: URL Character Encoding Issues.

This means that while we should allow ( and ), this does not extend to all characters used in regular expressions.

If we're currently not quoting strings correctly, we should write a test that covers how we think it should behave, and then fix it.
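
A rough sketch of what such a test might look like, assuming the desired behaviour is that "(", ")" and "." in a string route match literally. `literal_route_regex` is a hypothetical helper written for this example, not a Dancer2 function, and it ignores route tokens such as named parameters and wildcards, which a real fix would have to leave unquoted.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper, not part of Dancer2: compile a plain string path
# into an anchored regex with all metacharacters escaped via \Q...\E.
sub literal_route_regex {
    my ($path) = @_;
    return qr/\A\Q$path\E\z/;
}

my $paren = literal_route_regex('/a(b)');
like(   '/a(b)', $paren, 'parentheses match literally' );
unlike( '/ab',   $paren, 'parentheses do not act as a group' );

my $dot = literal_route_regex('/foo.json');
like(   '/foo.json', $dot, 'dot matches a literal dot' );
unlike( '/fooljson', $dot, 'dot does not match an arbitrary character' );

done_testing;
```

A real test would exercise the route matching in Dancer2 itself rather than a stand-alone helper; the cases above only pin down the expected outcomes.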