Open adamakhtar opened 12 years ago
Ok got it.
When we define a route with the dsl such as
on "posts/:post_id/comment/:id" do |post_id, id|
...
Cuba needs a way to compare that with every request that comes in to see if the actual path matches our route.
The most obvious way would be to use regexps, as in this pseudocode:
if actual_request_path.match(our_dsl_definition) then return result
But as our DSL definition stands, it won't work as a regexp. So Cuba needs to interpret our rule and convert it into a regex.
This happens in #match
matcher.gsub(/:\w+/, segment)
where segment is a parameter with a default value of "([^\/]+)"
and matcher is our dsl rule "posts/:post_id/comment/:id"
the result of the gsub is
"posts/([^\\/]+)/comment/([^\\/]+)"
and that is the regexp that Cuba will use to check against the request path.
The ([^\/]+) regex simply means
match one or more characters that are not / (the backslash is only there to escape the slash), i.e. the 56 in posts/56 would match this.
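To see the conversion in isolation, here is a minimal sketch that repeats the gsub outside of Cuba. The variable names and the sample path are only for illustration; this is not Cuba's actual code.

```ruby
# Illustrative only: the same substitution as described above, run standalone.
segment = "([^\\/]+)"                    # default segment pattern (a plain string)
matcher = "posts/:post_id/comment/:id"   # the DSL rule

# Every ":name" placeholder is replaced by the segment pattern.
pattern = matcher.gsub(/:\w+/, segment)
puts pattern  # prints posts/([^\/]+)/comment/([^\/]+)

# The resulting string can be compiled into a regexp and tried on a path.
captures = "posts/56/comment/7".match(Regexp.new(pattern)).captures
p captures    # prints ["56", "7"]
```

Each placeholder becomes one capture group, which is how the values end up as the block arguments post_id and id.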
Now in #consume Cuba uses the previously constructed regex to check if it matches the given path.
return unless match = env["PATH_INFO"].match(/\A\/(#{pattern})((?:\/|\z))/)
Now whilst I can understand what the above is doing, I don't know how it is doing it.
Can any @codereading/readers help with this regex?
I understand the pattern part - that's our regexp from before. The last part, ((?:\/|\z)), is difficult to understand.
Why is it in double brackets?
And what is (/\A\/ at the beginning checking for?
\A and \z match the beginning and end of the string respectively. I didn't know about the ?: construct before, but according to the docs,
The (?:…) construct provides grouping without capturing.
Useful. But I'm not sure of the advantage here, since the double parens mean that it does capture the closing forward slash or the end of the string. So I don't see any difference between this and /\A\/(#{pattern})(\/|\z)/
Maybe it's for performance reasons.
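A quick way to check that the two forms capture the same things is to run both against a few paths. This is only a sketch; the pattern string and the sample paths are made up for the example.

```ruby
# Illustrative pattern, as produced by the gsub step for the example route.
pattern = "posts/([^\\/]+)/comment/([^\\/]+)"

with_non_capturing = /\A\/(#{pattern})((?:\/|\z))/
plain_group        = /\A\/(#{pattern})(\/|\z)/

paths = ["/posts/56/comment/7", "/posts/56/comment/7/edit", "/users/1"]

# Collect the captures of both regexps for each path (nil when there is no match).
results = paths.map do |path|
  a = path.match(with_non_capturing)
  b = path.match(plain_group)
  [path, a && a.captures, b && b.captures]
end

results.each { |row| p row }
# ["/posts/56/comment/7", ["posts/56/comment/7", "56", "7", ""], ["posts/56/comment/7", "56", "7", ""]]
# ["/posts/56/comment/7/edit", ["posts/56/comment/7", "56", "7", "/"], ["posts/56/comment/7", "56", "7", "/"]]
# ["/users/1", nil, nil]
```

For these inputs both regexps give identical captures: the last group is either the trailing / or an empty string at the end of the path.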
I don't think /\A\/(#{pattern})((?:\/|\z))/ performs better than /\A\/(#{pattern})(\/|\z)/, and I could not find any evidence that it does while running a simple benchmark (see https://gist.github.com/3223768):
posts/0/comment/1 --
posts/0/comment/1 -- /
posts/0/comment/1 -- /x
Rehearsal --------------------------------------------
regexp_a 5.562000 0.110000 5.672000 ( 5.687500)
----------------------------------- total: 5.672000sec
user system total real
regexp_a 5.594000 0.109000 5.703000 ( 5.937500)
Rehearsal --------------------------------------------
regexp_b 5.594000 0.062000 5.656000 ( 5.671875)
----------------------------------- total: 5.656000sec
user system total real
regexp_b 5.578000 0.032000 5.610000 ( 5.609375)
Rehearsal --------------------------------------------
regexp_c 5.547000 0.093000 5.640000 ( 6.046875)
----------------------------------- total: 5.640000sec
user system total real
regexp_c 5.641000 0.047000 5.688000 ( 5.687500)
You may note that I tried a third regexp as well, because I guess the expression should not only match / but also \ at the end.
By the way, you may find the regexp idiom ((?:x|y)+) when a repeated group should be captured. For example ((?:ab|cd)+) matches abcd and captures abcd, while (ab|cd)+ also matches abcd, but captures only cd.
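The difference is easy to check in irb. A small sketch using the same example strings:

```ruby
# Outer capturing group around a repeated non-capturing group:
# the capture covers everything the repetition matched.
whole = "abcd".match(/((?:ab|cd)+)/).captures
p whole  # prints ["abcd"]

# Repeated capturing group: only the last iteration is kept.
last = "abcd".match(/(ab|cd)+/).captures
p last   # prints ["cd"]
```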
Very nice catch guys! I think @soveran has already pushed the revision (07d77d4c9c85d17db29fefda3c268805616d062b).
@codereading == win! :-)
@ericgj and @theldoria sorry for the late reply. thanks very much for explaining that and great that it contributed to a revision.
@codereading/readers
consume(matcher.gsub(/:\w+/, segment))
and
match = env["PATH_INFO"].match(/\A\/(#{pattern})((?:\/|\z))/)
just what exactly are these two regexps doing?