codereading / cuba

Rum based microframework for web development.
http://cuba.is
MIT License

Can anyone explain the regexps in #match and #consume #7

Open adamakhtar opened 12 years ago

adamakhtar commented 12 years ago

@codereading/readers

def consume(pattern)
  # Try to match the pattern at the start of the remaining path.
  return unless match = env["PATH_INFO"].match(/\A\/(#{pattern})((?:\/|\z))/)

  path, *vars = match.captures

  # Move the matched part of the path from PATH_INFO to SCRIPT_NAME.
  env["SCRIPT_NAME"] += "/#{path}"
  env["PATH_INFO"] = "#{vars.pop}#{match.post_match}"

  captures.push(*vars)
end
private :consume

def match(matcher, segment = "([^\\/]+)")
  case matcher
  when String then consume(matcher.gsub(/:\w+/, segment))
  when Regexp then consume(matcher)
  when Symbol then consume(segment)
  when Proc   then matcher.call
  else
    matcher
  end
end

consume(matcher.gsub(/:\w+/, segment))

and

match = env["PATH_INFO"].match(/\A\/(#{pattern})((?:\/|\z))/)

What exactly are these two regexps doing?

adamakhtar commented 12 years ago

Ok got it.

When we define a route with the DSL, such as

on "posts/:post_id/comment/:id" do |post_id, id| 
...

Cuba needs a way to compare that with every incoming request to see whether the actual path matches our route.

The most obvious way would be to use a regexp, as in this pseudocode:

if actual_request_path.match(our_dsl_definition) then return result

But as it stands, our DSL definition won't work as a regexp, so Cuba needs to interpret our rule and convert it into one.

This happens in #match

matcher.gsub(/:\w+/, segment)

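where segment is a parameter whose default value is the string literal "([^\\/]+)", i.e. the characters ([^\/]+)

and matcher is our DSL rule "posts/:post_id/comment/:id"

The result of the gsub is the string posts/([^\/]+)/comment/([^\/]+), which inspect shows as "posts/([^\\/]+)/comment/([^\\/]+)"

and that is the regexp that Cuba will use to check against the request path.

The ([^\/]+) regex simply means

match one or more characters that are not / and capture them. The \/ inside the brackets is just an escaped slash, so a backslash is not excluded. For example, the 56 in posts/56 would match this.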
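As a minimal sketch of the conversion described above, the following can be run on its own. The helper name route_pattern and the simplified \A...\z anchoring are assumptions made for illustration; they are not part of Cuba, which wraps the pattern in the regexp shown in #consume.

```ruby
# Hypothetical helper mirroring the String branch of Cuba's #match:
# every :name placeholder becomes the segment pattern.
# The Ruby literal "([^\\/]+)" holds the characters ([^\/]+).
def route_pattern(matcher, segment = "([^\\/]+)")
  matcher.gsub(/:\w+/, segment)
end

pattern = route_pattern("posts/:post_id/comment/:id")
puts pattern
# prints: posts/([^\/]+)/comment/([^\/]+)

# Simplified anchoring for illustration only; Cuba's #consume builds a
# different regexp around the pattern.
route = /\A#{pattern}\z/

p "posts/56/comment/7".match(route).captures
# prints: ["56", "7"]

# Inside the character class, \/ is only an escaped slash, so a
# backslash in the path is still accepted as part of a segment.
p "posts/a\\b/comment/7".match(route).captures
# prints: ["a\\b", "7"]
```

Each :name placeholder turns into one capture group, which is why the block in the route definition receives one argument per placeholder.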

adamakhtar commented 12 years ago

Consume

Now in #consume Cuba uses the previously constructed regex to check if it matches the given path.

return unless match = env["PATH_INFO"].match(/\A\/(#{pattern})((?:\/|\z))/)

Now whilst I can understand what the above is doing, I don't know how it is doing it.

Can any @codereading/readers help with this regex?

I understand the pattern part: that's our regexp from before. The last part

((?:\/|\z))

is difficult to understand.

Why is it in double brackets?

And what is this checking for

/\A\/ at the beginning?
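To make the question concrete, here is a hedged sketch that runs the same regexp outside of Cuba. consume_sketch is a hypothetical standalone version that takes a plain Hash and a captures array as arguments, whereas the real method reads env and captures from the Cuba instance.

```ruby
# Hypothetical standalone version of #consume that works on a plain Hash
# instead of Cuba's instance state, so each step can be inspected.
def consume_sketch(env, pattern, captures = [])
  match = env["PATH_INFO"].match(/\A\/(#{pattern})((?:\/|\z))/)
  return unless match

  # First capture: the whole consumed path. Last capture: the "/" or ""
  # that ended it. Anything in between comes from groups in the pattern.
  path, *vars = match.captures

  env["SCRIPT_NAME"] += "/#{path}"
  env["PATH_INFO"] = "#{vars.pop}#{match.post_match}"

  captures.push(*vars)
end

env = { "SCRIPT_NAME" => "", "PATH_INFO" => "/posts/56/edit" }
p consume_sketch(env, "posts/([^\\/]+)")  # prints: ["56"]
p env["SCRIPT_NAME"]                      # prints: "/posts/56"
p env["PATH_INFO"]                        # prints: "/edit"

# The trailing group insists on "/" or end of string after the pattern,
# so a route for "posts" does not swallow the start of "/postscript".
p consume_sketch({ "SCRIPT_NAME" => "", "PATH_INFO" => "/postscript" }, "posts")
# prints: nil
```

The leading \A\/ pins the match to the start of the remaining path and requires the slash that begins it, and the final group records how the matched part ended so that the slash can be put back at the front of PATH_INFO.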

ericgj commented 12 years ago

\A and \z match the beginning and end of the string respectively. I didn't know about the ?: construct before, but according to the docs,

The (?:…) construct provides grouping without capturing.

Useful. But I'm not sure what the advantage is here, since the outer parentheses mean that it still captures the closing forward slash or the empty string at the end of the string. So I don't see any difference between this and /\A\/(#{pattern})(\/|\z)/. Maybe it's for performance reasons.
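One way to check that claim is to compare the captures of both variants on a few paths. This is only a sketch: compare_captures is a hypothetical helper, and the sample paths are made up for illustration.

```ruby
# Hypothetical helper: returns the captures of both variants for a path,
# or nil for a variant that does not match.
def compare_captures(path, pattern)
  with_group    = path.match(/\A\/(#{pattern})((?:\/|\z))/)
  without_group = path.match(/\A\/(#{pattern})(\/|\z)/)
  [with_group && with_group.captures, without_group && without_group.captures]
end

["/posts", "/posts/", "/posts/x", "/postscript"].each do |path|
  a, b = compare_captures(path, "posts")
  puts "#{path.inspect}: #{a.inspect} #{b.inspect} same=#{a == b}"
end
```

For these inputs the two variants return identical captures, which is consistent with the extra (?:...) being redundant in this regexp.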

theldoria commented 12 years ago

I don't think /\A\/(#{pattern})((?:\/|\z))/ is better performing than /\A\/(#{pattern})(\/|\z)/, and I could not find any other evidence while running a simple benchmark (see https://gist.github.com/3223768):

posts/0/comment/1 -- 
posts/0/comment/1 -- /
posts/0/comment/1 -- /x
Rehearsal --------------------------------------------
regexp_a   5.562000   0.110000   5.672000 (  5.687500)
----------------------------------- total: 5.672000sec

               user     system      total        real
regexp_a   5.594000   0.109000   5.703000 (  5.937500)
Rehearsal --------------------------------------------
regexp_b   5.594000   0.062000   5.656000 (  5.671875)
----------------------------------- total: 5.656000sec

               user     system      total        real
regexp_b   5.578000   0.032000   5.610000 (  5.609375)
Rehearsal --------------------------------------------
regexp_c   5.547000   0.093000   5.640000 (  6.046875)
----------------------------------- total: 5.640000sec

               user     system      total        real
regexp_c   5.641000   0.047000   5.688000 (  5.687500)

You may note that I tried a third regexp as well, because I guess the expression should not only match / but also \ at the end.

By the way, you may find the regexp idiom ((?:x|y)+) when a repeated group should be captured. For example ((?:ab|cd)+) matches abcd and captures abcd, while (ab|cd)+ also matches abcd, but captures only cd.
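The difference between the two forms can be seen directly in Ruby. The helper name whole_and_group is hypothetical and exists only for this sketch.

```ruby
# Hypothetical helper: returns the whole match and the first capture group.
def whole_and_group(str, regexp)
  m = str.match(regexp)
  m && [m[0], m[1]]
end

# Repetition inside the capturing group: the group spans every iteration.
p whole_and_group("abcd", /((?:ab|cd)+)/)  # prints: ["abcd", "abcd"]

# Repetition applied to the capturing group: only the last iteration is kept.
p whole_and_group("abcd", /(ab|cd)+/)      # prints: ["abcd", "cd"]
```

In the #consume regexp there is no repetition after the inner group, so this idiom does not change what is captured there.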

cyx commented 12 years ago

Very nice catch guys! I think @soveran has already pushed the revision (07d77d4c9c85d17db29fefda3c268805616d062b).

@codereading == win! :-)

adamakhtar commented 12 years ago

@ericgj and @theldoria sorry for the late reply. Thanks very much for explaining that, and it's great that it contributed to a revision.