codereading / sinatra

Classy web-development dressed in a DSL (official / canonical repo)
http://www.sinatrarb.com/
MIT License

Walkthrough #2

Open adamakhtar opened 12 years ago

adamakhtar commented 12 years ago

@codereading/readers Ok, well, it seems that last time in rack, using rack's lobster.ru example app and tracing the code was quite helpful. So I guess for this walkthrough it would be great if everyone used something similar.

How about the one from the docs?

# myapp.rb
require 'sinatra'

get '/' do
  'Hello world!'
end

gwynforthewyn commented 12 years ago

get is defined in base.rb. There's a thread where someone noted this is a monolithic library of stuff that sinatra can do, and that looks to me like an accurate observation.

get is on line 1166:

      # Defining a `GET` handler also automatically defines
      # a `HEAD` handler.
      def get(path, opts={}, &block)
        conditions = @conditions.dup
        route('GET', path, opts, &block)

        @conditions = conditions
        route('HEAD', path, opts, &block)
      end

So it takes a path, optional options and a block.

It then makes a local copy of @conditions, and dispatches to the route() method handing off the constant string 'GET', the path that was handed in, any options and the block.

@conditions is then set to be the local conditions variable: I assume from this that the route() method alters the conditions somehow. There's a second call to route() with 'HEAD' as the verb.

Looks like the first piece of magic is happening in the dispatch to route(), which I'll look at after work.

gwynforthewyn commented 12 years ago

First things first: reading through base.rb, Sinatra isn't actually monolithic. It uses one file, but inside that file is a bunch of Modules and within the Modules there're a bunch of classes.

route() is defined as:

     def route(verb, path, options={}, &block)
        # Because of self.options.host
        host_name(options.delete(:host)) if options.key?(:host)
        enable :empty_path_info if path == "" and empty_path_info.nil?
        signature = compile!(verb, path, block, options)
        (@routes[verb] ||= []) << signature
        invoke_hook(:route_added, verb, path, block)
        signature
      end

First thing I noticed is that @conditions isn't mentioned. I wanted to see what was going with @conditions, so I grepped the source for its use. It's assigned to in the compile! function that route calls.

Taking a second look at the get method, what it does with @conditions is simple: it creates a local dupe of conditions, does some stuff that edits @conditions, then restores @conditions to its pre-edited state. I don't know why, but that's what it does.
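A toy model may help explain the save-and-restore. It assumes that compile! takes the pending conditions for the route it builds and then resets @conditions to an empty array. MiniRouter and its method bodies are hypothetical stand-ins, not Sinatra code:

```ruby
# Hypothetical miniature of the get/route/compile! interplay.
class MiniRouter
  attr_reader :routes

  def initialize
    @conditions = []
    @routes = Hash.new { |hash, verb| hash[verb] = [] }
  end

  # Queue a condition for the next route that gets defined.
  def condition(name)
    @conditions << name
  end

  # Stand-in for route + compile!: the route takes the pending
  # conditions, and @conditions is reset to an empty array.
  def route(verb, path)
    conditions, @conditions = @conditions, []
    @routes[verb] << [path, conditions]
  end

  def get(path)
    saved = @conditions.dup
    route('GET', path)
    @conditions = saved   # without this line, HEAD would get no conditions
    route('HEAD', path)
  end
end

router = MiniRouter.new
router.condition(:host_is_admin)
router.get('/')
p router.routes['GET']    # the GET route carries the condition
p router.routes['HEAD']   # and so does the HEAD route
```

Under that assumption, the dup exists so that the GET and HEAD routes end up with the same conditions.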

Anyway, route(). It looks like it's important, as it is used to define all of the HTTP operations.

First, if the options hash has a :host key, it is deleted from the hash and its value is passed to host_name(). The example doesn't assign a host name, so this line isn't used by the sample.

Second, we enable empty_path_info if path is an empty string and the empty_path_info setting is still nil. We know that when we call get() in simple.rb we hand in a non-empty path, so we shouldn't enable empty_path_info, but let's verify that enable does nothing. The enable method is pretty simple:

   # Same as calling `set :option, true` for each of the given options.
      def enable(*opts)
        opts.each { |key| set(key, true) }
      end

I can see a couple of ways to see if I've done anything in path that requires an enable(). First, I need some working code.

The simple.rb example file requires using a relative path, so it's requiring the version of sinatra.rb that's in the git repo. That's good, because it means that whenever I run simple.rb, it's in fact running the code that I've downloaded, and not any installed gems I might have.

Great!

In the code in git, I've altered enable to read:

      def enable(*opts)
        puts "#{opts}"
        opts.each { |key| set(key, true) }
      end

then I run simple.rb:

wendigo:examples jam$ ruby simple.rb 
[:inline_templates]
[2012-06-12 19:56:31] INFO  WEBrick 1.3.1
[2012-06-12 19:56:31] INFO  ruby 1.9.3 (2012-04-20) [x86_64-darwin11.3.0]
== Sinatra/1.3.2 has taken the stage on 4567 for development with backup from WEBrick
[2012-06-12 19:56:31] INFO  WEBrick::HTTPServer#start: pid=2941 port=4567

So, that [:inline_templates] is the output of the puts statement I put in enable. Doesn't look like we do bugger all with the path, so we know the code's working as expected.

So, next line:

signature = compile!(verb, path, block, options)

verb is 'GET'. path is the argument we handed into get() in simple.rb. "/". There's no options and no block.

My instinct is that compile will behave like a compiler and generate some data that is suitable for consumption by something else.

compile! reads:

def compile!(verb, path, block, options = {})
        options.each_pair { |option, args| send(option, *args) }
        method_name             = "#{verb} #{path}"
        unbound_method          = generate_method(method_name, &block)
        pattern, keys           = compile path
        conditions, @conditions = @conditions, []

        [ pattern, keys, conditions, block.arity != 0 ?
            proc { |a,p| unbound_method.bind(a).call(*p) } :
            proc { |a,p| unbound_method.bind(a).call } ]
      end

We don't have an options, so the first line isn't doing anything.
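For a route that does pass options, that first line turns each key into a method call. A minimal sketch, assuming the option names match condition helpers such as host_name and provides; the MiniRoutes class and the helper bodies are hypothetical:

```ruby
# Hypothetical class showing how options.each_pair { send(...) } dispatches.
class MiniRoutes
  attr_reader :conditions

  def initialize
    @conditions = []
  end

  # Stand-in helpers: the real ones build condition blocks instead.
  def host_name(pattern)
    @conditions << [:host_name, pattern]
  end

  def provides(*types)
    @conditions << [:provides, types]
  end

  # Same shape as the first line of compile!.
  def apply(options)
    options.each_pair { |option, args| send(option, *args) }
  end
end

routes = MiniRoutes.new
routes.apply(:host_name => /^admin\./, :provides => ['html', 'json'])
p routes.conditions
```

The splat means an array value is spread into several arguments, while a single value such as a Regexp is passed through as one argument.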

method_name becomes "GET /". If you don't know anything about HTTP servers, this doesn't seem special. If you do, this is pretty much what is expected. If you've not done this before, you should connect to an HTTP server and issue the command "GET /".

What HTTP server can you connect to? How about...github.com! How can we connect? The simplest way is probably to use netcat.

wendigo:examples jam$ nc github.com 80

At this point you'll be taken to a newline. Type in:

GET /

and hit enter. You'll receive:

GET /
<html>
<head><title>403 Forbidden</title></head>
<body bgcolor="white">
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx/1.0.13</center>
</body>
</html>

Awesome! nginx responded! Doing the same thing to google.com will give you their homepage. GET / is a command to a web server to return whatever lives at the root of their http tree on the file system, so seeing it here from tracing simple.rb is neat.
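The same exchange can be tried from Ruby without touching the network. This is only a sketch: it starts a throwaway local server (a stand-in for github.com, not part of Sinatra) that reports the request line it received, then sends it the request by hand:

```ruby
require 'socket'

# Throwaway local server; port 0 lets the OS pick a free port.
server = TCPServer.new('127.0.0.1', 0)
port   = server.addr[1]

server_thread = Thread.new do
  client = server.accept
  request_line = client.gets                 # "GET / HTTP/1.0\r\n"
  # Drain the remaining lines up to the blank line that ends the request.
  while (line = client.gets) && line != "\r\n"; end
  body = "you sent: #{request_line.strip}"
  client.write("HTTP/1.0 200 OK\r\nContent-Length: #{body.bytesize}\r\n\r\n#{body}")
  client.close
end

# The client side: what netcat does when you type the request yourself.
socket = TCPSocket.new('127.0.0.1', port)
socket.write("GET / HTTP/1.0\r\n\r\n")
response = socket.read
socket.close

server_thread.join
server.close

puts response
```

The first line the server reads is the verb and the path, the same two pieces that Sinatra joins into the "GET /" method name.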


Anyway, next we generate a method using the block and the string we just created! I'm going to assume that generate_method uses reflection to create a method accessible using the method string we created. If I were doing my job and just trying to see what Sinatra does, I'd move on there, but for a learning exercise let's take a look further.

The code reads:

      def generate_method(method_name, &block)
        define_method(method_name, &block)
        method = instance_method method_name
        remove_method method_name
        method
      end

So, we define_method first. http://ruby-doc.org/core-1.9.3/Module.html documents define_method. The rest looks like a bit of metaprogramming bookkeeping that I don't fully understand. The class that this is in inherits from self: if someone knows what that's doing, I'd appreciate the knowledge.

method = instance_method method_name

More metaprogramming: http://ruby-doc.org/core-1.9.3/Module.html#method-i-instance_method. I should pull out my copy of Metaprogramming Ruby and work this out, because it looks cool. The effect is to create an instance of UnboundMethod (http://www.ruby-doc.org/core-1.9.3/UnboundMethod.html). These are really cool things! This is a method which is held in a variable. It then has to be bound (looks like the "bind" method is used for this, so it may be we'll see that coming up soon) to an object before the object can use the method.

The next few lines in generate_method do this:

remove_method method_name
method
end

So there's a bit of metaprogramming bookkeeping to make sure we don't leave a method sitting around in a class, then the method is returned by generate_method back up to compile!, where it is assigned to the unbound_method variable. Check out how the name is exactly what we would expect!
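The three steps can be tried outside Sinatra. A minimal sketch, where the Greeter class and its @name variable are made up for illustration:

```ruby
class Greeter
  def initialize(name)
    @name = name
  end

  # Same three steps as generate_method, written as a class-level method.
  def self.generate_method(method_name, &block)
    define_method(method_name, &block)      # temporarily add an instance method
    method = instance_method(method_name)   # grab it as an UnboundMethod
    remove_method(method_name)              # take it off the class again
    method
  end
end

unbound = Greeter.generate_method("GET /") { "Hello from #{@name}" }

puts unbound.class                             # UnboundMethod
puts Greeter.instance_methods(false).inspect   # [] (nothing left on the class)
puts unbound.bind(Greeter.new("app")).call     # Hello from app
```

The block's body runs with the bound instance as self, which is why it can see @name even though the method no longer exists on the class.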

If the metaprogramming bookkeeping is necessary, then the generate_method method is following the pattern called "passing a block". Ruby uses blocks internally a lot; Uncle Bob did a talk recently about block passing.

Whew! We've gotten to the bottom of a stack of methods here! I'm going to bugger off for an hour before I continue with this.

gwynforthewyn commented 12 years ago

Well, this is depressing. I had written up a lot of stuff beyond the above, and now it's all been eaten by something. I'll come back to this later: I had written up to the generate method in compile. If you can write up something up to there, that'd be awesome. If not, I'll come back and give a summary later.

gwynforthewyn commented 12 years ago

Oh wow! I noticed I unbalanced the markdown above. Whoops...

So if you do the same get with google.com as the page, you'll get google's homepage. Get / returns whatever's at the root of the file structure that the web server is serving to end users.

The next piece of code to look at is:

unbound_method = generate_method(method_name, &block)

generate_method generates a method. What kind of method? an unbound_method, apparently. What does that mean?


      def generate_method(method_name, &block)
        define_method(method_name, &block)
        method = instance_method method_name
        remove_method method_name
        method
      end

http://ruby-doc.org/core-1.9.3/Module.html#method-i-define_method defines an instance method in the receiver. The receiver is a class which inherits from self; I'm not sure what the purpose of this is, but I assume it creates an anonymous namespace or something else. I'll have to find out what.
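Assuming the "inherits from self" remark refers to a `class << self ... end` block around these methods, that syntax is not inheritance: it opens the class's singleton class, so methods defined inside become class-level methods. A small sketch with a made-up Base class and route_count method:

```ruby
class Base
  class << self
    # Defined on Base's singleton class, so it is called as Base.route_count
    # and @routes here is an instance variable of the class object itself.
    def route_count
      (@routes ||= []).size
    end
  end
end

puts Base.route_count                                       # 0
puts Base.singleton_class.instance_methods(false).inspect   # [:route_count]
puts Base.new.respond_to?(:route_count)                     # false
```

Inside such a method self is the class, so a bare define_method call adds an instance method to that class.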

In any case, the impact is that an instance_method is defined on the current object. After that, a variable has the results of "instance_method method_name" assigned to it. instance_method is also defined in Module http://ruby-doc.org/core-1.9.3/Module.html#method-i-instance_method -- It returns an UnboundMethod object representing a given instance method. Interesting! We've heard that name before.

What is an UnboundMethod? http://www.ruby-doc.org/core-1.9.3/UnboundMethod.html -- an UnboundMethod is a method which can only be called by an object that it is bound to using the bind() method (I assume that from the first example). So in generate_method, we now have a method defined and assigned to the variable method.

The method is then removed from the class with remove_method (http://ruby-doc.org/core-1.9.3/Module.html#method-i-remove_method).

So, the author of sinatra wants to limit who can access this "get /" method that is defined, presumably, otherwise the author would have left it as a class method. Weird. I wonder why. Maybe just tidy bookkeeping.

Anyway, we're at the end of generate_method! It returns to unbound_method an object of class UnboundMethod that has as its body the block that was passed in at unbound_method = generate_method(method_name, &block).

No block was passed in, though, and block doesn't define a default. Weird. What am I misunderstanding? Presumably a parameter which requires a definition but doesn't receive one is defaulted to null or something?

More later...

gwynforthewyn commented 12 years ago

Okay, that block. Here's what I did:

def generate_method(method_name, &block)
        if defined?(block)
          puts "#{block.class}"
        else
          puts "not defined"
        end
        define_method(method_name, &block)
        method = instance_method method_name
        remove_method method_name
        method
      end

I ran simple.rb:

wendigo:examples james$ ruby simple.rb 
Proc
Proc
Proc
Proc
Proc
Proc
[2012-06-12 21:27:58] INFO  WEBrick 1.3.1
[2012-06-12 21:27:58] INFO  ruby 1.9.3 (2012-04-20) [x86_64-darwin11.3.0]
[2012-06-12 21:27:58] WARN  TCPServer Error: Address already in use - bind(2)
== Someone is already performing on port 4567!

So the block is always a Proc. What's in it? To find that out, I need to find where it was defined. Hell, I missed this the first time through.
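One caveat about this check: defined?(block) is true for any method that declares a &block parameter, whether or not a block was passed, so the "not defined" branch can never run. A small sketch with a made-up probe method shows what does distinguish the two cases:

```ruby
def probe(&block)
  [defined?(block), block.nil?, block_given?]
end

p probe          # ["local-variable", true, false]
p probe { 1 }    # ["local-variable", false, true]
```

When no block is passed, the &block parameter is simply nil, so block.nil? or block_given? is the test to use.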

Okay, where could I have missed it? The methods are part of a class, so presumably it's class data.

The callsite was Sinatra.rb, when sinatra/base was included. Unless it was when sinatra/main was included? Okay, now I'm doubting myself.

irb(main):001:0> require "sinatra"
require "sinatra"
Proc
Proc
Proc
Proc
=> true

So simply requiring Sinatra generates 4 proc objects!

When I require sinatra.rb, it requires base and main. What does that do? Well, we know it gets down to this piece of code somehow, so I'll jazz up generate_method a bit more:

      def generate_method(method_name, &block)
        if defined?(block)
          puts "method_name is #{method_name}}"
          puts "#{block.class}"
        else
          puts "not defined"
        end
        define_method(method_name, &block)
        method = instance_method method_name
        remove_method method_name
        method
      end

wendigo:examples james$ irb -I ../lib -I lib
Switch to inspect mode.
irb(main):001:0> require 'sinatra'
require 'sinatra'
method_name is ERROR (?-mix:)}
Proc
method_name is GET /__sinatra__/:image.png}
Proc
method_name is HEAD /__sinatra__/:image.png}
Proc
method_name is ERROR (?-mix:)}

What is going on here? Well, I'll come back to it tomorrow.

ericgj commented 12 years ago

The block passed from get -> compile! -> generate_method is your code, what gets run when someone hits that URL. It's a bit tricky, but crucial, to trace what happens to this block both here at the class level, and when a request comes through and it gets dispatched to that block.

Starting with what compile! returns:

[ pattern, keys, conditions, block.arity != 0 ?
        proc { |a,p| unbound_method.bind(a).call(*p) } :
        proc { |a,p| unbound_method.bind(a).call } ]

That last element of the array, the proc, is what gets called when the request is matched up to the route. a will be the instance of your app, p are the concrete captures/splats as extracted from the request URL. (You can see where this happens way down in the bowels of routing). So essentially this unbound, floating method gets bound to your app, and then the method is called, with the captures/splats if any are expected.

What is interesting to consider is why this bit of functional programming -- and let's face it, the creation of an unbound method is pretty kludgy in ruby -- is needed here -- what it gains us.

Related to this question I think is the magic part of Sinatra's DSL, which always had me scratching my head how it worked.

On the one hand, if you look at the routes, clearly they run in the context of an instance of your app. request, response, etc. aren't scoped to Sinatra::Base. So on the surface it looks like a case of app.instance_eval(&route). But if that were the case, where would the block parameters come from?

On the other hand, if the routes were instead used as true closures, i.e. invoked like route.call(*p), then how the hell would they run in the context of your app? Why wouldn't you have to inject your app into your routes like:

get '/foo/:id' do |app, id|
  #... do something with app.request and app.response, instead of just request and response
end

Binding the route onto the app as an anonymous method seems to 'let you have your cake and eat it too'. And thinking about it, it's really the most straightforward approach given the DSL, even if it seems weird how you have to do it in ruby. (Incidentally, I thought I read somewhere that _why came up with or at least popularized this technique, is that true?)
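The trade-off can be seen side by side in plain Ruby. This is a sketch only; the App class, its request method and the tmp_route name are made up for illustration:

```ruby
class App
  def request
    "request for #{self.class}"
  end
end

app   = App.new
route = proc { |id| "#{request} id=#{id}" }

# 1. Plain closure call: the parameter arrives, but self is still the
#    top-level object, so `request` cannot be found.
call_result =
  begin
    route.call(42)
  rescue NameError => e
    "#{e.class} raised"
  end

# 2. instance_eval: self is the app, but the only block argument Ruby
#    passes is the app itself, so id is not 42.
eval_result = app.instance_eval(&route)

# 3. Unbound method: self is the app AND the parameter arrives.
App.send(:define_method, :tmp_route, &route)
unbound = App.instance_method(:tmp_route)
App.send(:remove_method, :tmp_route)
bound_result = unbound.bind(app).call(42)

puts call_result
puts eval_result
puts bound_result   # request for App id=42
```

Only the third form gives the block both the app as self and the captured parameters.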

Maybe there are other reasons for doing it that way too? @rkh ?

EDIT:

Actually, a more straightforward way would be to simply define_method for each route and leave it attached (funky name and all), then store the compiled routes like

proc {|a, p| a.send( method_name, *p)}

So the question becomes what is the advantage of the anonymous method, attached to the app instance on the fly, versus attaching it to the class as a named instance method? Is it in order to avoid name collisions in the Sinatra::Base namespace?
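A quick sketch of the named-method alternative, using a made-up NamedRoutesApp class. It shows one possible downside, offered as a guess rather than as the reason Sinatra avoids it: two routes with the same verb and path would share one method name, so the second definition replaces the first.

```ruby
class NamedRoutesApp; end

# Leave the route attached to the class under its funky name.
NamedRoutesApp.send(:define_method, "GET /foo/:id") { |id| "first handler, id=#{id}" }
handler = proc { |a, p| a.send("GET /foo/:id", *p) }

first = handler.call(NamedRoutesApp.new, [7])

# Registering the same verb and path again overwrites the method.
NamedRoutesApp.send(:define_method, "GET /foo/:id") { |id| "second handler, id=#{id}" }
second = handler.call(NamedRoutesApp.new, [7])

puts first                                            # first handler, id=7
puts second                                           # second handler, id=7
puts NamedRoutesApp.instance_methods(false).inspect   # [:"GET /foo/:id"]
```

With unbound methods each route keeps its own method object, so nothing is overwritten and nothing stays listed on the class.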

adamakhtar commented 12 years ago

@jamandbees I just edited your first comment to balance the code blocks; hope you don't mind. Thanks very much for contributing so much. I'm going to take a look at this and try to add some more tomorrow.

adamakhtar commented 12 years ago

Good point about why an unbound method is used. I think the authors would definitely want to avoid naming conflicts, but was this the primary reason for using this ... pattern?

I looked through the issue tracker and found this issue regarding generate_method and its use of unbound methods.

There was a proposal to switch to instance_exec instead of using the define_method / remove_method approach, but @rkh said that the latter had significant speed improvements. He even provided evidence in a gist. He also said it didn't pollute the stack trace as much.

Also

Your pull request makes sense, though, and in fact, this is a relict from when we used to support 1.8.6, but for the above reasons I'm actually in favor of keeping it that way.
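For comparison, here is what the two approaches look like in isolation. This sketch only shows that they produce the same result; it makes no attempt to measure the speed difference mentioned above, and the ExecApp class and tmp name are made up:

```ruby
class ExecApp
  def request
    "request for #{self.class}"
  end
end

app   = ExecApp.new
route = proc { |id| "#{request} id=#{id}" }

# instance_exec: run the block with app as self, passing arguments along.
via_exec = app.instance_exec(42, &route)

# define_method / remove_method: turn the block into an UnboundMethod once,
# then bind and call it per request.
ExecApp.send(:define_method, :tmp, &route)
unbound = ExecApp.instance_method(:tmp)
ExecApp.send(:remove_method, :tmp)
via_bind = unbound.bind(app).call(42)

puts via_exec   # request for ExecApp id=42
puts via_bind   # request for ExecApp id=42
```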

I'm not a pro rubyist so I'm not sure if this helps the conversation; perhaps it might be useful for a more able coder.

ghost commented 12 years ago

actually i was wondering the same thing, and I'm fair to middlin' with ruby. Thanks for doing the digging on that and explaining the differences (especially the speed aspect)

I'd not run into any of the 3 mentioned methods before, so this helps clear things a bit.

gwynforthewyn commented 12 years ago

@ericgj Thanks for the pointer! I re-read the example simple.rb and saw that it declares this:

get('/') { 'this is a simple app' }

So that right there is the block that's been handed in; my initial analysis was incorrect, as there clearly is a block there.

gwynforthewyn commented 12 years ago

To clear some things up for myself:

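  1. @conditions is preserved so that the HEAD route registered by the second route() call gets the same conditions as the GET route: compile! hands the pending conditions to the route it builds and then resets @conditions to an empty array.
  2. @conditions is edited earlier than I thought, if there is a host name present in the options hash.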
      # Condition for matching host name. Parameter might be String or Regexp.
      def host_name(pattern)
        condition { pattern === request.host }
      end

This dispatches to the condition function, with a block as an argument:

      # Add a route condition. The route is considered non-matching when the
      # block returns false.
      def condition(name = "#{caller.first[/`.*'/]} condition", &block)
        @conditions << generate_method(name, &block)
      end

@conditions is edited here: this is why @conditions is preserved earlier on. What is a condition? http://www.sinatrarb.com/intro#Conditions

This is interesting knowledge, though it doesn't impact simple.rb.

gwynforthewyn commented 12 years ago

I've been wondering what the use cases are for an unbound method. A couple of articles:

http://docs.python.org/release/2.3.4/lib/typesmethods.html http://www.quora.com/Ruby-programming-language/What-are-some-practical-uses-of-Unbound-Methods-in-Ruby

http://stackoverflow.com/questions/928443/ruby-functions-vs-methods is really interesting.

I whipped out my copy of Metaprogramming Ruby, and it says two things about UnboundMethod objects. First: "good luck finding a place to use these esoteric features". Second: the difference between an UnboundMethod and a lambda is that the UnboundMethod is evaluated in the scope of an object, whereas a lambda is evaluated in the scope it's defined in.


I think that this is using an UnboundMethod as a way of allowing each object to have a different implementation of get() without needing to actually write the implementation for each possible way of writing a get. In a way, it's a function pointer.

jarodzz commented 12 years ago

I still have problems understanding "generate_method" and compile. I see what the code is doing, but fail to get why.

Following jamandbees's advice, I picked up the book "Metaprogramming Ruby". It's a great book that explains almost all my puzzles about metaprogramming, and I have no difficulty understanding it.

Though I can't contribute anything to the thread now, I recommend everyone read this book.

gwynforthewyn commented 12 years ago

The last five or six days, I've been buzzing in my mind with this idea of having objects that are the same structurally but their internal implementation of a specific set of methods is different. It's really kind of cool: it's a very clean way of implementing the strategy pattern: http://en.wikipedia.org/wiki/Strategy_pattern . The brief precis of a strategy is that you have some way of changing out an algorithm inside a method, so that you can choose an optimal algorithm for a specific need.

gwynforthewyn commented 12 years ago

I was looking through the code and had reached compile!. I think we have a pretty decent handle on what generate_method is doing, so what's next?

pattern, keys           = compile path

that dispatches us to the compile(path) method:

def compile(path)
        keys = []
        if path.respond_to? :to_str
          pattern = path.to_str.gsub(/[^\?\%\\\/\:\*\w]/) { |c| encoded(c) }
          pattern.gsub!(/((:\w+)|\*)/) do |match|
            if match == "*"
              keys << 'splat'
              "(.*?)"
            else
              keys << $2[1..-1]
              "([^/?#]+)"
            end
          end
          [/^#{pattern}$/, keys]
        elsif path.respond_to?(:keys) && path.respond_to?(:match)
          [path, path.keys]
        elsif path.respond_to?(:names) && path.respond_to?(:match)
          [path, path.names]
        elsif path.respond_to? :match
          [path, keys]
        else
          raise TypeError, path
        end
      end

path is "/".

path is a string, so presumably it responds to to_str. Let's verify that:

[5] pry(main)> "string".respond_to? :to_str
=> true

Called without a block, as here in pry, gsub returns an enumerator:

[8] pry(main)> "/".to_str.gsub(/[^\?\%\\\/\:\*\w]/)
=> #<Enumerator: ...>
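The difference between the two forms of gsub is easy to check. In this sketch the regex is a simplified stand-in for Sinatra's, and the percent-encoding is done by hand purely for illustration:

```ruby
path = "/a b"

# Without a block: an Enumerator over the matched substrings.
without_block = path.gsub(/[^\w\/]/)
p without_block.class   # Enumerator
p without_block.to_a    # [" "]

# With a block: each match is replaced by the block's return value.
with_block = path.gsub(/[^\w\/]/) { |c| format("%%%02X", c.ord) }
p with_block            # "/a%20b"
```

In compile the block form is the one in play, so the result is a new string, not an enumerator.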

In compile, though, gsub is called with a block, so it doesn't return an enumerator: each matched character is handed to the block and replaced by the block's return value. Here the block calls the encoded() method. What does encoded do?

      def encoded(char)
        enc = URI.encode(char)
        enc = "(?:#{Regexp.escape enc}|#{URI.encode char, /./})" if enc == char
        enc = "(?:#{enc}|#{encoded('+')})" if char == " "
        enc
      end

It encodes the string as a URI and returns it! Nice! So we're now getting further into normal webbishness!

The next line is an attempt to match against a regex:

pattern.gsub!(/((:\w+)|\*)/) do |match|

A colon followed by one or more word characters, OR a splat. Our path "/" contains neither, so I don't think this actually matches, and we return:

   [/^#{pattern}$/, keys]

pattern, with an empty keys array. I'm uncertain about whether keys actually is blank. I don't think I'm wrong about the regex, but I would appreciate someone else's eyes on it.
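One way to get a second pair of eyes is to run the substitution on its own. This sketch copies the second gsub from compile into a made-up compile_sketch method and leaves out the encoding step:

```ruby
def compile_sketch(path)
  keys = []
  pattern = path.gsub(/((:\w+)|\*)/) do |match|
    if match == "*"
      keys << 'splat'
      "(.*?)"
    else
      keys << $2[1..-1]     # ":name" becomes "name"
      "([^/?#]+)"
    end
  end
  [/^#{pattern}$/, keys]
end

root_pattern, root_keys = compile_sketch("/")
p root_keys                             # []
p root_pattern.match("/") ? true : false  # true

name_pattern, name_keys = compile_sketch("/hello/:name")
p name_keys                             # ["name"]
p name_pattern.match("/hello/world")[1] # "world"

splat_pattern, splat_keys = compile_sketch("/files/*")
p splat_keys                               # ["splat"]
p splat_pattern.match("/files/a/b.txt")[1] # "a/b.txt"
```

For "/" the regex finds nothing to substitute, so keys stays empty and the pattern matches the root path only.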

gwynforthewyn commented 12 years ago

Back to compile!, pattern and keys have now been set.

A local copy of conditions is set to @conditions next, with @conditions being blanked (remember: on a get, @conditions has been duplicated a long time ago, so this doesn't matter as far as side effects go).

conditions, @conditions = @conditions, []

And now we reach the area @ericgj has already commented upon:


        [ pattern, keys, conditions, block.arity != 0 ?
            proc { |a,p| unbound_method.bind(a).call(*p) } :
            proc { |a,p| unbound_method.bind(a).call } ]

This is a bit of a bugger to read. I can see that it returns an array.

This clears things up:


[40] pry(main)> [ 1, 0 != 0 ? "true arrives here" : "false arrives here"]
=> [1, "false arrives here"]

The array that's returned contains [1, (if 0 != 0 then the second element is "true arrives here", otherwise it is "false arrives here")]. Since 0 != 0 is false, we get "false arrives here".
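The same shape with a real arity check looks like this. It is a sketch with a made-up wrap method; it calls the block directly instead of a bound method:

```ruby
# Pick a wrapper based on whether the block declares any parameters.
def wrap(&block)
  block.arity != 0 ?
    proc { |params| block.call(*params) } :
    proc { |_params| block.call }
end

p proc { }.arity          # 0
p proc { |a, b| }.arity   # 2

no_args   = wrap { 'hello' }
with_args = wrap { |first, last| "#{first} #{last}" }

p no_args.call(['ignored'])          # "hello"
p with_args.call(['Frank', 'S.'])    # "Frank S."
```

A plausible reason for the check in compile! is that a method, unlike a proc, raises ArgumentError when given arguments it does not declare, so a block with no parameters has to be called with none.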