Rack SPEC discussion: Upgrading using `env['upgrade.websocket']`

boazsegev commented 8 years ago

This is a discussion for the proposed upgrade workflow for Rack.

I will update this first post with the information about the outline for implementation. This first draft is based my experience implementing this feature (and using this feature) with Iodine.

The draft was MOVED here.

evanphx commented 8 years ago

I'd ask for a few adjustments:

Performing extend is a very expensive operation due to it's effect on method caches, so having the API depend on it is not the greatest approach. I'd prefer that on_open be passed an object that contains the write, uuid, and close methods.
The defer, each, and count methods could also be on this new object, but they don't feel like that's the right place for them. I'd prefer that there be an object in env["websocket.process"] that contains these methods. Then non-websocket requests can easily access it without the constant as well.
Could you give an example of how the uuid is used?

boazsegev commented 8 years ago

@evanphx , thank you for your input.

I'm on my way to the airport, so I'll be short. I'll also think as I write, so please excuse me.

Performing extend on a "per object" basis will be more expensive then performing extend on a Class, since method caches should be updated only on the first extend, or am I wrong?

I assume it would be totally safe to extend the underlying class, which will be performed only once, rather then extend an instance object.

On newer versions of Ruby (I looked at 2.3.x, but I assume 2.2.x too), the extend is basically ignored if the object already is_a?, or am I wrong?
passing a "websocket" object to on_open is definitely an alternative worth exploring. This will raise two issues that worry me:
- One is a small thing. This approach means that on_open would become required rather then optional.
- Two is a bigger thing. This will make inheritance less obvious when trying to implement a framework using this specification (it will mean that super will have to be the first function call rather then the last and that super will be required)...
...I guess I can see things going wrong when trying to build more complex systems with this requirement in place. I can imagine how Rails will try to circumvent the inheritance requirement and these thoughts are a bit worrying... What do you think?
uuid would probably be used within a defer to allow for unicasting (sending a message to a specific websocket). i.e.

defer(user.friends[0].uuid) {|friend| friend.write "Hey!" }

This might be a "funny" use for defer, but it is similar to how I use it in the experimental plezi.io framework when perfuming a unicast - (here's the code)[https://github.com/boazsegev/plezi/blob/b1ebe0ea1adaed7a6865f112a761010a5263344d/lib/plezi/websockets/message_dispatch.rb#L53].

It's not the only way to use these features, but it does allow a direct message instead of iterating using each.

I also use it to identify connections across servers / processes, by connecting the uuid with a process identifier. This allows me to use the process's UUID as a private pub/sub channel, so messages don't have to go though all the server instances when two websockets are communicating using the server.

Having the uuid be process specific allows for a "counter" based solution that requires less resources then a randomized UUID that can be used across the web.

boazsegev commented 8 years ago

I thought I'd benchmark the effects of using extend vs. the effects of using on_open(var), since the busied to the airport was longer then I expected.

The results are kinda funny...

...With simple classes (small amount of functions), extend is more expensive (adds ~90ms to 1Million cycles, on my MacBook Pro).

With large classes, extend seems cheeper then the on_open function call.

What we gain from paying when using extend is ease of use for the developers, mostly where inheritance is concerned.

The worst case performance hit seems to be small, while best case scenario indicate improved performance (also small). performance variance in both situations seems negligible to me, but I might have missed something.

Please check the benchmarks and let me know what you think.

module ExtendedModule
  def self.extended base
    base.include InstanceMethods    
  end
  def a_func; puts "peek-a-bu" ; end
  def b_func; puts "peek-a-bu" ; end
  def c_func; puts "peek-a-bu" ; end
  def d_func; puts "peek-a-bu" ; end
  def e_func; puts "done." ; end
  module InstanceMethods
    def a_func; puts "peek-a-bu" ; end
    def b_func; puts "peek-a-bu" ; end
    def c_func; puts "peek-a-bu" ; end
    def d_func; puts "peek-a-bu" ; end
    def e_func; puts "finished." ; end
  end
end

class SimpleClass
  attr_accessor :var
  def a_func ; puts "hi" ; end
  def on_open var; @var = var; end
end

class ComplexClass < String
  def a_func ; puts "hi" ; end
  def on_open var; @var = var; end
end

CYCLES = 1_000_000
Benchmark.bm do |bm|
  bm.report("simple baseline") {CYCLES.times { obj = SimpleClass.new } }
  bm.report("simple with extend") {CYCLES.times { obj = SimpleClass.new; obj.class.extend ExtendedModule } }
  bm.report("simple with on_open") {CYCLES.times { obj = SimpleClass.new; obj.on_open ExtendedModule } }
  bm.report("complex with extend") {CYCLES.times { obj = ComplexClass.new; obj.class.extend ExtendedModule } }
  bm.report("complex with on_open") {CYCLES.times { obj = ComplexClass.new; obj.on_open ExtendedModule } }
end

SimpleClass.new.e_func

Results on my Macbook Pro (mid 2014, 2.5 GHz Intel Core i7):

       user     system      total        real
simple baseline  0.130000   0.010000   0.140000 (  0.125287)
simple with extend  0.380000   0.000000   0.380000 (  0.383625)
simple with on_open  0.170000   0.000000   0.170000 (  0.171970)
complex with extend  0.410000   0.000000   0.410000 (  0.415337)
complex with on_open  0.620000   0.000000   0.620000 (  0.616785)
finished.

evanphx commented 8 years ago

@boazsegev Good to see about the performance. I guess it feels like just a sort of weird API for it methods like that to be added dynamically. Your point about passing an object only to on_open is totally valid though. Perhaps the solution would be to pass that object to ALL on_* methods then.

This new object would also be the one that is advertise when querying all connected web sockets, which creates a clear API for what something from the outside can do to a websocket (send a message and close it mostly).

macournoyer commented 8 years ago

Would uuid, defer, each and count be part of the spec? Feels a bit to strict to me. Should be left as internal working of the servers.

But perhaps I'm not seeing the use case here...

boazsegev commented 8 years ago

@macournoyer , welcome to the discussion :-)

I landed in the UK after a long flight and a few trains and I'm a bit tired, so please excuse any incoherence on my part.

I should probably point out that most of the specification started with an actual use case - a web version for the iPhone app "Secret Chess", implemented using node.js.

This use case will probably be the best way for me to explain myself and demonstrate some of the reasons for the choices I made.

I know there are still things we can refine, and I'm thankful for any input and thoughts provoked.

When starting a Secret Chess game, a connected player broadcasts it's request to all available players. When simplified, it might look like this:

PlayersController.each do |player| 
   next unless player.available?
   next unless player.wants?(game)
   player.try_start(game) # fails if another player connected first
   end
end

Once the game started, moved are directed at the specific opponent using defer:

defer(opponent_uuid) { | opponent | opponent.signal_move(gmae_id) }

Why `defer`?

defer is actually a performance helper that is meant to make each iterations unnecessary when doing something like this:

PlayersController.each do |player|
  next unless player.uuid != opponent.uuid
  # ...
end

This is especially true since in a concurrent situation it is unlikely that each could actually be stopped using break.

Why `uuid`?

In essence, uuid could have been replaced with object_id and be done with it... except that object_id is the memory location of the object and memory is recycles, meaning there is risk of uuid collisions.

To allow defer to be safe (only invoke the block of code if the connection was found), a collision resistant alternative was required.

uuid is only required to be unique within the same process, which allows uuid to be a simple counter (each new connection increases the uuid by 1). This allows for both a simple implementation and a solves the issue of defer being exposed to connection collisions.

Why `each`?

The application iterates over each websocket connection, so we need to support each.

Some solutions in node.js involve keeping track of connections using an Array.

Experience shows that if the server / framework won't support an each method, then users will implement one of their own.... and they are likely to do a lesser job, since they do not have the data stores of the Server to draw from.

Why `count`?

Frankly, I'm not sure why sites have the "158 people connected" message, but we had one at some point and it seems that count is a common feature.

It is misleading, as there is no way to easily count connections when scaling the server across multiple processes or machines, but it is simple enough to implement if implementing each...

...I'm not averse to removing it, but this just means applications will be writing i = 0; each { i +=1 }. I believe we can optimize these away and better handle them in the server.

Also, the specification only requires that count return the number of connection for this connection (I edited the specification to make it clearer).

Why no `ws` object (or, Why use `extend`)?

This is my second time trying to implement something that woks. The first time I had a ws object and the price was worst then using extend, at least as far as complexity was involved (which eventually meant also performance was being degraded).

As you can see from the use-case, sending messages isn't the only use of the connection object.

One of the great advantages of a persistent connection is stateful data.

Using an object's persistent state to manage messages to the client might seem small, but it's an important part of what the protocol has to offer.

Separating the data from the connection seems somewhat artificial and inconvenient to me.

I'm falling asleep as I write. Have an amazing day. Good Night.

evanphx commented 8 years ago

@boazsegev One reason to leave out things like defer, each, and count from the spec is that in the world of multi-process deployments in ruby (which is pretty much all the production ones) these methods will be wrong. Clients will be connected to one of my processes that make up the cluster and I don't think the spec should mandate that they're all available via defer, etc. because that would mean a brand new protocol to keep the client list correct and shuffle messages for them across those boundaries.

For that reason, functionality like defer should be done as something added on to the rack support, rather than something rack provides. A great example of this is rails 5's use of redis to manage channels. For your use case, when you just want all the ones in one process, you can easily build that functionality on the on_open and on_close callbacks.

On the topic of a connection object, why was it worse? The API confusion about injecting methods seems more confusing to me honestly.

boazsegev commented 8 years ago

Midway through writing an answer, I realized that the each-defer methods you were thinking about weren't the same ones described in the specification.

Maybe my naming system is all wrong...

The scope for each of the websocket inter-communication functions (uuid, count, each and defer) is the scope of the process.

When using worker clusters, the scope will be limited to the worker process.

No inter-process communication is assumed nor required.

I'll fix the specification to make that clear.

I'll continue writing my answer and post it soon, but I think that this clarification elevates many (or most) of your concerns.

boazsegev commented 8 years ago

functionality like defer should be done as something added on to the rack support, rather than something rack provides.

This is a question of cost.

Having the server implement this functionality is far more effective and reduces both code duplication and data duplication / synchronization.

If client side implementations need to implement their own each and defer methods or link the ws object to the callback objcet (the write / close is in a different object), we will have evey websocket implementation writing a variation of the following code:

GlobalWebsocketArray = [].dup
class MyWebsocket
  attr_reader :ws
  def on_open ws
    @ws = ws
    GlobalWebsocketArray << self
  end
  def on_close
    GlobalWebsocketArray.delete self
  end
end

If we add thread safety concerns (which we must), this code will look slightly more complex:

GlobalWebsocketArray = [].dup
GlobalWebsocketLock = Mutex.new
class MyWebsocket
  attr_reader :ws
  def on_open(ws)
    @ws = ws
    GlobalWebsocketLock.synchronize { GlobalWebsocketArray << self }
  end

  def on_close
    GlobalWebsocketLock.synchronize { GlobalWebsocketArray.delete self }
  end
end

Protecting the connection global array using a Mutex will prevent data corruption and multi-threading issues but the price is very high.

GlobalWebsocketArray.each (within a global Mutex) is now the equivalent of the each function I proposed and the equivalent of defer would be:

GlobalWebsocketLock.synchronize do
  GlobalWebsocketArray.each do |client|
    next unless client.uuid == uuid
    # do something
    break
  end
end

This small code snippet isn't effective and suffers from a few issue. Just to name the obvious issues:

The server already maintains a properly synchronized array of websocket connections for the specific process. There is no need to duplicate the array.
The server can (probably) run multiple each function calls concurrently while still accepting new connections. However, a client side implementation will have to hang on on_open and on_close while each is being processed (because of the need for thread-safety and data synchronization).
There's a gap between the server closing the connection (removing it from it's own records) and the moment the on_close callback is actually executed. This time laps is worsened due to the mutex being used to sync the client side array of connections. This means that there is a lot more room for false positives (each invoking code on closed connections).
A server side implementation could prevent any race conditions by making sure no two websocket events of blocks of code run concurrently for the same connection. A client side implementation has a higher chance of race conditions, since both the client and the server access the handler to invoke actions.

These are just some of the issues that a client side (vs. server side) implementation face.

At worst case scenario, server side implementations can simply use the the code above as a baseline implementation and be equal to client side implementations.

At best case scenarios, server side implementations out-perform the baseline implementation by allowing multiple concurrent each calls to run while concurrently accepting new connections and closing existing ones (probably using a per connection lock and a non-dynamic array).

Rails 5 have a variation on this code, which isn't thread-safe. Some parts of it require a global server lock. This code is present within the Rails 5 "server" implementation.

It iss possible to minimize the lock's effects by sacrificing iteration performance and sacrificing some code safety... the following alternative might mitigate the problem, but it won't solve it:

def each
  i = 0
  while i < (GlobalWebsocketLock.synchronize {GlobalWebsocketArray.length})
    handler = GlobalWebsocketLock.synchronize {GlobalWebsocketArray[i]}
    break if handler.nil?
    yield(handler)
  end
end

Also, such an alternative introduces a higher risk for race conditions.

One reason to leave out things like defer, each, and count from the spec is that in the world of multi-process deployments in ruby (which is pretty much all the production ones) these methods will be wrong.

The method names might indeed be misleading or wrong, because the method's scope (unlike it's name) is limited to a single process within a cluster, but the functionality isn't wrong... in fact, it allows us to fix our current implementations which suffer from a resource heavy design.

Let's assume, for example, an application using Redis for pub/sub and inter-websocket communication.

The application spans across clusters of processes in multiple deployment centers across the glob (which is bad, since Redis doesn't support TLS/SSL, but it's a simple enough example and we'll ignore this at the moment).

There are two basic ways to implement this:

Each websocket connection could connect to Redis independently, listening to subscribed events. This will require an extra thread with an extra Redis connection per websocket connection (quite expensive).
Each process listens to subscribed events and forwards any relevant events to the connections owned by the process (requiring each, defer and uuid to be implemented). An example for this approach can be found here.

Current client side implementations of each are probably as expensive as having each websocket connection spawn a thread and open a new Redis connections, since each requires an almost global lock (I'm ignoring the price on the Redis machine).

However, by using this specification to provide a cheaper each that can avoid the use of a global lock, we can free up Redis connections, so that each process requires only a single pub/sub connection.

This does more then save up on (expensive) resources.

Since TCP/IP has a slow start feature, data transfer will be faster when using a single persistent connection.

Also, now that there is no need to disconnect and reconnect from Redis, the network load is reduced.

Last point is that Redis connections will be less susceptible to network timeouts, since there is more likelihood for ongoing activity.

To sum it up, client-side implementations of each-defer methods are ineffective and their performance cost is much higher then server side implementations.

The same is true for multi-process / cluster implementations, where external resources don't have to make up for a weak local implementation.

On the topic of a connection object, why was it worse? The API confusion about injecting methods seems more confusing to me honestly.

If we aren't implementing the each-defer functionality, it really doesn't matter.

But if we do implement it, then having a separate ws object increases the cost of performance and the complexity of both the application and the server (depending on your design).

When having separate objects, we need to link the ws object to the callback object, so that each yields the callback objects. This is in addition to the client having to link the callback object to the ws object, so that they can initiate websocket push for data (instead of only react to events).

We could simply have each return the ws object, but then the application will need to access it's data using the ws.handler object, so the link is still required.

This double linkage, which promotes more code jumps and method calls, is no more then a poor-man's mixin which is easy enough to provide using extend.

Using extend, our each implementation will look more like GlobalWSArray.each {|ws| yield(ws) } then GlobalWSArray.each {|ws| yield(ws.handler) }, which is one less function call and code indirection.

This is true also for write and close operations, that will loos as handler.write instead of handler.ws.write.

This means that using extend minimizes virtual function table lookups and code jumps, potentially increasing performance.

Again, this is mostly a question of price.

As for:

The API confusion about injecting methods seems more confusing to me honestly.

I'm guessing this is a personal preference, but it's also a question of philosophy.

Is auto-correction for developer code better then error reporting (ignoring price considerations)?

In theory, we could force clients to mixin the Websocket functionality themselves, something to the effect of:

class MyWebsocket
   extend Rack::Websocket
   def on_open
     write "welcome"
     # ...
    end
    #...
end

This would probably be the clearest presentation of code and I'm sue that now the write method feels more natural to you.

On the other hand, this does make the upgrade process more error prone. A missing extend might crash the server (depending on the design) or produce unexpected results, which means we need to do error checking...

...after experimenting with error checking, I realized that using extend is better.

extend performs the same error check, but instead of crashing or reporting an error, extend corrects the error by extending the Websocket Callback Object with the missing mixin.

I think I prefer auto-correction over error reporting and I'm aware that this preference isn't true for everyone. However, I think more Rubyists agree with this philosophy.

macournoyer commented 8 years ago

I'm with @evanphx on this. I think the API should be as minimal as possible.

The Rack spec always had a focus on simplicity instead of performance. It's the job of server/framework implementers to make it fast.

So I'd drop everything but the on_ callbacks, write and close.

evanphx commented 8 years ago

@boazsegev So to try and get a handle on what would be needed by server I went ahead and implemented the bits of the Spec and then starting adding support for it in rails, to see how the API felt. There are a few thoughts having implemented it now:

The idea that a each and defer is trivial because a server already has that list is wrong. The puma implementation doesn't keep a list of all open websockets because it's unnecessary. To add it I'd end up adding the exact code you pasted in above, in which case it's best to have it done in an add-on gem rather than specified by the Spec.
The insertion point env["upgrade.websocket"] likely needs to be a proc that is called like hijack and it would return the connection object that has the write and close methods. This is so that code can take that connection object and store it in the apps registry to be used. I think you're using of each is what made this possible, you'd end up going back through and finding the object to write to for a client. Returning the object from the insertion point is much cleaner and it mimics the hijack API, which is good.

evanphx commented 8 years ago

I finished prototyping rails integration and have opened a PR for the Rails devs to comment on: https://github.com/rails/rails/pull/26237

boazsegev commented 8 years ago

@evanphx , a few questions about no. 2:

Is the upgrade performed when the method call is called? i.e. can data be sent immediately after calling env["upgrade.websocket"].call or are we required to wait for the on_open callback?
How are special headers or cookies set for the response? i.e., what about session middleware - will it be possible to update session data?

I ask these because the idea of storing a callback object (vs. using a proc) was mainly to allow a normal header and middleware flow, allowing for cookie data and session data to be set during the upgrade. I didn't think a proc could do it and I'm curious about what you tried.

boazsegev commented 8 years ago

@evanphx , a question about no. 1:

If Puma doesn't keep a list of open IO objects and their respective protocol, how does it route the IO websocket events to the websocket parser and callback object?

I'm not sure I understand...

As a side note, I think the Specs should provide standard functionality. If one server doesn't support part of the specs, no client can rely on the functionality as being supported and it's as good as removing it.

I'm wondering if we could move a trivial baseline implementation into the Rack gem and allow that implementation to be overridden by servers that wish to optimize on that baseline code...?

evanphx commented 8 years ago

@boazsegev The connection object returned from .call isn't usable until after the request returns to the server. So there is a time where you have the object but can't use it yet. That limitation could be solved by queuing the messages written to it and flushing them when the return is done.

Or performing the .call could do the upgrade right there, like hijack disowns the socket right at the point of the .call. In that case, you'd have to pass additional headers to .call. I actually extended .call to take an options hash so that things like the websocket protocol can be accepted to, so it could look like env["upgrade.websocket"].call(handler, protocol: protocols, headers: { "X-Whatever" => "foo"})

As for the the list, Puma has an IO reactor it uses to track all connections that are waiting for data, which includes slow HTTP clients, and that reactor holds the websocket info as well. So the list exists but it's not a list of websockets, it's a list of all IO objects are being watched for data.

Lastly, I agree that the Spec should provide standard functionality, and that's a reason to leave out things like each, so that web servers have the smallest surface area to implement and thus it's more likely that support will become widespread.

evanphx commented 8 years ago

To clarify, the way I have it in puma right now is that the headers returned from the request are added to the websocket upgrade reply, so they do make it back to the client, at the expensive of the connection object not working until the request returns.

boazsegev commented 8 years ago

@macournoyer:

The Rack spec always had a focus on simplicity instead of performance.

I think API simplicity refers to what the user of the API experiences and not how complicated the underlying task is... and servers are not simple beasts.

I think an API consisting of write, close, each, defer and count is very simple - even if the API's user will have to remember that the scope of any of the functions is never more then the scope of the process (or worker process), which is quite trivial for anyone that ever used fork.

Besides, will we sacrifice the future on the alter of the past?

@evanphx:

Thank you for inviting the Rails team to the discussion, I think that listening to the people that will actually use the API is paramount.
IO list: my server's IO list is also a global list (not a per protocol list). My implementation hinges on libserver's each which uses an if statement , so I guess the question is whether the client implementing a global websocket server lock is more (or less) performant the us implementing a per connection lock and an if statement.
Implementing call has advantages due to extendability in Ruby, but that seems thin, since any extension could probably be implemented using the response header data as well.

I should point out that call also has the disadvantage or requiring another object per connection (the Proc object) requiring another function call and requiring the Proc's implementation (what would I put there?)...

... it seems to me that using call makes for a needless performance sacrifice, just to look like the more complicated hijack API.

On a personal note, my server implementation is in C and creating new Proc objects isn't as easy for me.

Having said that, if everyone prefers the API uses a Proc, I will implement a Proc, it's doable, it's just more expensive.

P.S. / Meandering Thoughts

I've been thinking about this discussion a lot.

I am very happy that things are moving forward and that Websocket support is trickling down into the server layers.

I guess whatever happens is an improvement on the current situation... I think.

However, I also feel that the discussion is "what we as servers implementors want to supply?" rather then "what do our clients [websocket applications] need from us?"

I also feel that our attachment to the past, both in philosophy and design, is a mistake.

Rack's existing philosophy and design is what brings many developers to choose Node.js. The lack of streaming support, SSE and the CGI model were great for 2007 but they are failing for 2017. The preference of implementation ease over performance is moving people away from Ruby and it isn't something we should accept from our web-server (even though I would accept it form my own application for non-critical tasks or helpers).

Here are a few things I though about:

The big issue underlying the each-defer debate is that of control flow.

Does the server have exclusive control over when a websocket object's methods are executed or is that control shared with the client, promoting race conditions and introducing more complexity into the application?

Should the server control each to make sure the same websocket object doesn't run more then a single task at a time (allowing other tasks to run while waiting) or should the client implement a lock within each of the websocket object's methods, so that the threads block and wait to access critical sections?
Moving the each method to the client forces the client to choose between a global websocket server lock and implementing a pub/sub subscription (read: 1 Redis connection + 1 thread) per websocket connection. This is in addition to a per connection lock used to protect against race conditions.

A global websocket server lock is terrible for concurrency, especially as Ruby's GVL is due to be removed.

Implementing a pub/sub subscription per websocket connection is also super expensive.

Just to clarify... In addition the performance cost of disconnecting and reconnecting to the pub/sub service and hanging on critical sections, this could mean the difference between supporting a 100,000 concurrent clients for free or a few dollars a month vs. hundreds of dollars a month (and probably more) for the same (or less) amount of clients.
At the end of the day, I ask myself - would I use a global lock? (I don't want to); Would I use a pub/sub connection per websocket? (not before I have a large paying client base, and even then I'd rather not); Would I risk data corruption and ignore multi-threading issues? (security comes first) ....

....

This means that it is is more likely that I will simply switch to Node.js then choose any of these solutions for my application. Having a single threaded evented design will solve all these issues that are left unresolved by a half-way implementation.

This, I think, means that each and defer aren't really optional.

evanphx commented 8 years ago

I think that the functionality of each, defer and count should be provided by something outside the server implementation, perhaps in a rack-websocket gem. @boazsegev if you feel it's so important that you have your own implementations, I think thats fine. I actually think that having them be added on it better, because it allows for the user to have better control of their effect.

For example, a rack-websocket gem could provide a Rack::ProcessEach that does what you provide but also could provide a Rack::RedisEach that uses redis to exchange messages. Being able to compose that functionality is good, imho.

Your mention about needing a separate pubsub connection per websocket if there is not an each is wrong, since it's easy enough for someone to implement the process each.

Even in the case of iodine, the #each functionality is not very useful because of the presence of additional worker processes.

@boazsegev We're happy to engage in this process with you but it was you who asked us to engage. If you reject our opinions at every turn this is not a collaborative process. I really do want to come up with a spec for this because I'll likely shepherd it into rails and so puma will support the functionality one way or another.

The discussions about a server lock to being terrible is overblown imho. The list of open websockets must be subject to a list because it can be modified concurrently so there is no getting around protecting the list. The code can easily be written to dup the list so that the block is run without the lock held if necessary as well. Also, the Ruby GVL is not scheduled to be remove and likely won't be anytime soon.

Finally, I really do want to come to some consensus but given the opinions of @macournoyer and I, I hope you can see our perspectives on this.

evanphx commented 8 years ago

On the topic of extend vs connection passing I'd be fine going with an extend approach. It gives the server the ability to inject something very specialized to itself which makes it quite performant.

Any thoughts on the semantics of .call? Should it start the websocket right away or wait until the request returns?

boazsegev commented 8 years ago

@evanphx ,

I am sorry if I made you feel less then welcome. I come from a line of hot blooded people and often we express ourselves very intensely.

I have updated the specification to represent the current votes.

Even if I am very much against an idea, this is a community effort and I believe that the voice of the majority should decide. After all, this isn't my project, it's our project.

evanphx commented 8 years ago

@boazsegev No problem, thanks for getting this party started!

boazsegev commented 8 years ago

As to the nature of call:

I think that whatever technique we decide on (call or setting a value), we should allow the response to run it's course through the middleware chain before upgrading the connection.

This will allow middleware to set cookies and perform tasks as expected and minimize the intrusion of the upgrade process on existing middleware implementation.

If we're going with call, I would recommend that the actual upgrade is delayed until the response returns.

evanphx commented 8 years ago

@boazsegev Ok, that is my thinking too. We do need to figure out the behavior of a returned connection object before the request returns. Should it raise an exception that the connection hasn't been opened yet? Should it queue any messages and flush them once it begins?

boazsegev commented 8 years ago

If we use extend together with call, we can probably return true / false, to prevent the need for queuing up messages and managing pre-upgrade websocket function calls.

If anyone will want to implement each, they will have to use the on_open callback, but I think that makes more sense then someone using the websocket object before the upgrade.

boazsegev commented 8 years ago

Actually, the same is true even if we don't use extend.

boazsegev commented 8 years ago

P.S.

What are you thinking is the upside of using env[..].call versus env[..] =?

evanphx commented 8 years ago

@boazsegev So you'd advocate for call NOT returning a connection object and folks waiting for on_open to grab the object to use? The reason I ended up thinking returning a connection object would be good was because it made adding support to rails so easy.

The reason for call is easier understood by looking at my rails PR: https://github.com/rails/rails/pull/26237/files

It gives code the ability to return a connection object. Now if we do not return a connection object at all, then we don't need call, we can just do env[..] = to setup the handler.

boazsegev commented 8 years ago

I love Rails, but I'm thinking about pure Rack users as well. When they get the Websocket object, they might expect it to be usable. Adding error handling to the write and close is unnecessary if we return true or if we simply use env[...] =.

The PR looks very clean, but I think most of it can be done using env[..] =

i.e.

if env["upgrade.websocket?"]
    @driver_started = true
    @nws = env["upgrade.websocket"] = self
    return
else
    @nws = nil
end

However, this assumes that the protocols string goes into the response header response['Sec-WebSocket-Protocol'] = protocols (protocols will need to match the required string)...

... I'm not sure where that goes in the Rails PR. Will that be an issue?

EDIT

I realized I'm not sure what protocols refers to.

I also realized that the @nws object isn't strictly necessary.

Also, there might be a name collision with write, since Rails implements it for non native Websockets as @stream.write in Rails.

The write collision could be resolved by calling super if the @nws is set or defined?(super) is true.

evanphx commented 8 years ago

@boazsegev I think it might be an issue, yeah. We probably need a way to expose that ability, which is something .call makes easy as well.

boazsegev commented 8 years ago

@evanphx, could you explain what you mean by:

We probably need a way to expose that ability...

What ability? isn't the upgrade.websocket? enough?

evanphx commented 8 years ago

I mean we need to make it possible to set the web socket protocols header On Sun, Aug 21, 2016 at 2:10 PM Bo notifications@github.com wrote:

@evanphx https://github.com/evanphx, could you explain what you mean by:

We probably need a way to expose that ability...

What ability? isn't the upgrade.websocket? enough?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/boazsegev/iodine/issues/6#issuecomment-241282328, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAAB6RbKB4ZeSc--L9z8OFPHZ5aOrR0ks5qiL7OgaJpZM4JdgFS .

boazsegev commented 8 years ago

Oh. Sorry, I should have understood that one earlier.

In this case, call will have to support the protocols option as part of the specification.

This means that there are headers that aren't set as part of the response... I'm a bit averse to this "split" in the way headers are set, but this is an emotional response that isn't really relevant to the discussion.

This also means that the server will have to keep this extra state (the protocols string) and remember to add it while writing the response. This is more of an issue, as it adds complexity to the implementation.

It feel to me that using call and all this extra code and complexity is just so developers won't have to write the header themselves.

If it were a core feature for Websockets, I would probably be more relaxed about it, but at the end of the day, if you're experienced enough to use the protocol feature correctly, you're probably experienced enough to set the header in the response.

Another option will be to add an optional Hash key upgrade.websocket.protocol. This will solve the question of "where to store the information?", elevate some of the added complexity and make implementing call unnecessary.

@evanphx , am I convincing you or do you think implementing call is important?

@macournoyer , you're our tie breaker, I'd love to hear your voice on this one and know where you stand. I apologize if I were rude before. If I were any good with people, I might not have ended up so good with machines... How would you implement this?

evanphx commented 8 years ago

@boazsegev So the big reason to have protocols there rather than passing it back as a normal rack header is that it's only valid for RFC6455. Now it probably doesn't hurt if it's sent back in an older websocket impl is used, but I also sort of doubt that anyone is using a browser that does NOT do RFC6455 now anyway.

In which case, we could probably get away with ignore the protocols stuff all together and just pass the header back in the rack response.

Ok, so with protocols out of the way, the question is if .call is important to exist so that it can return a different object that the websocket handler. Your suggest is totally valid, to just save the handler and use it later, knowing that it will be extended with the write methods once the response is sent.

One potential weird thing about that is that the handler won't actually have a write method until the websocket connection happens, so if someone tries to use it they'll get a NoMethodError. That does solve one of the problems, namely how to handle the connection object being used before the upgrade has happened.

So actually, now that I talk through it that might not be that big of a deal. Basically the handler will get a write method at some point and when it does, the code knows that it's ready to be used. That probably works fine, but we'll just need to document it well because people will make the mistake of tried to write to it after making the assignment.

One reason NOT to do the .call and just assign directly into env is that users won't expect the connection to be upgraded at that point. They'll just see themselves as saying "when the connection IS upgraded, use this handler object". Where as if call is used, they might incorrectly assume that the upgrade is happened right then and not later.

Ok, let me revise what I've said now: We don't need .call and it's probably too confusing to have it anyway. We can instead talk about upgrade.websocket similar to the way that response hijacking works (not exactly the same, but similar): http://www.rubydoc.info/github/rack/rack/master/file/SPEC#Response__after_headers_

evanphx commented 8 years ago

I updated the puma and rails PRs to a more simplified style:

Just assigned to env["upgrade.websocket"], no .call
No extending of the handler
Passing a connection object with write and close to the on_* callbacks

This feels like the cleanest version yet. Please check them out and give me your feedback.

boazsegev commented 8 years ago

@evanphx ,

I'm wondering about the "No extending of the handler", since it means I will have to break backwards compatibility with the existing Iodine server and release Iodine 0.3.0... (this also means that people using Iodine 0.2.0 with Rails 5 will experience breaking issues)...

Is it important that we avoid extend?

So far, using extend allowed me to optimize the Iodine server by never creating the ws object.

I was simply saving the connection's id (fd) as a secret variable within the handler object (C allows me to save information in an object without it being available to the Ruby object).

Now, I will need to create another object per connection and either connect these two objects or manage another object in the garbage collection marker array (since I use C, I need to take care of the garbage collection's mark and sweep routine).

I know you're writing your server in Ruby, but mine is mostly written in C.

What seems like a simple bit of code in Ruby can be a lot more complex in C (or Java, when we get Java based servers to implement this).

macournoyer commented 8 years ago

@boazsegev I can't be a tie breaker, I haven't read all the comments.

What I care about is that the "websocket callback object" is kept simple, ie. only on_ callbacks, write and close. Other stuff I have no strong opinions.

evanphx commented 8 years ago

@boazsegev For backwards compat, you could easily detect the arity of the on_* methods and if it's 0 for on_open, 1 for on_message, you do the extend since you know that this handler is expecting the extend. That seems fine.

The issue with extend is that for us not working in C for this, it's quite easy to collide with instance variables and such, we don't have the ability to hide the data.

evanphx commented 8 years ago

@boazsegev Also, you could have behavior where you ALWAYS extend the handler and then use the handler itself as the connection object if the handler is expecting one. That behavior is in spec and likely not a big deal.

boazsegev commented 8 years ago

@evanphx , I do understand your concern.

The second option (which is more performant) will still allow for collisions in the write implementation.

If the user does something like:

def write(data)
    ws.write(data)
end

My implementation will break causing a stack overflow.

However, if the client is aware of the extend issue, they could easily do something like:

def write(data)
    return super if defined? super
    non_native_ws.write(data)
end

Which also solves any collisions you might fear of experiencing.

If you're saving internal data as @__puma_var, I doubt collisions will exist.

Another option is to have a ws object in the extended module. Having a @__puma_ws object and implementing write as:

def write(data)
    @__puma_ws.write(data)
end

Will do two things that are quite nice for the client.

This will prevent the need for storing the ws object.
This will make the callbacks cleaner, since they will not require any arguments.

I do prefer a more performant solution, since extend allows us more control and better performance.

For example, calling callbacks using ws will require something like this (which iterates over the virtual function list hierarchy twice):

ws.handler.on_close(ws) if ws.handler.respond_to?(:on_close)

Using extend, we can implement a default on_close callback for the parent mixin, saving us from performing a respond_to? check and increasing performance.

ws.on_close

The difference is a single function call vs. four... for every callback.

evanphx commented 8 years ago

So you feel that extend is more performance because it passes one few argument to the on_* methods?

boazsegev commented 8 years ago

I think a Benchmark might convince you about the usefulness of the extend.

class MyClass
  def a_func
    nil
  end
end

item = MyClass.new
REPEAT = 1_000_000
Benchmark.bm do |bm|
  bm.report("call existing func") {REPEAT.times {|i| item.a_func } }
  bm.report("call & check existing") {REPEAT.times {|i| item.a_func if item.respond_to?(:a_func) } }
  bm.report("call & check missing") {REPEAT.times {|i| item.b_func if item.respond_to?(:b_func) } }
end

The results:

       user     system      total        real
call existing func  0.060000   0.000000   0.060000 (  0.062685)
call & check existing  0.110000   0.000000   0.110000 (  0.120615)
call & check missing  0.110000   0.000000   0.110000 (  0.102416)

extend allows us to perform X2 faster when we implement default (empty) callbacks in the parent mixin.

evanphx commented 8 years ago

That's totally unfair because you have the respond_to? check in there. That would not be in the hot path, can you remove it and run the benchmark again?

evanphx commented 8 years ago

Oh, it's the same thing. Of course the others are slower because of the respond_to? check. You would not be doing those in the hot path because you can detect if it respond_to? when it's created and set a flag or a use a separate code path entirely when data is ready.

boazsegev commented 8 years ago

This is my current mixing for Iodine.

As you can see, it includes default implementation of the callback functions, allowing me to skip the respond_to? check.

This is exactly what the extend is good for. It isn't (just) the extra argument.

evanphx commented 8 years ago

You absolutely do would not be doing a respond_to? check on every dispatch, there are many easy way to make it perform exactly the same as the extend case.

boazsegev commented 8 years ago

This also means I don't need to compute a hot path for every handshake.

evanphx commented 8 years ago

Computing the hot path is one respond_to check during the handshake, since you'd be doing that instead of the extend it's the same cost.

boazsegev commented 8 years ago

@evanphx , I'm not saying that your way won't work. There are always more optimizations that we can make and we are probably both good enough to manage both situations (although I suspect you're better at this then I).

What I am saying is that extend gives use more flexibility, it's easier to implement and it allows us more control over possible future enhancements and optimizations.

Also, extend is performed once per application run (although it's checked every handshake).

I know it's a little weird, but Ruby meta-programming is very powerful and I think we should use it.

If you're thinking about the Rails implementation, I'm sure the return super if flag solution is easy enough to implement and add to the existing implementation.

evanphx commented 8 years ago

I just disagree that extend gives more flexibility, I feel that it actually restricts flexibility with the handler due to method collisions.

boazsegev commented 8 years ago

And it isn't the same cost, since hierarchy checks are faster and we only check for one hierarchy vs. four callbacks.

boazsegev / iodine