faye / websocket-driver-ruby

WebSocket protocol handler with pluggable I/O
Other
223 stars 43 forks source link

Use symbols rather than strings for event names. #31

Closed ioquatix closed 9 years ago

ioquatix commented 9 years ago

I think it's more typical to use symbols for immutable event names, e.g. on(:close) { ... }. Currently it is on('close') which is more common in JavaScript since JavaScript doesn't have symbols.

jcoglan commented 9 years ago

I've updated the docs in https://github.com/faye/websocket-driver-ruby/commit/d461b0def5ad0fb8fc662dfbdc33db97c547f2f5.

I tend to view symbols as things you should only use if you can enumerate all the possible values a symbol-typed variable can take, and those values have distinct meanings to the program. In the sense that it can be used to register arbitrary event names, EventEmitter fails this test.

However, the drivers do emit specific events with specific meanings that are hard-coded into the program, as symbols; they're not arbitrary data. So in this sense, using symbols is the right thing do to.

(This argument means I actually think most uses of symbols are a bad idea, and a struct should have been used instead. Symbols are basically implicit bits of object APIs encoded as data. I think viewing them as 'immutable strings' is an accident of their appearance; they are really persistent atomic distinct values that can referred to by name.)

Since this module has always accepted arbitrary strings, and is being used downstream in other gems to handle arbitrary events, it must continue to accept strings, and so will continue to turn inputs into strings internally. If you use symbols, you're protected against the names being mutated anyway because Symbol#to_s returns a new string every time.

Converting inputs to symbols without whitelisting the allowable event names would cause memory leaks and DoS vectors on all but the latest Rubies, and per my comments above I think using to_sym is an indication you might be misusing symbols.

ioquatix commented 9 years ago

Your arguments are mostly subjective, but the objective one is that using String is simply slower than using Symbol. You can use Symbol internally which IMHO is the right thing to do.

ioquatix commented 9 years ago

Converting inputs to symbols without whitelisting the allowable event names would cause memory leaks and DoS vectors on all but the latest Rubies, and per my comments above I think using to_sym is an indication you might be misusing symbols.

Can you explain how this would happen?

ioquatix commented 9 years ago

Here is a simple benchmark:

#!/usr/bin/env ruby2.0.0

puts RUBY_VERSION

require "benchmark"

precomputed_string = 'message'
precomputed_symbol = precomputed_string.to_sym

MAP = {
  precomputed_string => true,
  precomputed_symbol => true
}

Benchmark.bm(20) do |x|
  x.report("precomputed_string") do
    10000000.times { MAP[precomputed_string] }
  end
  x.report("precomputed_symbol") do
    10000000.times { MAP[precomputed_symbol] }
  end
end

# 2.1.2
#                            user     system      total        real
# precomputed_string     1.030000   0.000000   1.030000 (  1.034497)
# precomputed_symbol     0.650000   0.010000   0.660000 (  0.646473)

Using symbols is basically 2x faster. Although a bit contrived, it demonstrates the kind of performance issues I was mentioning.

jcoglan commented 9 years ago

I'm not convinced by microbenchmarks without context so I ran some of my own using benchmark-ips, which also produces standard deviations. Let's start with something simple, assigning a value into a hash. The checks for symbols have to be done because the code currently allows any object that responds to to_s (i.e. any object at all) to be an event key. Not all objects respond to to_sym and need converting to strings first.

require 'bundler/setup'
require 'benchmark/ips'

string = 'hash_key'
symbol = :hash_key
hash   = {}

Benchmark.ips do |ips|
  ips.report 'set string as string' do
    hash[string.to_s] = true
  end

  ips.report 'set string as symbol' do
    key = string.respond_to?(:to_sym) ? string : string.to_s
    hash[key.to_sym] = true
  end

  ips.report 'set symbol as string' do
    hash[symbol.to_s] = true
  end

  ips.report 'set symbol as symbol' do
    key = symbol.respond_to?(:to_sym) ? symbol : symbol.to_s
    hash[key.to_sym] = true
  end
end

This produces (on one of my machines, I have found results vary substantially between boxes):

set string as string      6.107M (± 0.6%) i/s -     30.633M
set string as symbol      4.391M (± 0.8%) i/s -     22.034M
set symbol as string      4.533M (± 0.7%) i/s -     22.758M
set symbol as symbol      5.349M (± 0.6%) i/s -     26.760M

With string inputs (which is what the docs currently tell you to use), strings is faster. For symbol inputs, symbols is faster.

But this lacks context and doesn't represent what really happens. These are all reusing a hash whereas typically an event emitter will be created and have events added to it once. So we roughly have one assignment per hash creation.

require 'bundler/setup'
require 'benchmark/ips'

string = 'hash_key'
symbol = :hash_key
hash   = {}

Benchmark.ips do |ips|
  ips.report 'set string as string' do
    hash = Hash.new { |h,k| h[k] = [] }
    hash[string.to_s] << true
  end

  ips.report 'set string as symbol' do
    hash = Hash.new { |h,k| h[k] = [] }
    key = string.respond_to?(:to_sym) ? string : string.to_s
    hash[key.to_sym] << true
  end

  ips.report 'set symbol as string' do
    hash = Hash.new { |h,k| h[k] = [] }
    hash[symbol.to_s] << true
  end

  ips.report 'set symbol as symbol' do
    key = symbol.respond_to?(:to_sym) ? symbol : symbol.to_s
    hash = Hash.new { |h,k| h[k] = [] }
    hash[key.to_sym] << true
  end
end
set string as string    920.670k (± 0.8%) i/s -      4.622M
set string as symbol    898.880k (± 1.5%) i/s -      4.549M
set symbol as string    837.546k (± 1.1%) i/s -      4.226M
set symbol as symbol    927.281k (± 1.3%) i/s -      4.705M

So now the differences are much smaller. But this still doesn't represent usage. What tends to happen is a hash is created, an event listener is stored as a lambda, and then the emitter fires many events, each of which will involve converting the key, iterating over a list of lambdas and calling each one.

require 'bundler/setup'
require 'benchmark/ips'

string = 'hash_key'
symbol = :hash_key
hash   = {}

Benchmark.ips do |ips|
  ips.report 'set string as string' do
    hash = Hash.new { |h,k| h[k] = [] }
    hash[string.to_s] << -> { }
    100.times do
      hash[string.to_s].each { |l| l.call }
    end
  end

  ips.report 'set string as symbol' do
    hash = Hash.new { |h,k| h[k] = [] }
    key = string.respond_to?(:to_sym) ? string : string.to_s
    hash[key.to_sym] << -> { }
    100.times do
      key = string.respond_to?(:to_sym) ? string : string.to_s
      hash[key.to_sym].each { |l| l.call }
    end
  end

  ips.report 'set symbol as string' do
    hash = Hash.new { |h,k| h[k] = [] }
    hash[symbol.to_s] << -> { }
    100.times do
      hash[symbol.to_s].each { |l| l.call }
    end
  end

  ips.report 'set symbol as symbol' do
    key = symbol.respond_to?(:to_sym) ? symbol : symbol.to_s
    hash = Hash.new { |h,k| h[k] = [] }
    hash[key.to_sym] << -> { }
    100.times do
      key = symbol.respond_to?(:to_sym) ? symbol : symbol.to_s
      hash[key.to_sym].each { |l| l.call }
    end
  end
end
set string as string     42.554k (± 2.1%) i/s -    212.900k
set string as symbol     33.616k (± 1.7%) i/s -    169.422k
set symbol as string     33.525k (± 2.5%) i/s -    170.850k
set symbol as symbol     38.501k (± 2.3%) i/s -    194.726k

So now we're doing more lookups, the difference is more pronounced but still not double. And in the use case encouraged in the docs it's actually slower. I imagine if you mix this in with I/O and logic happening inside event listeners it will almost vanish.

Remember I said you can currently use any value as an event key? We need to preserve that semantic because EventEmitter is used in other gems. (The changes I did to make it accept lambdas instead of blocks have already broken faye-websocket and I'm currently building a release to fix that.) In general, it is not safe to convert arbitrary input to symbols because it will leak memory and can be exploited to cause DoS attacks. So, we'd need to white-list acceptable values before turning them into symbols, which would involve more logic that would hurt performance, more complexity, and would break downstream uses -- if we suddenly require events to be pre-registered, current users of this code will break. This is what I meant by symbols being only for things you can finitely enumerate -- trying to use them for arbitrary inputs creates complexity.

So overall I'm not convinced this is the right course of action.

ioquatix commented 9 years ago

Remember I said you can currently use any value as an event key? We need to preserve that semantic because EventEmitter is used in other gems.

Then, it would make sense to avoid converting to a string and simply use keys as is.

Then, do the lookup like so:

def lookup(key)
  hash[key] || hash[key.to_s]
end

In general, it is not safe to convert arbitrary input to symbols because it will leak memory and can be exploited to cause DoS attacks.

This may be true in general - many things are, but in this specific case I can't imagine how this would ever be a problem.

trying to use them for arbitrary inputs creates complexity.

We are talking about event names here, right?

ioquatix commented 9 years ago

I imagine if you mix this in with I/O and logic happening inside event listeners it will almost vanish.

I can't imagine anyone putting major logic in the event listeners as it doesn't work well with the current design except for very simple applications.

ioquatix commented 9 years ago

I've added the last two for example.

#!/usr/bin/env ruby

require 'bundler/setup'
require 'benchmark/ips'

string = 'message'
symbol = string.to_sym
hash   = {}

Benchmark.ips do |ips|
  ips.report 'set string as string' do
    hash = Hash.new { |h,k| h[k] = [] }
    hash[string.to_s] << -> { }
    100.times do
      hash[string.to_s].each { |l| l.call }
    end
  end

  ips.report 'set string as symbol' do
    hash = Hash.new { |h,k| h[k] = [] }
    key = string.respond_to?(:to_sym) ? string : string.to_s
    hash[key.to_sym] << -> { }
    100.times do
      key = string.respond_to?(:to_sym) ? string : string.to_s
      hash[key.to_sym].each { |l| l.call }
    end
  end

  ips.report 'set symbol as string' do
    hash = Hash.new { |h,k| h[k] = [] }
    hash[symbol.to_s] << -> { }
    100.times do
      hash[symbol.to_s].each { |l| l.call }
    end
  end

  ips.report 'set symbol as symbol' do
    key = symbol.respond_to?(:to_sym) ? symbol : symbol.to_s
    hash = Hash.new { |h,k| h[k] = [] }
    hash[key.to_sym] << -> { }
    100.times do
      key = symbol.respond_to?(:to_sym) ? symbol : symbol.to_s
      hash[key.to_sym].each { |l| l.call }
    end
  end

    ips.report 'set key as key (with to_s fallback)' do
        key = symbol
        hash = Hash.new { |h,k| h[k] = [] }
        hash[key] << -> { }
        100.times do
            (hash[key] || hash[key.to_s]).each { |l| l.call }
        end
    end

    ips.report 'set key as key' do
        key = symbol
        hash = Hash.new { |h,k| h[k] = [] }
        hash[key] << -> { }
        100.times do
            hash[key].each { |l| l.call }
        end
    end
end
Calculating -------------------------------------
set string as string      3658 i/100ms
set string as symbol      2880 i/100ms
set symbol as string      2992 i/100ms
set symbol as symbol      3383 i/100ms
set key as key (with to_s fallback)
                          4604 i/100ms
      set key as key      4782 i/100ms
-------------------------------------------------
set string as string    37571.6 (±3.8%) i/s -     190216 in   5.070063s
set string as symbol    28953.8 (±4.1%) i/s -     146880 in   5.081553s
set symbol as string    30421.8 (±3.9%) i/s -     152592 in   5.023551s
set symbol as symbol    34637.8 (±3.6%) i/s -     175916 in   5.085587s
set key as key (with to_s fallback)
                        47269.4 (±4.3%) i/s -     239408 in   5.074348s
      set key as key    48694.8 (±5.5%) i/s -     243882 in   5.026841s

For me, it looks significantly faster and is semantically the correct thing to do. If you want, I'll profile a real world application and we could see the numbers.

jcoglan commented 9 years ago

I can't imagine anyone putting major logic in the event listeners as it doesn't work well with the current design except for very simple applications.

Can you elaborate on what you mean by this?

We are talking about event names here, right?

Yes. The current semantic is that anything with a string representation can be used as an event name. This is a bit like how in RSpec describe and it notionally take strings but are also frequently given classes, because Class#to_s returns the class's name.

But let's be a bit more conservative and stick to the use cases that I actually know exist and that are in the documentation, and require only that EventEmitter accept strings and symbols as event names and treat them equally, that is, a listener bound to :message will be invoked by 'message' and vice versa.

I've pushed a spec for this constraint to the branch optimise-event-emitter: https://github.com/faye/websocket-driver-ruby/commit/d860041db5ded3bf600dc2a30cfaaa0276a42520.

If you have a changeset that meets this spec and produces a significant performance improvement on round-trip benchmarks (e.g. the Autobahn performance suite) then I'll consider merging it. It should do this without breaking this gem's tests, or those of faye-websocket, or the Autobahn test suite.

ioquatix commented 9 years ago

I can't imagine anyone putting major logic in the event listeners as it doesn't work well with the current design except for very simple applications. Can you elaborate on what you mean by this?

Yes, as discussed earlier, if you have any kind of state machine (e.g. authentication -> main loop), you can't really do this using the event callbacks, you need to queue the messages in the callback and deque them in the relevant part of the state machine.