Open thesecretmaster opened 5 years ago
Update: After some debugging, I've realized that this only occurs when I configure pinging, either with Faye::WebSocket.new(env, ping:5)
or with an EM periodic timer. It runs fine in thin, but in puma it causes latency up to the ping time. Maybe I misunderstand how pinging works in the WS protocol, but it shouldn't cause this sort of latency? Could this be a puma bug?
Update 2: This seems like a puma bug that won't get fixed around eventmachine+puma, at least based on this issue: https://github.com/sinatra/sinatra/issues/1035#issue-106346762
As this is an issue in Puma and they're not fixing it, is there anything we can do on this repo? Faye::WebSocket schedules its work on the EventMachine reactor, which should be using different threads from Puma, so I'm not sure why this blocking effect is happening. Does anyone have more information?
Same issue here. Experiencing delay up to ping time if ping is defined. No delay if ping is not set in options.
Using puma 5.6.4
I am facing the exact same issue as well, no ping defined, no delay, but when I defined a ping, the delay happens, although it is really not consistent because sometimes there is also no delay. This ping is needed when using a load balancer with an idle timeout.
Also using Puma 5.6.4, but the version does not seem to matter, I also had the issue with 4.3.4.
I see some suggestions to handle a custom ping implementation on the client side, but that doesn't look very clean to me with Javascript. Any idea how to handle that on the server side?
I'm not actually able to reproduce this. Running puma v5.6.4, I have tried adding the following to examples/app.rb
before the onmessage
handler to make the server keep sending messages:
EM.add_periodic_timer(1) do
ws.send('periodic message') if ws
end
Running the server with ruby examples/server.rb 7000 '' puma
, and then the client with ruby examples/client.rb 'ws://0.0.0.0:7000/'
, the messages show up on the client as expected, even though the server has ping: 5
which would delay the messages it's sending according to this report. This makes me feel like the source of latency isn't in this setup itself but in some other resource that your application, Puma, and its thread pool are competing over.
If it can't be fixed, probably the best thing is to add application-level ping either from the server or client. Many application protocols need to do this anyway because their logic is independent of the WebSocket protocol. The WebSocket ping is just a best-effort mechanism to keep the connection alive, so it's not disconnected due to lack of TCP activity.
I'm not actually able to reproduce this. Running puma v5.6.4, I have tried adding the following to
examples/app.rb
before theonmessage
handler to make the server keep sending messages:EM.add_periodic_timer(1) do ws.send('periodic message') if ws end
Running the server with
ruby examples/server.rb 7000 '' puma
, and then the client withruby examples/client.rb 'ws://0.0.0.0:7000/'
, the messages show up on the client as expected, even though the server hasping: 5
which would delay the messages it's sending according to this report. This makes me feel like the source of latency isn't in this setup itself but in some other resource that your application, Puma, and its thread pool are competing over.If it can't be fixed, probably the best thing is to add application-level ping either from the server or client. Many application protocols need to do this anyway because their logic is independent of the WebSocket protocol. The WebSocket ping is just a best-effort mechanism to keep the connection alive, so it's not disconnected due to lack of TCP activity.
I have made a similar test than you (on my running application though) without the ping option and adding:
EM.add_periodic_timer(5) do
websocket.send('ping') # websocket.ping('ping') has same effect
end
Same behaviour, all messages are sent right after the time the ping is sent, and it happens consistently. No delay happens if I remove this custom ping.
Using faye v 0.11.1 Here is my Gemfile.lock if it can help:
GEM
remote: https://rubygems.org/
specs:
activemodel (6.0.4.8)
activesupport (= 6.0.4.8)
activerecord (6.0.4.8)
activemodel (= 6.0.4.8)
activesupport (= 6.0.4.8)
activesupport (6.0.4.8)
concurrent-ruby (~> 1.0, >= 1.0.2)
i18n (>= 0.7, < 2)
minitest (~> 5.1)
tzinfo (~> 1.1)
zeitwerk (~> 2.2, >= 2.2.2)
addressable (2.8.0)
public_suffix (>= 2.0.2, < 5.0)
ast (2.4.2)
aws-eventstream (1.2.0)
aws-partitions (1.584.0)
aws-sdk-core (3.130.2)
aws-eventstream (~> 1, >= 1.0.2)
aws-partitions (~> 1, >= 1.525.0)
aws-sigv4 (~> 1.1)
jmespath (~> 1.0)
aws-sdk-kms (1.56.0)
aws-sdk-core (~> 3, >= 3.127.0)
aws-sigv4 (~> 1.1)
aws-sdk-s3 (1.114.0)
aws-sdk-core (~> 3, >= 3.127.0)
aws-sdk-kms (~> 1)
aws-sigv4 (~> 1.4)
aws-sdk-sqs (1.24.0)
aws-sdk-core (~> 3, >= 3.71.0)
aws-sigv4 (~> 1.1)
aws-sigv4 (1.5.0)
aws-eventstream (~> 1, >= 1.0.2)
brakeman (5.2.3)
builder (3.2.4)
bundler-audit (0.6.1)
bundler (>= 1.2.0, < 3)
thor (~> 0.18)
byebug (11.1.3)
coderay (1.1.3)
concurrent-ruby (1.1.10)
crack (0.4.5)
rexml
database_cleaner (2.0.1)
database_cleaner-active_record (~> 2.0.0)
database_cleaner-active_record (2.0.1)
activerecord (>= 5.a)
database_cleaner-core (~> 2.0.0)
database_cleaner-core (2.0.1)
diff-lcs (1.5.0)
docile (1.4.0)
domain_name (0.5.20190701)
unf (>= 0.0.5, < 1.0.0)
dry-configurable (0.15.0)
concurrent-ruby (~> 1.0)
dry-core (~> 0.6)
dry-container (0.9.0)
concurrent-ruby (~> 1.0)
dry-configurable (~> 0.13, >= 0.13.0)
dry-core (0.7.1)
concurrent-ruby (~> 1.0)
dry-inflector (0.2.1)
dry-logic (1.2.0)
concurrent-ruby (~> 1.0)
dry-core (~> 0.5, >= 0.5)
dry-types (1.5.1)
concurrent-ruby (~> 1.0)
dry-container (~> 0.3)
dry-core (~> 0.5, >= 0.5)
dry-inflector (~> 0.1, >= 0.1.2)
dry-logic (~> 1.0, >= 1.0.2)
elastic-apm (3.6.0)
concurrent-ruby (~> 1.0)
http (>= 3.0)
eventmachine (1.2.7)
excon (0.92.3)
factory_bot (6.2.1)
activesupport (>= 5.0.0)
faraday (1.0.1)
multipart-post (>= 1.2, < 3)
faye-websocket (0.11.1)
eventmachine (>= 0.12.0)
websocket-driver (>= 0.5.1)
ffi (1.15.5)
formatador (1.1.0)
fuubar (2.5.1)
rspec-core (~> 3.0)
ruby-progressbar (~> 1.4)
grape (1.3.3)
activesupport
builder
dry-types (>= 1.1)
mustermann-grape (~> 1.0.0)
rack (>= 1.3.0)
rack-accept
grape-entity (0.8.2)
activesupport (>= 3.0.0)
multi_json (>= 1.3.2)
grape-swagger (1.1.0)
grape (~> 1.3.0)
grape-swagger-entity (0.4.0)
grape-entity (>= 0.5.0)
grape-swagger (>= 1.0.0)
guard (2.18.0)
formatador (>= 0.2.4)
listen (>= 2.7, < 4.0)
lumberjack (>= 1.0.12, < 2.0)
nenv (~> 0.1)
notiffany (~> 0.0)
pry (>= 0.13.0)
shellany (~> 0.0)
thor (>= 0.18.1)
guard-compat (1.2.1)
guard-rspec (4.7.3)
guard (~> 2.1)
guard-compat (~> 1.1)
rspec (>= 2.99.0, < 4.0)
hashdiff (1.0.1)
http (4.1.1)
addressable (~> 2.3)
http-cookie (~> 1.0)
http-form_data (~> 2.0)
http_parser.rb (~> 0.6.0)
http-cookie (1.0.4)
domain_name (~> 0.5)
http-form_data (2.3.0)
http_parser.rb (0.6.0)
i18n (0.9.5)
concurrent-ruby (~> 1.0)
jmespath (1.6.1)
json (2.6.1)
listen (3.7.1)
rb-fsevent (~> 0.10, >= 0.10.3)
rb-inotify (~> 0.9, >= 0.9.10)
lumberjack (1.2.8)
mailchimp-api (2.0.7)
excon (>= 0.16.0)
json (>= 1.7.7)
method_source (1.0.0)
mime-types (3.4.1)
mime-types-data (~> 3.2015)
mime-types-data (3.2022.0105)
minitest (5.15.0)
multi_json (1.15.0)
multipart-post (2.1.1)
mustermann (1.1.1)
ruby2_keywords (~> 0.0.1)
mustermann-grape (1.0.2)
mustermann (>= 1.0.0)
mysql2 (0.5.4)
nenv (0.3.0)
nio4r (2.5.8)
notiffany (0.1.3)
nenv (~> 0.1)
shellany (~> 0.0)
parallel (1.22.1)
parser (3.1.2.0)
ast (~> 2.4.1)
pry (0.13.1)
coderay (~> 1.1)
method_source (~> 1.0)
pry-byebug (3.9.0)
byebug (~> 11.0)
pry (~> 0.13.0)
public_suffix (4.0.7)
puma (5.6.4)
nio4r (~> 2.0)
rack (2.2.3)
rack-accept (0.4.5)
rack (>= 0.4)
rack-cors (1.1.1)
rack (>= 2.0.0)
rack-test (1.1.0)
rack (>= 1.0, < 3)
rainbow (3.1.1)
rake (13.0.6)
rb-fsevent (0.11.1)
rb-inotify (0.10.1)
ffi (~> 1.0)
redis (4.6.0)
regexp_parser (2.3.1)
require_all (1.3.3)
rerun (0.13.1)
listen (~> 3.0)
rexml (3.2.5)
rspec (3.11.0)
rspec-core (~> 3.11.0)
rspec-expectations (~> 3.11.0)
rspec-mocks (~> 3.11.0)
rspec-core (3.11.0)
rspec-support (~> 3.11.0)
rspec-expectations (3.11.0)
diff-lcs (>= 1.2.0, < 2.0)
rspec-support (~> 3.11.0)
rspec-mocks (3.11.1)
diff-lcs (>= 1.2.0, < 2.0)
rspec-support (~> 3.11.0)
rspec-support (3.11.0)
rubocop (1.28.2)
parallel (~> 1.10)
parser (>= 3.1.0.0)
rainbow (>= 2.2.2, < 4.0)
regexp_parser (>= 1.8, < 3.0)
rexml
rubocop-ast (>= 1.17.0, < 2.0)
ruby-progressbar (~> 1.7)
unicode-display_width (>= 1.4.0, < 3.0)
rubocop-ast (1.17.0)
parser (>= 3.1.1.0)
ruby-progressbar (1.11.0)
ruby2_keywords (0.0.5)
sentry-raven (3.1.2)
faraday (>= 1.0)
sequel (5.31.0)
shellany (0.0.1)
simplecov (0.21.2)
docile (~> 1.1)
simplecov-html (~> 0.11)
simplecov_json_formatter (~> 0.1)
simplecov-html (0.12.3)
simplecov_json_formatter (0.1.4)
string-urlize (1.0.2)
i18n (~> 0.5)
thor (0.20.3)
thread_safe (0.3.6)
tzinfo (1.2.9)
thread_safe (~> 0.1)
unf (0.1.4)
unf_ext
unf_ext (0.0.8.1)
unicode-display_width (2.1.0)
uuidtools (2.2.0)
webmock (3.14.0)
addressable (>= 2.8.0)
crack (>= 0.3.2)
hashdiff (>= 0.4.0, < 2.0.0)
websocket-driver (0.7.5)
websocket-extensions (>= 0.1.0)
websocket-extensions (0.1.5)
zeitwerk (2.5.4)
PLATFORMS
ruby
DEPENDENCIES
aws-sdk-s3
aws-sdk-sqs
brakeman
bundler-audit (~> 0.6.1)
database_cleaner
elastic-apm (~> 3.6.0)
factory_bot
faraday
faye-websocket
fuubar
grape (~> 1.3.2)
grape-entity (~> 0.8.0)
grape-swagger (~> 1.1.0)
grape-swagger-entity (~> 0.4.0)
guard-rspec
http (~> 4.1.1)
mailchimp-api
mime-types
mysql2 (~> 0.5.3)
pry-byebug
puma (~> 5.6.4)
rack (~> 2.2.3)
rack-cors
rack-test
rake
redis
require_all (~> 1.3.3)
rerun
rspec
rubocop
sentry-raven
sequel (~> 5.31.0)
simplecov
string-urlize
uuidtools
webmock
RUBY VERSION
ruby 2.7.1p83
BUNDLED WITH
2.1.4
@BastienBabar To have a chance of investigating this I'd need a small example application that demonstrates the problem happening, containing just enough code and dependencies to show the problem and nothing more. Putting this together will probably help you narrow down which libraries are contributing to the effect you're observing.
Alright, thanks to taking some time setting up a small example, I guess the problem comes from Redis (in my case). Here is my example:
# frozen_string_literal: true
require 'rubygems'
require 'active_support'
Bundler.require
require 'grape'
class Base < Grape::API
end
class Chat
attr_reader :env
def initialize(app)
@app = app
@env = nil
end
def call(env)
@env = env
if Faye::WebSocket.websocket?(env)
websocket = Faye::WebSocket.new(env, nil, { ping: 50 })
websocket.on :open do |_event|
puts 'connected'
subscribe_channel('test', websocket)
end
websocket.on :message do |event|
puts "#{Time.now} message #{event.data}"
redis_connection.publish('test', event.data)
rescue StandardError
websocket.close
end
websocket.on :close do |event|
p [:close, websocket.object_id, event.code, event.reason]
websocket = nil
end
websocket.on :error do |event|
p [:error, websocket.object_id, event.code, event.reason]
websocket.close
end
websocket.rack_response
else
@app.call(env)
end
end
def redis_connection
uri = URI.parse('redis://localhost:6379')
Redis.new(host: uri.host, port: uri.port, password: uri.password)
end
def subscribe_channel(channel, websocket)
Thread.new do
redis_connection.subscribe(channel) do |on|
on.subscribe do |ch, subscriptions|
puts "Subscribed to ##{ch} (#{subscriptions} subscriptions)"
end
on.message do |ch, message|
puts "#{Time.now} send message: ##{ch}: #{message} \n"
websocket.send(message)
end
on.unsubscribe do |ch, subscriptions|
puts "Unsubscribed from ##{ch} (#{subscriptions} subscriptions)"
end
end
end
end
end
use Chat
run Base
with Gemfile
source 'https://rubygems.org'
ruby '2.7.1'
gem 'grape', '~> 1.3.2'
gem 'puma', '~> 5.6.4'
gem 'rack', '~> 2.2.3'
gem 'faye-websocket'
gem 'redis'
Messages are being sent every 50 seconds in this case. But, if you remove the redis implementation, no more delay. Am I doing something wrong here?
The problem with this code is that in subscribe_channel
you're calling websocket.send(message)
from a non-EventMachine thread. This means the input gets queue but not actually written to the socket until the next time something interacts with it, for example the ping
loop. If you replace this call with the following then you should observe messages being delivered without delay:
EM.next_tick { websocket.send(message) }
@jcoglan Thanks a lot for this, that does the trick! Though, this is really not straightforward to understand when not familiar with EventMachine, because when I switch to a Thin server, there is no delay happening no matter what. I assume this is related to the "won't fix" issue in puma mentioned before, but maybe a little warning with the puma config would allow to close this issue? I am also very surprised no one has the same problem, puma/faye/redis (and a load balancer with idle timeout) seems to be a very standard combo to use websocket with ruby, or am I missing something?
I don't fully understand the difference either other than this will cause a change in how work is distributed across threads, in particular the work of writing data to the TCP socket. This is one of the ways EventMachine unfortunately "leaks" and you need to be aware of it if you're running any EM-based libraries.
I'm not sure there's a general purpose advisory I could add to the docs because this is very dependent on what exactly your application is doing.
Using this gem with puma and a very simple rack middleware, I was noticing up to 2 seconds of latency between when the server logged that a WS message had been sent and when a browser actually logged receiving it. This can't be due to network latency, because I'm running this on my local computer and I still see the latency.
However, when I switch thin in, the latency is gone, and messages are received within a few ms. How can I go about troubleshooting this issue with puma? Any recommendations?