faye / faye-websocket-ruby

Standards-compliant WebSocket client and server
Other
1.03k stars 97 forks source link

EM keeps shutdown when server overloads. #125

Open martinbarilik opened 4 years ago

martinbarilik commented 4 years ago

Hi there again. We are using this gem to create client, but we can't figure out why client shutdowns EM immediately if server overloads out client with lots of messages. It happens sporadically once a day or two.

We create client as follows:

Thread.abort_on_exception = true
Thread.new {
  EM.run {
    ws = Faye::WebSocket::Client.new('ws://www.example.com/')

    ws.on :open do |event|
      p [:open]
      ws_logger.info "HELLO!"
    end

    ws.on :message do |event|
      ws_logger.info event.data
    end

    ws.on :close do |event|
      ws_logger.info "#{Time.now} code: #{event.code}, reason: #{event.reason}"
      ws = nil
    end
  }
  EM.add_shutdown_hook { ws_logger.info("#{Time.now} EM SHUTDOWN HOOK") }
}

We need to run client socket in dedicated thread so it won't block anything. Problem starts when server overloads our client with messages, we do not see the messages after overload in ws_logger file, but we have this:

[some time] EM SHUTDOWN HOOK
[some time] code: 1006, reason: 

The fact that EM SHUTDOWN HOOK is first and the socket closes with 1006 tells us, something is wrong. Maybe server sents very big bulk of messages client can't process ?

We have to be able to recover from 1006 and restart web-socket to be running again, but with EM being shutdown, recovery code is killed with it and software stops.

I look some info around sockets, maybe we need to set max_length option to client ? But it buggers me that The default value is 2^26 - 1, or 1 byte short of 64 MiB is not enough ?

I do not know is this is relevant, but i even looked at default buffer size in out machine according to this post https://stackoverflow.com/questions/7865069/how-to-find-the-socket-buffer-size-of-linux.

/proc/sys/net/ipv4/tcp_rmem (for read) // prints 4096    131072  6291456
/proc/sys/net/ipv4/tcp_wmem (for write) // prints 4096    16384   4194304

One more thing, can't figure out what is the difference between this gem and websocket-driver-ruby. Isn't the websocket-driver-ruby more suitable for us ?

If you need more info, ask for it.

Thanks in advance.

jcoglan commented 4 years ago

Can you add this listener to your socket to see what error it prints when the socket closes?

ws.on :error do |error|
  # ...
end
martinbarilik commented 4 years ago

I have added it, waiting for server to overload the client.

martinbarilik commented 4 years ago

.on :error listener doesn't seem to catch anything, same old messages

[some time] EM SHUTDOWN HOOK
[some time] code: 1006, reason: 
jcoglan commented 4 years ago

Normally, the close event only triggers with code 1006 when the underlying TCP socket is disconnected without a WebSocket close message being received, and the code that detects this disconnection triggers an error event first. The only other time this happens is if EventMachine itself is stopped, which you're detecting here.

Why are you stopping EventMachine, and why are you wrapping it in your own threads?

martinbarilik commented 4 years ago

I am working on a project which depends on synchronous code but is using EM and websocket in it. We changed a code a bit, for now we use thread to only start EM so it won't break the main thread. We had some errors in functionality that caused an errors but they were never propagated to EM, because we used own Threads. We rewrote it to use EM.defer, for now it's working but we need to test it more.

If there is any "best-way-to" start EM on server start ( like in rack, or cron or something ) so everytime user wants to start its websocket EM will already be running, i am listening.

Anyway, thanks for your time.

jcoglan commented 4 years ago

Yeah, mixing synchronous code with EM is painful. This package uses this function to start the reactor: https://github.com/faye/faye-websocket-ruby/blob/master/lib/faye/websocket.rb#L39-L42. Don't call that, it's not part of the public API -- write your own version of it.