celluloid / reel

UNMAINTAINED: See celluloid/celluloid#779 - Celluloid::IO-powered web server
https://celluloid.io
MIT License

Somehow server is blocking after 10 persistent connections #30

Closed (ghost closed this issue 11 years ago)

ghost commented 11 years ago

I'm a bit confused... Could be my flu that doesn't let me think clearly...

Using my own example: https://github.com/celluloid/reel/blob/master/examples/server-sent-events.rb

After opening 10 persistent connections to /subscribe, the app is completely blocked; that's it, it does not accept any connections on any route.

Any thoughts?

ghost commented 11 years ago

@tarcieri, am I missing something?

Also, WebSocket streams are "pinged" automatically every N seconds. I'm thinking about doing the same for SSE streams, so that when a socket dies a user-defined error handler is called.

Here is the simplest implementation: https://github.com/slivu/reel/blob/master/lib/reel/stream.rb#L25

Perhaps better ideas?
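
For illustration, a rough sketch of such a ping loop, assuming a shared Connections list (like the one discussed further down) and Celluloid's every timer; StreamPinger, the 15-second interval and the "ping" payload are placeholders, not Reel's actual mechanism:

require 'celluloid'

# Periodically touch every registered SSE socket so dead connections
# surface as write errors and can be dropped from the list.
class StreamPinger
  include Celluloid

  PING_INTERVAL = 15 # seconds; arbitrary for this sketch

  def initialize(connections)
    @connections = connections
    every(PING_INTERVAL) { ping_all }
  end

  def ping_all
    dead = []
    @connections.each do |socket|
      begin
        socket.data "ping"   # any write works; it only has to hit the socket
      rescue
        dead << socket       # a failed write means the client is gone
      end
    end
    dead.each { |socket| @connections.delete(socket) }
  end
end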

tarcieri commented 11 years ago

How are you testing it? 10 is an awfully specific number. Are you sure you're not hitting any client-side limits?

ghost commented 11 years ago

Yep, that 10 is totally weird! I wonder how this could happen on two different systems (OS X and Linux) regardless of the client used.

Web browsers have their own limit (about 5) on persistent connections, so I opened 4 connections in Chrome and 4 in Safari, and Firefox was able to establish only 2 connections. Further ones just hang, and when I close a tab in Chrome or Safari, a hanging connection from Firefox gets established.

Also tried with curl: I simply opened 11 terminal tabs. 10 connections went OK; the 11th hangs forever until some previous connection is closed.

I thought it could be nginx, so I tried connecting directly through the port and got the same weird limit of 10 connections.

An OS setup/limitation? sysctl -a | grep 10 returned nothing relevant.

Not sure how this happens; Halloween has already passed and April 1 is still far away...

ghost commented 11 years ago

Also tried with the server on one machine and clients on another, both through nginx and directly through the port.

shtirlic commented 11 years ago

Tried this example and am sometimes getting this error (testing with curl and a browser, latest MRI):

E, [2013-02-12T23:31:32.673231 #89380] ERROR -- : Reel::Server crashed!
NoMethodError: undefined method `gsub' for 2013-02-12 23:31:32 +0400:Time
/Users/shtirlic/.rvm/gems/ruby-1.9.3-p385/gems/reel-0.3.0/lib/reel/stream.rb:70:in `data'
reel_test.rb:28:in `block (4 levels) in <main>'
reel_test.rb:28:in `each'
reel_test.rb:28:in `block (3 levels) in <main>'
/Users/shtirlic/.rvm/gems/ruby-1.9.3-p385/gems/rack-1.5.2/lib/rack/builder.rb:138:in `call'
/Users/shtirlic/.rvm/gems/ruby-1.9.3-p385/gems/rack-1.5.2/lib/rack/builder.rb:138:in `call'
/Users/shtirlic/.rvm/gems/ruby-1.9.3-p385/gems/rack-1.5.2/lib/rack/urlmap.rb:65:in `block in call'
/Users/shtirlic/.rvm/gems/ruby-1.9.3-p385/gems/rack-1.5.2/lib/rack/urlmap.rb:50:in `each'
/Users/shtirlic/.rvm/gems/ruby-1.9.3-p385/gems/rack-1.5.2/lib/rack/urlmap.rb:50:in `call'
/Users/shtirlic/.rvm/gems/ruby-1.9.3-p385/gems/reel-0.3.0/lib/reel/rack_worker.rb:73:in `handle_request'
/Users/shtirlic/.rvm/gems/ruby-1.9.3-p385/gems/reel-0.3.0/lib/reel/rack_worker.rb:65:in `handle'
/Users/shtirlic/.rvm/gems/ruby-1.9.3-p385/gems/celluloid-0.12.4/lib/celluloid/calls.rb:23:in `public_send'
/Users/shtirlic/.rvm/gems/ruby-1.9.3-p385/gems/celluloid-0.12.4/lib/celluloid/calls.rb:23:in `dispatch'
/Users/shtirlic/.rvm/gems/ruby-1.9.3-p385/gems/celluloid-0.12.4/lib/celluloid/actor.rb:327:in `block in handle_message'
/Users/shtirlic/.rvm/gems/ruby-1.9.3-p385/gems/celluloid-0.12.4/lib/celluloid/tasks/task_fiber.rb:24:in `block in initialize'

As @slivu suggested, add to_s to data in stream.

ghost commented 11 years ago

For now, please update the example like this:

https://github.com/slivu/reel/blob/master/examples/server-sent-events.rb#L27

Later we will fix stream.rb like this:

https://github.com/slivu/reel/blob/master/lib/reel/stream.rb#L71
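
The gist of that fix, as a sketch only (the method body and write call below are assumed, not Reel's actual stream.rb code): coerce the payload to a String before doing any String work, so passing a Time or any other object no longer raises NoMethodError on gsub.

def data(payload)
  payload = payload.to_s                               # Time, Integer, etc. become Strings
  write "data: #{payload.gsub("\n", "\ndata: ")}\n\n"  # SSE multi-line framing
  self
end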

ghost commented 11 years ago

BTW, tried on rbx with the same result.

shtirlic commented 11 years ago

GitHub code search rules: https://github.com/search?l=&q=10++repo%3Acelluloid%2Freel&ref=advsearch&type=Code

tarcieri commented 11 years ago

I see; well, clearly this is hardcoded into the Rack adapter. I didn't write that and unfortunately don't know a whole lot about it, but this appears to be the cause of the problem.

ghost commented 11 years ago

Confirming that was it. Thank you!
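
For context on why exactly 10 connections blocked everything: a Celluloid pool of N actors can only run N blocking calls at once, and an SSE connection occupies its worker for its whole lifetime, so with a hardcoded pool size of 10 the 11th persistent connection has no worker to run on. A minimal sketch of the concept (Worker and the size of 100 are illustrative, not Reel's adapter code):

require 'celluloid'

class Worker
  include Celluloid

  def handle(conn)
    # long-lived, blocking work per connection (e.g. a persistent SSE stream)
  end
end

# With size: 10, worker number 11 simply does not exist, so the 11th
# persistent connection waits until one of the 10 streams closes.
workers = Worker.pool(size: 100)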

tarcieri commented 11 years ago

Hmm, just noticed this:

https://github.com/celluloid/reel/blob/master/examples/server-sent-events.rb#L5

Seems bad. It's being accessed in an unsafe manner.

ghost commented 11 years ago

We somehow need to keep a list of established persistent connections and send/read messages to/from them at any (later) time.

Any safer way to keep that list?

ghost commented 11 years ago

@tarcieri, do you mean something like this?

# An Array subclass turned into a Celluloid actor; `exclusive` keeps the
# mutating calls from being interleaved with other actor tasks.
Connections = Class.new(Array) do
  include Celluloid

  def <<(conn)
    exclusive { super }
  end

  def delete(conn)
    exclusive { super }
  end
end.new

# ...

Connections << socket
# ...
Connections.delete socket
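
Another option, as a sketch only (ConnectionList and its method names are made up here, not taken from Reel): let a plain Celluloid actor own the Array, so every mutation is serialized through the actor's mailbox instead of relying on exclusive.

require 'celluloid'

class ConnectionList
  include Celluloid

  def initialize
    @connections = []
  end

  # Each call runs inside the actor, so the Array is never touched
  # from two threads at once.
  def add(conn)
    @connections << conn
  end

  def delete(conn)
    @connections.delete(conn)
  end

  def list
    @connections.dup
  end
end

connections = ConnectionList.new

With this, the subscribe route would call connections.add socket and socket.on_error { connections.delete socket }, mirroring the lines above.
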
zacksiri commented 11 years ago

Shouldn't you do that via something like redis?

ghost commented 11 years ago

Well, not sure it's worth using Redis in a quick demo example.

ghost commented 11 years ago

@tarcieri, I'm continuing my crash testing, so here is a new batch of "weirdness" :)

First of all, I updated the handler to create a pool per connection rather than use a fixed-size pool: https://github.com/slivu/reel/commit/0f4d0c7da. It is a quick hack; I guess the best way would be to use a fixed-size pool for non-persistent connections and create a pool for each persistent one. Will review this later.

Now another issue arises: both MRI and rbx core dump after 1000+ persistent connections :) It varies from 1002 to 1010, but there is always a core dump after that.

Testing on Ubuntu 12.04 x64, 8-core CPU (i7) with 8 GB RAM, using this simple flooding script:

# open up to 5000 concurrent SSE connections, one every 50 ms
for n in `seq 5000`; do
  echo $n
  curl localhost:9292/subscribe &
  sleep 0.05
done

This is what gdb says:

[New LWP 15186]
[New LWP 15191]
[New LWP 15391]
[New LWP 15196]
[New LWP 15386]
[New LWP 15381]
[New LWP 15292]
[New LWP 15376]
[New LWP 15302]
[New LWP 15368]
[New LWP 15311]
[New LWP 15317]
[New LWP 15367]
[New LWP 15318]
[New LWP 15363]
[New LWP 15321]
[New LWP 15362]
[New LWP 15326]
[New LWP 15352]
Failed to read a valid object file image from memory.
Core was generated by `ruby reel.rb'.
Program terminated with signal 6, Aborted.
#0  0x00007fac411f1425 in ?? ()
(gdb) 

I thought it could be because of the unsafe way of updating the Connections list, so I also tested with the Connections = Class.new(Array) version I sent above. Same result.

Then I removed that part entirely, so the subscribe route does not update Connections at all:

body = Reel::EventStream.new do |socket|
  # Connections << socket
  # socket.on_error { Connections.delete socket }
end
[200, {'Content-Type' => 'text/event-stream'}, body]

Same result.

Next I thought it could be because it creates a pool per connection, so I tried a fixed-size pool with 5000 workers instead. Same result.

Any clues?

zacksiri commented 11 years ago

I remember running into an issue similar to this before; something like having to open a different browser to allow more connections. Maybe the limit is in the browser?

zacksiri commented 11 years ago

Oh shit, sorry, you already ruled that out; I just read the whole discussion.

zacksiri commented 11 years ago

Have you tried this with webmachine to see if the problem persists?

ghost commented 11 years ago

Good point, trying webmachine.

ghost commented 11 years ago

@zacksiri, any ideas on how to use SSE with webmachine/Reel?

@tarcieri, just tried MRI 2.0.0-rc2. Same result: core dump after 1000+ connections.

tarcieri commented 11 years ago

If you're going to try something else to avoid segfaults, you should try JRuby. The "stable" releases of MRI aren't even stable, let alone 2.0 RCs.

ghost commented 11 years ago

@tarcieri, it is a hardcoded limitation again :) This time in the select() call that Ruby uses:

#define __FD_SETSIZE        1024

1024 file descriptors per process...

The only way (known to me, at least) to avoid this limit is to use epoll on Linux and kqueue on BSD.

Any plans to implement either of these for Celluloid::IO?
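
Two separate limits are worth untangling here: the kernel's per-process open-file limit (often a soft limit of 1024, adjustable at runtime) and select()'s compiled-in FD_SETSIZE of 1024, which is exactly what epoll/kqueue avoid. Raising the former from Ruby looks roughly like this (the 4096 is an arbitrary example; this does not lift the FD_SETSIZE ceiling):

# Check and raise the per-process file-descriptor limit from Ruby.
soft, hard = Process.getrlimit(:NOFILE)
puts "open files: soft=#{soft} hard=#{hard}"

# The soft limit can be raised up to the hard limit without privileges;
# going beyond the hard limit requires root.
Process.setrlimit(:NOFILE, [4096, hard].min, hard)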

ghost commented 11 years ago

Yes, JRuby is stable enough; it carries any number of connections, but it eats a lot of memory!

tarcieri commented 11 years ago

Celluloid::IO is built on nio4r which is built on libev and supports epoll/kqueue

ghost commented 11 years ago

Sounds great! Any hint on how to tell nio4r to use epoll instead of select?

tarcieri commented 11 years ago

It should use it automatically depending on a number of factors.

What OS are you on and how many file descriptors are you multiplexing from a single thread?

ghost commented 11 years ago

Running tests on x64 Linux.

Not sure how many descriptors are multiplexed from a single thread.

I updated Reel's Rack handler to create a pool per connection rather than use a fixed-size pool: https://github.com/slivu/reel/commit/0f4d0c7da

However, I get segfaults even when I create a fixed-size pool with 5000 workers. It seems nio4r somehow isn't being told to use epoll... Any way to set this explicitly?

tarcieri commented 11 years ago

No, libev makes the decision about whether to use epoll or not completely automatically.

If you need help debugging why it isn't using epoll when you think it should, I'd suggest asking the libev list:

http://software.schmorp.de/pkg/libev.html

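One way to see which multiplexer is actually in use, assuming a nio4r version new enough to expose NIO::Selector#backend (the older libev-based versions made the choice internally and did not report it):

require 'nio'

selector = NIO::Selector.new
puts selector.backend  # e.g. :epoll on Linux, :kqueue on BSD/macOS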

halorgium commented 11 years ago

@tarcieri @slivu I think this is almost the same issue as with #33.

halorgium commented 11 years ago

It isn't the same, but it might be related to celluloid/celluloid-io#52.

tarcieri commented 11 years ago

This issue is defunct due to reel-rack.