celluloid / dcell

UNMAINTAINED: See celluloid/celluloid#779 - Actor-based distributed objects in Ruby based on Celluloid and 0MQ
http://celluloid.io
MIT License

Shutdown is broken #35

Closed by tpitale 11 years ago

tpitale commented 11 years ago

I built a process like the one in the example. When I try to stop it with something like Ctrl-C, I see lots of error and warning messages about the supervision group related to NodeManager, et al.

This is a placeholder ticket while I take a look into this. @tarcieri if you have any suggestions as to where to start looking, or what might be wrong, I'd love to hear what you've got.

If I'm not able to fix the problem, I'll leave any information I find on this ticket.

tpitale commented 11 years ago

^CW, [2012-12-07T10:46:44.884855 #18399]  WARN -- : DCell::Server is crashing on initialize too quickly, sleeping for 30 seconds
W, [2012-12-07T10:46:44.886430 #18399]  WARN -- : DCell::NodeManager is crashing on initialize too quickly, sleeping for 30 seconds

tarcieri commented 11 years ago

@tpitale last I looked, this was the result of conflicting, out-of-order at_exit handlers for both Celluloid and DCell (specifically Celluloid::ZMQ). The goal of the Celluloid::ZMQ at_exit handler was to drain 0MQ of messages before terminating; however, as things stand, it's pretty much deadlocking.

I thought I had changed Celluloid::ZMQ not to use an at_exit handler anymore; however, there are clearly still issues with it.
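
For anyone unfamiliar with why handler order matters here: Ruby runs at_exit blocks in reverse order of registration, so whichever library registers its handler last shuts down first. The snippet below is only a minimal sketch of that ordering in plain Ruby. The handler messages and the names `CHILD_SCRIPT` and `at_exit_order` are hypothetical stand-ins for illustration, not code from Celluloid, Celluloid::ZMQ, or DCell.

```ruby
require "rbconfig"

# Hypothetical stand-ins for two libraries' shutdown hooks. The messages are
# illustrative only and are not taken from Celluloid or DCell.
CHILD_SCRIPT = <<~'RUBY'
  at_exit { puts "registered first: actor system shutdown" }
  at_exit { puts "registered second: 0MQ drain" }
  puts "main body done"
RUBY

# Run the script in a child interpreter so its at_exit handlers actually fire,
# then return the lines it printed, in order.
def at_exit_order(script = CHILD_SCRIPT)
  IO.popen([RbConfig.ruby, "-e", script], &:read).lines.map(&:chomp)
end

puts at_exit_order
```

The handler registered second runs before the one registered first. If a handler that runs later depends on something an earlier one has already torn down (or the reverse), shutdown can hang, which would be consistent with the deadlock described above.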

tpitale commented 11 years ago

This appears to be fixed in the 0.12 prerelease through the removal of the celluloid-zmq at_exit handler, as @tarcieri said, but the 0.10.0 gem still has this issue. Maybe a new gem release is in order.

I'm not sure what the net effect of not calling @context.terminate would be, aside from lost messages in the 0MQ ether.