celluloid / reel

UNMAINTAINED: See celluloid/celluloid#779 - Celluloid::IO-powered web server
https://celluloid.io
MIT License

Connections persisting, after hijack_io enabled handler. #51

Closed: digitalextremist closed this issue 11 years ago

digitalextremist commented 11 years ago

I went to recycle my Reel handler and it reported that the port was already bound. I checked for any reel process, or any ruby process for that matter, and there were none.

Then I ran netstat -a | grep ":www" > out and was horrified to see (eventually) as many connections as I have designated workers. For example:

http://decentrality.com/connections.025-001.log
http://decentrality.com/connections.025-002.log
http://decentrality.com/connections.025-003.log
http://decentrality.com/connections.025-004.log

That is watching a very short time period, with four checks for connections.
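For anyone trying to reproduce this, a tally by TCP state is easier to compare between checks than raw netstat output. The following is only a rough sketch: it assumes Linux-style netstat output where the state is the last column of each tcp line (which is not the case with -p, and BSD-style names such as FIN_WAIT_1 would need adding). The helper name tally_tcp_states and the sample text are hypothetical; the sample is not taken from the logs linked above.

```ruby
# TCP states as printed by Linux net-tools netstat (an assumption about the platform).
TCP_STATES = %w[
  ESTABLISHED SYN_SENT SYN_RECV FIN_WAIT1 FIN_WAIT2 TIME_WAIT
  CLOSE CLOSE_WAIT LAST_ACK LISTEN CLOSING
].freeze

# Count how many tcp lines are in each state.
# Lines that do not start with "tcp" or do not end in a known state are skipped.
def tally_tcp_states(netstat_text)
  counts = Hash.new(0)
  netstat_text.each_line do |line|
    fields = line.split
    next unless fields.first.to_s.start_with?("tcp")
    state = fields.last
    counts[state] += 1 if TCP_STATES.include?(state)
  end
  counts
end

# Hypothetical sample input, made up for illustration only.
SAMPLE = <<~OUT
  tcp        0      0 10.0.0.5:80      203.0.113.7:51234    TIME_WAIT
  tcp        0      0 10.0.0.5:80      203.0.113.8:51300    LAST_ACK
  tcp        0      0 10.0.0.5:80      203.0.113.9:51301    TIME_WAIT
  tcp        0      0 0.0.0.0:80       0.0.0.0:*            LISTEN
OUT

tally_tcp_states(SAMPLE).each { |state, n| puts "#{state}: #{n}" }
```

To run it against a saved capture, something like tally_tcp_states(File.read("out")) should work, where "out" is the file written by the netstat command above.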

Very interesting. After recycling Reel (which evidently was allowed this time):

http://decentrality.com/connections.025-005-recycled.log

Note the change to LAST_ACK and TIME_WAIT, yet all those sockets persist even after the process was killed and restarted fresh.

After a long time: http://decentrality.com/connections.025-006.log

So something gets freed up eventually, but it seems like it's the OS doing it.

I recycled again: http://decentrality.com/connections.025-007-recycled.log

Now there is even FIN_WAIT1, which does seem to be the OS working on a timer.

I can try to isolate this and remedy it, but can someone please try to duplicate this with @penultimatix's reel or @halorgium's latest hijack_io branch:

https://github.com/halorgium/reel/tree/hijacked-websocket

digitalextremist commented 11 years ago

Retaining more, later on: http://decentrality.com/connections.025-008.log

digitalextremist commented 11 years ago

http://decentrality.com/connections.025-009.log

# cat connections.025-009.log | wc -l
134

My :worker option is set to 153, and now I am noticing that ssh connections lag for extreme periods while http traffic continues, slower but still going strong. I reset the network interface: no change.

digitalextremist commented 11 years ago

I am manipulating socket options effectively to work with this, and I can see there is operating system and/or TCP behavior that Reel is working through, with, or against. I will close this for now and reopen it if I find I cannot solve the problem myself; otherwise I will post a pull request framed as a performance optimization rather than as a fix for a 'problem.' Even though the behavior seems extreme, I am no longer sure it is a Reel 'bug.'
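The comment above does not say which socket options were changed. As one illustration only, SO_REUSEADDR is the option usually associated with rebinding a port while old connections are still in TIME_WAIT. The sketch below uses Ruby's standard socket library directly, not Reel or Celluloid::IO, and the helper name reusable_listener is hypothetical.

```ruby
require "socket"

# Build a listening TCP socket with SO_REUSEADDR set before bind.
# On most Unix systems this lets a restarted server bind the port again
# while earlier connections on it are still in TIME_WAIT.
def reusable_listener(host, port, backlog = 128)
  sock = Socket.new(:INET, :STREAM)
  sock.setsockopt(:SOCKET, :REUSEADDR, true)
  sock.bind(Addrinfo.tcp(host, port))
  sock.listen(backlog)
  sock
end

# Port 0 asks the OS for any free port, so the sketch does not need root.
listener = reusable_listener("127.0.0.1", 0)
puts "SO_REUSEADDR set: #{listener.getsockopt(:SOCKET, :REUSEADDR).bool}"
puts "listening on port #{listener.local_address.ip_port}"
listener.close
```

This only affects whether the port can be rebound. It does not shorten how long the kernel keeps closed connections in TIME_WAIT or LAST_ACK, which matches the observation that the OS appears to clean those up on its own timer.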