benluteijn / cherokee

Automatically exported from code.google.com/p/cherokee

cherokee-worker: Graceful restart fails because of dead connections #237

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
When cherokee-worker has been running for a long time, some connections are
never closed, so a graceful restart cannot complete: it waits forever for
the dead connections.

For example, these dead connections have been open for two days:

# lsof -n -p 3243 | grep FIN_WAIT2
cherokee- 3243 www-data   22u  IPv4          101371722               TCP 91.121.XXX.XXX:www->89.122.XXX.125:62737 (FIN_WAIT2)
cherokee- 3243 www-data   23u  IPv4          101378847               TCP 91.121.XXX.XXX:www->89.122.XXX.125:51479 (FIN_WAIT2)
cherokee- 3243 www-data   24u  IPv4          101373896               TCP 91.121.XXX.XXX:www->89.122.XXX.125:64256 (FIN_WAIT2)
cherokee- 3243 www-data   25u  IPv4          101382111               TCP 91.121.XXX.XXX:www->89.122.XXX.125:54288 (FIN_WAIT2)

Original issue reported on code.google.com by skar...@gmail.com on 21 Nov 2008 at 12:01

GoogleCodeExporter commented 9 years ago
This could be what is affecting another big site that gets a terribly high
load over time.

Original comment by ste...@konink.de on 21 Nov 2008 at 12:04

GoogleCodeExporter commented 9 years ago
Nod. It is definitely a high priority bug.

Cherokee has been using lingering close for quite a long time now. That is the
mechanism meant to prevent this issue, so it seems we are facing a bug rather
than a missing feature.
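
For readers unfamiliar with the term, a lingering close can be sketched roughly as below. This is not Cherokee's code: the function name `lingering_close` and the `linger_seconds` parameter are hypothetical, and the sketch only illustrates the general idea of half-closing the socket, draining what the peer still sends, and giving up after a deadline so the descriptor is always released.

```python
import socket
import time


def lingering_close(sock, linger_seconds=2.0, chunk=4096):
    """Half-close, drain until the peer closes or the deadline passes, then close.

    Returns True if the peer closed its side in time, False if we gave up.
    Hypothetical helper for illustration; not taken from Cherokee.
    """
    deadline = time.monotonic() + linger_seconds
    peer_closed = False
    try:
        sock.shutdown(socket.SHUT_WR)      # send our FIN, keep reading
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break                      # peer never sent its FIN: stop waiting
            sock.settimeout(remaining)
            try:
                data = sock.recv(chunk)
            except socket.timeout:
                break                      # deadline reached while waiting
            if not data:
                peer_closed = True         # recv() returned b"": peer sent FIN
                break
    except OSError:
        pass                               # connection already reset or gone
    finally:
        sock.close()                       # always release the descriptor
    return peer_closed
```

The point of the deadline is that a peer which never sends its own FIN cannot keep the descriptor open indefinitely. If the final `close()` is skipped for such a peer, the socket stays in FIN_WAIT2 for as long as the process holds it.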

The first step, and the most difficult part of the fix, is to reproduce
the problem. Antonio, do you know of any way I could push a server into
leaving this sort of lingering connection?

Original comment by alobbs on 21 Nov 2008 at 12:15

GoogleCodeExporter commented 9 years ago
Álvaro, I don't know, sorry... I looked through my logs for the IPs of those
dead connections, but I didn't see anything special. :(

Original comment by skar...@gmail.com on 21 Nov 2008 at 12:31

GoogleCodeExporter commented 9 years ago
I know 'the bug' exists in 0.6.1b1373, because that site restarts Cherokee
the hard way every night to work around it.

Original comment by ste...@konink.de on 21 Nov 2008 at 12:34

GoogleCodeExporter commented 9 years ago
Yep. I'm afraid we will have a whole lot of fun tracking down this issue.. 

Original comment by alobbs on 21 Nov 2008 at 12:49

GoogleCodeExporter commented 9 years ago
Could we emulate this behavior with some network Swiss Army knife that keeps
the connection open? Or are you looking for something else?
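
One way such a tool might look is a client that fetches a page and then never closes its side, similar to what the stuck crawler connections appear to do. This is only a sketch under assumptions: `hold_after_response` and its parameters are hypothetical names, it assumes the server closes the connection after an HTTP/1.0 response, and nothing in this thread confirms that it triggers the bug.

```python
import socket
import time


def hold_after_response(host, port, hold_seconds, path="/"):
    """Fetch one page, then sit on the socket without closing our side.

    Returns the raw bytes the server sent before it closed its half.
    Hypothetical test client for illustration only.
    """
    sock = socket.create_connection((host, port), timeout=10)
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    )
    sock.sendall(request.encode("ascii"))

    received = bytearray()
    while True:
        data = sock.recv(4096)
        if not data:
            break                  # server sent its FIN; we are in CLOSE_WAIT
        received.extend(data)

    # While we sleep we have not sent our own FIN, so the server side of
    # this connection should remain in FIN_WAIT2.
    time.sleep(hold_seconds)
    sock.close()
    return bytes(received)
```

While the client sleeps, running `lsof -n -p <pid> | grep FIN_WAIT2` against the worker should show whether the server is still holding the descriptor.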

Original comment by ste...@konink.de on 21 Nov 2008 at 12:50

GoogleCodeExporter commented 9 years ago
Whatever leads us to reproduce the issue would do: programs, home-made
scripts, and even Swiss Army knives could be an option :-)

Do you know of a utility that could help us reproduce this bug?

Original comment by alobbs on 21 Nov 2008 at 1:07

GoogleCodeExporter commented 9 years ago
If you tell me *what* situation you want to get Cherokee into, I can think of
something.

Original comment by ste...@konink.de on 21 Nov 2008 at 1:08

GoogleCodeExporter commented 9 years ago
Mmmm... I've been searching the logs again and I've found that those IPs are
"bad" crawlers... :-/

89.122.XXX.122 - - [21/Nov/2008:10:49:00 +0000] "GET / HTTP/1.1" 200 5396 "-" "Java/1.6.0_04"

89.122.XXX.125 - - [21/Nov/2008:08:27:17 +0000] "GET / HTTP/1.1" 200 34946 "-" "Java/1.6.0_04"

212.156.XXX.106 - - [21/Nov/2008:12:59:19 +0000] "GET /2006/05/30/a-por-el-adsl-capitulo-v/ HTTP/1.0" 200 7486 "-" "beast/Nutch-0.9 (agentspider; beast@mail.com)"

Original comment by skar...@gmail.com on 21 Nov 2008 at 1:11

GoogleCodeExporter commented 9 years ago
Buggy clients that perhaps can help to catch this bug:

    * Mozilla/3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
    * Mozilla/2.02 (X11; I; FreeBSD 2.1.5-RELEASE i386)
    * Mozilla/3.01Gold (X11; I; SunOS 5.5 sun4m)
    * MSIE 3.01 on the Macintosh
    * MSIE 3.01 on Windows 95

From: http://httpd.apache.org/docs/1.3/misc/fin_wait_2.html

Original comment by skar...@gmail.com on 21 Nov 2008 at 1:15

GoogleCodeExporter commented 9 years ago
Can this be solved by dropping the connection after a timeout, or is that too
low level?

Original comment by ste...@konink.de on 21 Nov 2008 at 1:17

GoogleCodeExporter commented 9 years ago
I'm using Linux with a 60-second FIN timeout, and it is not helping in this case:

# cat /proc/sys/net/ipv4/tcp_fin_timeout
60
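
A likely reason the 60-second setting has no effect: according to the Linux kernel's sysctl documentation, `tcp_fin_timeout` only applies to orphaned connections, meaning sockets that no application references any more. The `lsof` output in the original report lists descriptors 22 to 25 as still open in the worker, so those sockets would not count as orphaned and the kernel would leave them alone. The sketch below pulls the still-held FIN_WAIT2 descriptors out of `lsof` output. `fin_wait2_descriptors` is a hypothetical helper, and it assumes the column layout shown in this report, with one connection per line.

```python
def fin_wait2_descriptors(lsof_output):
    """Return (fd, connection) pairs for sockets lsof reports in FIN_WAIT2.

    Assumes lines shaped like the report above:
    COMMAND PID USER FD TYPE DEVICE NODE NAME (STATE)
    """
    held = []
    for line in lsof_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[-1] != "(FIN_WAIT2)":
            continue
        # FD column looks like "22u": descriptor number plus access mode.
        fd = "".join(ch for ch in fields[3] if ch.isdigit())
        held.append((fd, fields[-2]))
    return held
```

Any descriptor this returns is one the process itself still has to close; the kernel timeout will not reclaim it while the worker keeps it open.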

Original comment by skar...@gmail.com on 21 Nov 2008 at 1:22

GoogleCodeExporter commented 9 years ago
Could you please test r2446?
I have just committed something that might fix it up.

Original comment by alobbs on 21 Nov 2008 at 2:55

GoogleCodeExporter commented 9 years ago
One hour without problems using r2447.

It seems the patch is good! Thank you very much! ;)

Original comment by skar...@gmail.com on 21 Nov 2008 at 8:04

GoogleCodeExporter commented 9 years ago
Great! :-)

Original comment by alobbs on 21 Nov 2008 at 8:37