bitpiston / oyster

A Perl web application framework.

fcgi processes crashing and failing to respawn #48

Closed: einkoro closed this issue 11 years ago

einkoro commented 12 years ago

I haven't pinpointed the reason for this (or whether it's even oyster, spawn-fcgi, lighttpd or whatever), but every now and then the processes will have died with nothing of interest in the oyster logs or the lighttpd logs. Trying to restart the process with fcgictl / spawn-fcgi just complains that it immediately exited with status 2, until you run script/perm.pl again. Using diff to compare the permissions and ownership of all oyster files before and after running perm.pl shows no difference whatsoever, so this is just plain strange to me.
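That before/after comparison can also be scripted. The sketch below is an illustrative Python helper, not part of oyster: it records each path's mode string, uid and gid under a directory (using lstat, so symlinks are reported as themselves) and lists every path whose entry changed, appeared or disappeared. The function names `snapshot` and `diff_snapshots` are hypothetical.

```python
import os
import stat


def snapshot(root):
    """Map each path under root (relative to root) to (mode string, uid, gid)."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: describe symlinks themselves, not their targets
            rel = os.path.relpath(path, root)
            entries[rel] = (stat.filemode(st.st_mode), st.st_uid, st.st_gid)
    return entries


def diff_snapshots(before, after):
    """Return {path: (before, after)} for every differing entry; None means absent."""
    changes = {}
    for path in sorted(before.keys() | after.keys()):
        old, new = before.get(path), after.get(path)
        if old != new:
            changes[path] = (old, new)
    return changes
```

Typical use is `before = snapshot(tree)`, then run script/perm.pl, then `after = snapshot(tree)` and inspect `diff_snapshots(before, after)`. Unlike a plain listing diff, it also reports files that perm.pl creates or removes.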

All systems are running FreeBSD 9.0-RELEASE-p4. bitpiston.com is on the latest lighttpd 1.4.x release, which uses mod_fastcgi; the others run lighttpd 1.5 compiled from trunk, which uses mod_proxy_core.

fcgi-debug might be the best way to track this down, but there is no FreeBSD port for it: http://redmine.lighttpd.net/projects/fcgi-debug/wiki

One theory was that it was dying under heavy load, but heavy load just produces the expected 503 Service Unavailable. Another theory was that it choked on some unexpected input while processing requests, but throwing all sorts of garbage at it doesn't appear to reproduce the issue either.

Edit: I have a feeling this has always been around, but we never saw it because lighttpd used to respawn processes itself when the backend died. I first noticed it on kosh shortly after lighttpd 1.4.x started using spawn-fcgi. Could it just be some sort of max-requests limit after which a process exits? I know PHP has a max-requests setting, but I'm not familiar with anything similar for Perl. That would explain why it came up quickly while hammering the server during stress testing, but only once or twice a month on bitpiston.com.

einkoro commented 12 years ago

Hi Jan,

Sorry for the late response. I haven't had this particular issue. I would suggest spawning without daemonization to see whether you get any error messages. Unfortunately, spawn-fcgi discards standard output/error, and until your app reinstates them you don't get any output from it.

The most likely issues that come to mind are environment (variable) differences and the current directory.

Oleg

On Sun, Jun 10, 2012 at 9:13 PM, Jan Pingel wrote:

Hi Oleg,

Have you ever encountered an issue where the rc.d script under FreeBSD fails to spawn the process, but running fcgictl directly works fine? I'm a bit mystified by this one, considering the rc.d script calls the exact same command with the same arguments, and the rc.d script is +x.

kosh# service fcgiapps start
spawn-fcgi: child exited with: 2
ExecutionError: #
kosh# /usr/local/bin/fcgictl all start
spawn-fcgi: child spawned successfully: PID: 24879
spawn-fcgi: child spawned successfully: PID: 24881
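Oleg's two suspects (environment variables and the current directory) can be tested by running the same command in the foreground under a deliberately reduced environment. The sketch below is a hypothetical Python helper, `run_like_rcd`, which assumes that FreeBSD's service(8) starts rc.d scripts with a scrubbed environment along the lines of `env -i HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin` and from `/`; check /usr/sbin/service on your release for the exact values. It captures stdout, stderr and the exit status, so nothing is lost to daemonization.

```python
import subprocess

# Assumption: approximates the scrubbed environment service(8) passes to
# rc.d scripts. Note that /usr/local/bin is NOT on this PATH.
RCD_ENV = {"HOME": "/", "PATH": "/sbin:/bin:/usr/sbin:/usr/bin"}


def run_like_rcd(argv, cwd="/", extra_env=None):
    """Run argv in the foreground with a minimal environment and the given cwd.

    Returns (exit status, stdout, stderr) so that error output a daemonized
    child would lose stays visible.
    """
    env = dict(RCD_ENV)
    if extra_env:
        env.update(extra_env)  # add variables back one at a time to bisect
    proc = subprocess.run(argv, cwd=cwd, env=env,
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr
```

For example, `run_like_rcd(["/usr/local/bin/fcgictl", "all", "start"])` can be compared with the same command run from an interactive shell. If only the reduced-environment run exits with status 2, passing variables back through `extra_env` one at a time narrows down which one matters.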

einkoro commented 12 years ago

Currently using supervisord as a replacement for fcgictl, spawn-fcgi and monit. So far it hasn't failed to restart the process after artificially induced crashes; time will tell whether the permission weirdness shows up.

[fcgi-program:bitpiston]
socket          = unix:///var/run/chowder/%(program_name)s.sock
socket_mode     = 0766
process_name    = %(program_name)s_%(process_num)s
command         = /home/bitpiston/chowder/shared/oyster.fcgi
directory       = /home/bitpiston/chowder/shared 
numprocs        = 3
user            = www
environment     = oyster_site_id=bitpiston
redirect_stderr = true
priority        = 500
startsecs       = 3 
startretries    = 10
autorestart     = true 

http://supervisord.org/

The FreeBSD port is sysutils/py-supervisor.

einkoro commented 12 years ago

Hammering the server 'randomly' reproduced the crash twice, and it respawned successfully. 'autorestart = true' was required so that the process is restarted regardless of its exit code.

Still haven't narrowed down what causes it.
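How the restart-related settings in the config above interact can be made concrete with a simplified Python model of the policy as supervisord's documentation describes it. This is not supervisord's actual code, and `RestartPolicy` and `next_action` are hypothetical names. A process that exits before `startsecs` have elapsed counts as a failed start and is retried until `startretries` is exceeded, after which it is marked FATAL. Once it has been running, `autorestart = true` restarts it whatever the exit code, while `autorestart = unexpected` restarts it only if the code is not listed in `exitcodes`. The sketch takes `exitcodes` as a parameter; check the default for your supervisor version.

```python
from dataclasses import dataclass


@dataclass
class RestartPolicy:
    autorestart: str = "true"   # "false", "unexpected" or "true"
    exitcodes: tuple = (0,)     # "expected" exit codes; check your version's default
    startsecs: float = 3.0      # must stay up this long to count as started
    startretries: int = 10      # failed starts allowed before giving up


def next_action(policy, exit_code, uptime, failed_starts):
    """Return (action, failed_starts) after a child exits (simplified model)."""
    if uptime < policy.startsecs:
        # Died before reaching RUNNING: counts as a failed start attempt.
        failed_starts += 1
        if failed_starts > policy.startretries:
            return "fatal", failed_starts
        return "retry", failed_starts
    # The process had been running: reset the counter and consult autorestart.
    failed_starts = 0
    if policy.autorestart == "true":
        return "restart", failed_starts
    if policy.autorestart == "unexpected" and exit_code not in policy.exitcodes:
        return "restart", failed_starts
    return "stop", failed_starts
```

Under this model, a crash that exits with status 2 is only restarted by `autorestart = unexpected` if 2 is absent from `exitcodes`, whereas `autorestart = true` restarts it unconditionally.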

einkoro commented 12 years ago

100k requests, overloading the processes, and no crash this time around.

einkoro commented 11 years ago

Looks like I pinpointed the issue, or at least another issue that could cause this: the referrer shown on 404 pages is not XML-entity-escaped, so depending on the referrer URL it can make the XML parsing choke.
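A minimal Python illustration of that failure mode and the usual fix: escape the value before interpolating it into XML. The `<error>`/`<referrer>` markup and the `render_404` and `parses` helpers are made up for the example and are not oyster's actual 404 template; in oyster the equivalent escaping would have to happen wherever the referrer is written into the page.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape


def render_404(referrer, escape_referrer=True):
    """Build a hypothetical 404 document embedding the referrer as element text."""
    value = escape(referrer) if escape_referrer else referrer  # escapes &, < and >
    return "<error><status>404</status><referrer>%s</referrer></error>" % value


def parses(xml_text):
    """Return True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

A referrer such as `http://example.com/?a=1&b=2` makes the unescaped version fail to parse (the bare `&` starts an entity reference), while the escaped version parses and its `<referrer>` text comes back as the original URL.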