Copterfly / modwsgi

Automatically exported from code.google.com/p/modwsgi

Daemon process listener sockets leaked in parent process on 'graceful' restart. #95

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
When using daemon mode, the Apache parent keeps references to the listener sockets used by daemon processes so that, if a daemon process is restarted, it can pass them to the new instance of the process. Thus we have:

$ sudo lsof -p 339 | grep wsgi
httpd   339 root  txt   VREG       14,2   263336  3547047 /usr/local/apache-2.2.4/modules/mod_wsgi.so
httpd   339 root   11u  unix 0x0208e130      0t0          /usr/local/apache-2.2.4/logs/wsgi.339.0.1.sock
httpd   339 root   12u  unix 0x02544d10      0t0          /usr/local/apache-2.2.4/logs/wsgi.339.0.2.sock
httpd   339 root   13u  unix 0x025450a0      0t0          /usr/local/apache-2.2.4/logs/wsgi.339.0.3.sock

If an 'apachectl restart' is performed, these are all closed off and replaced with new sockets specific to the new incarnation of the Apache parent process configuration.

$ sudo lsof -p 339 | grep wsgi
httpd   339 root  txt   VREG       14,2   263336  3547047 /usr/local/apache-2.2.4/modules/mod_wsgi.so
httpd   339 root   11u  unix 0x02545c80      0t0          /usr/local/apache-2.2.4/logs/wsgi.339.1.1.sock
httpd   339 root   12u  unix 0x02546990      0t0          /usr/local/apache-2.2.4/logs/wsgi.339.1.2.sock
httpd   339 root   13u  unix 0x025457c0      0t0          /usr/local/apache-2.2.4/logs/wsgi.339.1.3.sock

If, however, an 'apachectl graceful' is performed, the old sockets aren't cleaned up and they leak.

$ sudo lsof -p 339 | grep wsgi
httpd   339 root  txt   VREG       14,2   263336  3547047 /usr/local/apache-2.2.4/modules/mod_wsgi.so
httpd   339 root   11u  unix 0x02545c80      0t0          /usr/local/apache-2.2.4/logs/wsgi.339.1.1.sock
httpd   339 root   12u  unix 0x02546990      0t0          /usr/local/apache-2.2.4/logs/wsgi.339.1.2.sock
httpd   339 root   13u  unix 0x025457c0      0t0          /usr/local/apache-2.2.4/logs/wsgi.339.1.3.sock
httpd   339 root   14u  unix 0x02545300      0t0          /usr/local/apache-2.2.4/logs/wsgi.339.2.1.sock
httpd   339 root   15u  unix 0x02544e40      0t0          /usr/local/apache-2.2.4/logs/wsgi.339.2.2.sock
httpd   339 root   16u  unix 0x025458f0      0t0          /usr/local/apache-2.2.4/logs/wsgi.339.2.3.sock

The socket file in the filesystem corresponding to each leaked socket is also left behind.
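One way to spot the leak in output like the above is to check whether a single parent holds sockets from more than one configuration generation. In these listings the middle number in wsgi.339.N.M.sock appears to go up by one on each restart (0, then 1, then 2), so the following minimal sketch assumes the pattern wsgi.<pid>.<generation>.<n>.sock. That interpretation and the helper name count_socket_generations() are hypothetical; this is not code from mod_wsgi.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical diagnostic helper (not part of mod_wsgi): count the distinct
 * configuration generations among mod_wsgi socket paths. ASSUMES the naming
 * pattern seen in the lsof output, wsgi.<pid>.<generation>.<n>.sock.
 * A result greater than 1 for one parent process suggests leaked sockets.
 */
static int count_socket_generations(const char *const paths[], size_t n)
{
    int seen[64];
    int nseen = 0;

    assert(paths != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Look only at the file name, not the directory part. */
        const char *base = strrchr(paths[i], '/');
        base = base ? base + 1 : paths[i];

        int pid, generation, instance;
        char suffix[8];

        /* Skip anything that doesn't match the assumed pattern. */
        if (sscanf(base, "wsgi.%d.%d.%d.%7s",
                   &pid, &generation, &instance, suffix) != 4
            || strcmp(suffix, "sock") != 0)
            continue;

        /* Record the generation if it hasn't been seen yet. */
        int known = 0;
        for (int j = 0; j < nseen; j++) {
            if (seen[j] == generation) {
                known = 1;
                break;
            }
        }
        if (!known && nseen < (int)(sizeof seen / sizeof seen[0]))
            seen[nseen++] = generation;
    }
    return nseen;
}
```

Given the six socket paths from the 'graceful' listing it reports two generations (1 and 2), while for the 'restart' listing it reports one. Non-matching entries such as mod_wsgi.so are ignored.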

The code which is meant to ensure they are closed and cleaned up is:

        /* Apache is being restarted or shutdown. */

        case APR_OC_REASON_RESTART: {

            /* Stop watching the existing process. */

            apr_proc_other_child_unregister(daemon);

            /*
             * Remove socket used for communicating with daemon
             * when the process to be notified is the first in
             * the process group.
             */

            if (daemon->instance == 1) {
                if (close(daemon->group->listener_fd) < 0) {
                    ap_log_error(APLOG_MARK, WSGI_LOG_ERR(errno),
                                 wsgi_server, "mod_wsgi (pid=%d): "
                                 "Couldn't close unix domain socket '%s'.",
                                 getpid(), daemon->group->socket);
                }

                if (unlink(daemon->group->socket) < 0 && errno != ENOENT) {
                    ap_log_error(APLOG_MARK, WSGI_LOG_ERR(errno),
                                 wsgi_server, "mod_wsgi (pid=%d): "
                                 "Couldn't unlink unix domain socket '%s'.",
                                 getpid(), daemon->group->socket);
                }
            }

            break;
        }

For a graceful restart, however, Apache does not notify other child processes that a restart is occurring, and so never gives the above code an opportunity to run. Hence the sockets leak on a graceful restart.

There could be a much bigger problem here, though. Although the daemon processes are being killed off and replaced, there doesn't appear to be any evidence in the error log file of an orderly shutdown of those processes. Instead they appear to simply be getting killed off, and it isn't clear how.

At the very least, the code needs to be changed so that closing and unlinking of the old sockets is tied to the pool associated with the configuration. That way, when Apache destroys the pools for the prior configuration incarnation, the sockets will be cleaned up automatically at that point.
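The pool-based cleanup proposed above can be illustrated with a minimal sketch. APR's real interface for this is apr_pool_cleanup_register(); to keep the example self-contained, it instead uses a hypothetical miniature pool (pool_t, pool_cleanup_register(), pool_destroy()) and a hypothetical listener_t record standing in for mod_wsgi's process group data. None of these names belong to mod_wsgi or APR, and the sketch is not meant to reproduce the change actually committed for this issue.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* HYPOTHETICAL stand-in for an APR pool: just a list of cleanup callbacks. */
typedef int (*cleanup_fn)(void *data);

struct cleanup {
    cleanup_fn fn;
    void *data;
};

typedef struct {
    struct cleanup items[16];
    size_t count;
} pool_t;

static void pool_cleanup_register(pool_t *p, void *data, cleanup_fn fn)
{
    assert(p->count < sizeof p->items / sizeof p->items[0]);
    p->items[p->count].fn = fn;
    p->items[p->count].data = data;
    p->count++;
}

/* Destroying the pool runs every cleanup, most recently registered first. */
static void pool_destroy(pool_t *p)
{
    while (p->count > 0) {
        struct cleanup *c = &p->items[--p->count];
        c->fn(c->data);
    }
}

/* HYPOTHETICAL per-group listener record (listener fd plus socket path). */
typedef struct {
    int listener_fd;
    char socket_path[sizeof(((struct sockaddr_un *)0)->sun_path)];
} listener_t;

/* Close the listener and remove its file, as the RESTART case above does. */
static int listener_cleanup(void *data)
{
    listener_t *l = data;
    if (l->listener_fd >= 0 && close(l->listener_fd) < 0)
        fprintf(stderr, "Couldn't close unix domain socket '%s'.\n",
                l->socket_path);
    l->listener_fd = -1;
    if (unlink(l->socket_path) < 0 && errno != ENOENT)
        fprintf(stderr, "Couldn't unlink unix domain socket '%s'.\n",
                l->socket_path);
    return 0;
}

/* Create and bind the listener, tying its cleanup to the pool's lifetime. */
static int listener_open(pool_t *p, listener_t *l, const char *path)
{
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    if (strlen(path) >= sizeof addr.sun_path)
        return -1;
    strcpy(addr.sun_path, path);
    strcpy(l->socket_path, path);

    l->listener_fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (l->listener_fd < 0)
        return -1;
    unlink(path); /* remove any stale file left by an earlier run */
    if (bind(l->listener_fd, (struct sockaddr *)&addr, sizeof addr) < 0
        || listen(l->listener_fd, 8) < 0) {
        close(l->listener_fd);
        l->listener_fd = -1;
        return -1;
    }
    pool_cleanup_register(p, l, listener_cleanup);
    return 0;
}
```

With this arrangement the socket is closed and its file removed whenever the configuration pool is destroyed, whether that happens on a full restart, a graceful restart or a shutdown, rather than depending on Apache invoking the other-child callback.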

Need to work out how other child processes, i.e., not Apache's own worker processes, are killed off on a graceful restart or graceful shutdown. For its own worker processes Apache uses the POD (pipe of death), but other processes may somehow be getting forcibly killed off when the pool that the process structure is associated with is cleared.

Original issue reported on code.google.com by Graham.Dumpleton@gmail.com on 8 Aug 2008 at 12:51

GoogleCodeExporter commented 9 years ago
The fix for the listener socket leak on graceful restart was committed at revision 978 of trunk for 3.0. It has not yet been back ported to 2.2.

Note that the change also ensures that the UNIX listener socket is removed from the filesystem on a graceful shutdown. The fact that the UNIX listener socket wasn't removed properly on graceful shutdown wasn't clearly described above.

Original comment by Graham.Dumpleton@gmail.com on 8 Aug 2008 at 11:30

GoogleCodeExporter commented 9 years ago
Note that the change at revision 978 doesn't deal with how daemon process shutdown is triggered on a graceful restart or graceful shutdown.

Original comment by Graham.Dumpleton@gmail.com on 8 Aug 2008 at 11:31

GoogleCodeExporter commented 9 years ago
Fixes were back ported to version 2.2 in revision 982 and to version 1.5 in revision 984 of the respective branches.

Original comment by Graham.Dumpleton@gmail.com on 11 Aug 2008 at 11:42

GoogleCodeExporter commented 9 years ago
Version 2.2 was released with this fix, but after release it was found that a mistake in the change was causing CGI scripts to fail when the WSGIDaemonProcess directive was used. Version 2.3 was therefore subsequently released.

Original comment by Graham.Dumpleton@gmail.com on 24 Aug 2008 at 5:59

GoogleCodeExporter commented 9 years ago

Original comment by Graham.Dumpleton@gmail.com on 24 Aug 2008 at 5:59

GoogleCodeExporter commented 9 years ago
Note: once it is determined whether processes are not being shut down properly on graceful restart, a separate issue will be created.

Original comment by Graham.Dumpleton@gmail.com on 24 Aug 2008 at 6:00

GoogleCodeExporter commented 9 years ago
This is in all probability the same issue as has since been raised on the Trac issue tracker:

  http://trac.edgewall.org/ticket/7611

The poster of that ticket disagrees, but hasn't provided any evidence, such as 'lsof' output like the examples above, to show which files are actually being leaked.

It isn't clear why they raised the issue on the Trac site when they suggest it is a mod_wsgi problem and not a Trac issue.

Original comment by Graham.Dumpleton@gmail.com on 10 Sep 2008 at 11:46