alexsilva / supervisor

Supervisor process control system for Windows
http://supervisord.org
Other
118 stars 26 forks source link

Supervisorctl slow and/or unresponsive #21

Closed philipstarkey closed 4 years ago

philipstarkey commented 4 years ago

I'm running the latest version of supervisor-win from pip on Python 3.8.5 (an Anaconda Python environment if that matters).

I'm experiencing extremely slow responses from supervisorctl, even with no [program]s defined (also with one defined - I just deleted it to be certain that was not the issue). Sometimes supervisorctl returns quickly, other times it will hang for several seconds, and sometimes it will just get stuck and I have to force kill the process. I've also noticed that windows sometimes fails to restart/stop the service (which may indicate that the problem is not with supervisorctl, but with supervisor itself).

My very minimal config file look like this:

[inet_http_server]         ; inet (TCP) server disabled by default
port=127.0.0.1:9001        ; ip_address:port specifier, *:port for all iface
;username=user              ; default is no username (open server)
;password=123               ; default is no password (open server)

[supervisord]
logfile=%(here)s\\supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=50MB               ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10                  ; (num of main logfile rotation backups;default 10)
loglevel=debug                       ; (log level;default info; others: debug,warn,trace)
pidfile=%(here)s\\supervisord.pid ; (supervisord pidfile;default supervisord.pid)
nodaemon=false                      ; (start in foreground if true;default false)
silent=false                 ; no logs to stdout if true; default false
minfds=1024                         ; (min. avail startup file descriptors;default 1024)
minprocs=200                        ; (min. avail process descriptors;default 200)

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
; this section empty

I'm assuming that the problem is something to do with delays in the xmlrpc communication. And I'm guessing there is a good chance it is not easily reproducible on other systems (since I would have expected someone else to have reported the issue by now - unless it's a problem with conda environments?). I'm happy to do the debugging - just not sure exactly where I should start with that as I'm unfamiliar with the internals of supervisor.

philipstarkey commented 4 years ago

I should add that there is nothing other than usual messages shown in the service log or supervisor log files. So no indication from there as to what the problem is. Windows firewall is also disabled (and no other firewall or antivirus is running except from windows defender). The PC is extremely capable (desktop core-i7, 32Gb RAM, nvme SSD) and there are no issues with any other software.

alexsilva commented 4 years ago

Test this option and see if the result improves.

delaysecs: Controls the execution speed of the main loop.

[supervisord]
delaysecs=0.2
philipstarkey commented 4 years ago

Thanks for your suggestion. I tried that today and it unfortunately did not make any difference.

So I did a bunch of debugging, and eventually by using sys.settrace and threading.settrace I was able to determine that the issue is on the supervisorctl side. It is locking up on this line in supervisorctl and ultimately sock.connect deep in the internals of Pythons socket library. There was no indication that the connection was being established even partially on the supervisor service side.

So I modified this line in supervisor where the HTTPConnection is instantiated to explicitly specify a timeout for the connection (in my test case timeout=5). My expectation was that this would raise a timeout exception on the times that I was seeing it lockup. However, instead it seems to have just made the issue go away entirely. Every call to supervisorctl now completes and returns the expected information. I confirmed that swapping it back to None caused the random lockups again (and tested this switching of the value between 5 and None several times with consistent results).

I wish I understood why - but I really hate sockets on Windows and don't really feel like debugging further at the moment! Given this is not a problem on Unix systems, I'm guessing that it wouldn't be accepted as a patch upstream. Is there any chance you would include an explicit socket timeout in your fork? I noticed you do set a default socket timeout in services.py, but this obviously doesn't propagate to command line use of supervisorctl.

alexsilva commented 4 years ago

I am investigating the problem and it seems that there is no timeout setting in the global value.

https://docs.python.org/3/library/socket.html#socket.getdefaulttimeout

Return the default timeout in seconds (float) for new socket objects. A value of None indicates that new socket objects have no timeout. When the socket module is first imported, the default is None.

Does this call return any value in your interpreter ? socket.getdefaulttimeout()

The value 5 is completely arbitrary and makes sense only on your machine. I've tried with python 2, 3.5, 3.7, 3.8 without changing the timeout and working without problems. But it might make sense to put this timeout setting in supervisor.conf.

alexsilva commented 4 years ago

As mentioned earlier, this problem seems to be very specific to your environment, so I put this new option xmlrpc_timeout that allows it to be adjusted according to the need but does not affect other environments.

Changes: https://github.com/alexsilva/supervisor/commit/a9c0f39f8ddfee5b43ae3236ececa4148d1ac2b3

[supervisorctl]
serverurl = http://127.0.0.1:9001
xmlrpc_timeout=5

I await the return of their tests.

philipstarkey commented 4 years ago

Thanks, that seems to do the trick.

It's worth noting that supervisorctl does not load the config file by default though - supervisor-win seems to have an additional check (over supervisor) where it only loads the default config file if the program name is "supervisord". This means I have to call supervisorctl -c very\long\path\to\supervisor.conf to get it to work. I'm not sure if there is an easy solution to that as I assume that the changes in #15 mean you could install multiple supervisor-win services each with a different .conf file? (and then which config would be the default for supervisorctl?)

alexsilva commented 4 years ago

In fact, although the name of a service can be changed, it is not possible to have two services for the same python installation with the supervisor.services script. The management of the scripts is for only one instance and I have no interest in changing this (however someone could do it if it is of interest).