Closed philipstarkey closed 4 years ago
I should add that there is nothing other than usual messages shown in the service log or supervisor log files. So no indication from there as to what the problem is. Windows firewall is also disabled (and no other firewall or antivirus is running except from windows defender). The PC is extremely capable (desktop core-i7, 32Gb RAM, nvme SSD) and there are no issues with any other software.
Test this option and see if the result improves.
delaysecs: Controls the execution speed of the main loop.
[supervisord]
delaysecs=0.2
Thanks for your suggestion. I tried that today and it unfortunately did not make any difference.
So I did a bunch of debugging, and eventually by using sys.settrace
and threading.settrace
I was able to determine that the issue is on the supervisorctl side. It is locking up on this line in supervisorctl and ultimately sock.connect
deep in the internals of Pythons socket library. There was no indication that the connection was being established even partially on the supervisor service side.
So I modified this line in supervisor where the HTTPConnection
is instantiated to explicitly specify a timeout for the connection (in my test case timeout=5
). My expectation was that this would raise a timeout exception on the times that I was seeing it lockup. However, instead it seems to have just made the issue go away entirely. Every call to supervisorctl now completes and returns the expected information. I confirmed that swapping it back to None
caused the random lockups again (and tested this switching of the value between 5
and None
several times with consistent results).
I wish I understood why - but I really hate sockets on Windows and don't really feel like debugging further at the moment! Given this is not a problem on Unix systems, I'm guessing that it wouldn't be accepted as a patch upstream. Is there any chance you would include an explicit socket timeout in your fork? I noticed you do set a default socket timeout in services.py, but this obviously doesn't propagate to command line use of supervisorctl.
I am investigating the problem and it seems that there is no timeout setting in the global value.
https://docs.python.org/3/library/socket.html#socket.getdefaulttimeout
Return the default timeout in seconds (float) for new socket objects. A value of None indicates that new socket objects have no timeout. When the socket module is first imported, the default is None.
Does this call return any value in your interpreter ? socket.getdefaulttimeout()
The value 5 is completely arbitrary and makes sense only on your machine. I've tried with python 2, 3.5, 3.7, 3.8 without changing the timeout and working without problems. But it might make sense to put this timeout setting in supervisor.conf.
As mentioned earlier, this problem seems to be very specific to your environment, so I put this new option xmlrpc_timeout
that allows it to be adjusted according to the need but does not affect other environments.
Changes: https://github.com/alexsilva/supervisor/commit/a9c0f39f8ddfee5b43ae3236ececa4148d1ac2b3
[supervisorctl]
serverurl = http://127.0.0.1:9001
xmlrpc_timeout=5
I await the return of their tests.
Thanks, that seems to do the trick.
It's worth noting that supervisorctl does not load the config file by default though - supervisor-win seems to have an additional check (over supervisor) where it only loads the default config file if the program name is "supervisord". This means I have to call supervisorctl -c very\long\path\to\supervisor.conf
to get it to work. I'm not sure if there is an easy solution to that as I assume that the changes in #15 mean you could install multiple supervisor-win services each with a different .conf file? (and then which config would be the default for supervisorctl?)
In fact, although the name of a service can be changed, it is not possible to have two services for the same python installation with the supervisor.services script. The management of the scripts is for only one instance and I have no interest in changing this (however someone could do it if it is of interest).
I'm running the latest version of supervisor-win from pip on Python 3.8.5 (an Anaconda Python environment if that matters).
I'm experiencing extremely slow responses from supervisorctl, even with no
[program]
s defined (also with one defined - I just deleted it to be certain that was not the issue). Sometimes supervisorctl returns quickly, other times it will hang for several seconds, and sometimes it will just get stuck and I have to force kill the process. I've also noticed that windows sometimes fails to restart/stop the service (which may indicate that the problem is not with supervisorctl, but with supervisor itself).My very minimal config file look like this:
I'm assuming that the problem is something to do with delays in the xmlrpc communication. And I'm guessing there is a good chance it is not easily reproducible on other systems (since I would have expected someone else to have reported the issue by now - unless it's a problem with conda environments?). I'm happy to do the debugging - just not sure exactly where I should start with that as I'm unfamiliar with the internals of supervisor.