Supervisor / supervisor

Supervisor process control system for Unix (supervisord)
http://supervisord.org

error: <class 'xmlrpc.client.ProtocolError'>, <ProtocolError for 127.0.0.1/RPC2: 500 Internal Server Error> #1600

SudoGetBeer closed this issue 10 months ago

SudoGetBeer commented 10 months ago

Hello,

When deploying our application, we are using the following command to restart the daemons: supervisorctl restart all
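For reference, supervisorctl restart all amounts to stopping every process and then starting every process again. A minimal sketch of the same sequence over supervisord's XML-RPC interface, using the documented supervisor.stopAllProcesses and supervisor.startAllProcesses methods. The URL http://127.0.0.1:9001/RPC2 is an assumption (it depends on the [inet_http_server] section of supervisord.conf), and make_proxy and restart_all are hypothetical helper names.

```python
import xmlrpc.client

# Assumed URL: an [inet_http_server] listening on 127.0.0.1:9001.
DEFAULT_URL = "http://127.0.0.1:9001/RPC2"


def make_proxy(url=DEFAULT_URL):
    # ServerProxy is lazy: no connection is opened until a method is called.
    return xmlrpc.client.ServerProxy(url)


def restart_all(proxy):
    """Hypothetical helper: stop every process, then start every process,
    roughly the same sequence `supervisorctl restart all` performs."""
    stopped = proxy.supervisor.stopAllProcesses()
    started = proxy.supervisor.startAllProcesses()
    return stopped, started
```

Against a live daemon this would be called as restart_all(make_proxy()). Taking the proxy as a parameter keeps the helper easy to exercise with a stand-in object.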

Recently we have occasionally been getting this error: error: <class 'xmlrpc.client.ProtocolError'>, <ProtocolError for 127.0.0.1/RPC2: 500 Internal Server Error>: file: /usr/lib/python3/dist-packages/supervisor/xmlrpc.py line: 542
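The message text comes from Python's standard xmlrpc.client module: supervisorctl received an HTTP 500 response instead of a normal XML-RPC reply or fault, which is why the underlying cause only shows up in supervisor.log. A small sketch that rebuilds the exact string with the standard-library ProtocolError class and shows how a client script might tell it apart from an XML-RPC Fault. The empty headers dict is a placeholder, and describe_rpc_error is a hypothetical name.

```python
import xmlrpc.client


def describe_rpc_error(exc):
    """Hypothetical helper: classify an exception raised by an XML-RPC call."""
    if isinstance(exc, xmlrpc.client.Fault):
        # The server handled the call and returned a structured fault.
        return "fault %s: %s" % (exc.faultCode, exc.faultString)
    if isinstance(exc, xmlrpc.client.ProtocolError):
        # The HTTP layer failed; in this issue the 500 came from an exception
        # inside supervisord's RPC callback, so its own log has the details.
        return "http %d: %s" % (exc.errcode, exc.errmsg)
    return "other: %r" % (exc,)


# Rebuild the error reported above (the headers are irrelevant here).
err = xmlrpc.client.ProtocolError(
    "127.0.0.1/RPC2", 500, "Internal Server Error", {})
print(str(err))                 # <ProtocolError for 127.0.0.1/RPC2: 500 Internal Server Error>
print(describe_rpc_error(err))  # http 500: Internal Server Error
```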

And in the supervisor.log file we see this error:

2023-08-31 11:46:38,593 ERRO XML-RPC response callback error:Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/supervisor/xmlrpc.py", line 78, in more
    value = self.callback()
  File "/usr/lib/python3/dist-packages/supervisor/rpcinterface.py", line 947, in allfunc
    callback = func(name, **extra_kwargs)
  File "/usr/lib/python3/dist-packages/supervisor/rpcinterface.py", line 301, in startProcess
    process.spawn()
  File "/usr/lib/python3/dist-packages/supervisor/process.py", line 213, in spawn
    self._assertInState(ProcessStates.EXITED, ProcessStates.FATAL,
  File "/usr/lib/python3/dist-packages/supervisor/process.py", line 185, in _assertInState
    raise AssertionError('Assertion failed for %s: %s not in %s' %  (
AssertionError: Assertion failed for daemon-755623_00: UNKNOWN not in EXITED FATAL BACKOFF STOPPED

Supervisor version is 4.2.1.

Our current "bugfix" for this is deleting the daemon and restarting it again by hand. Maybe someone else can help us with this.

mnaberez commented 10 months ago

AssertionError: Assertion failed for daemon-755623_00: UNKNOWN not in EXITED FATAL BACKOFF STOPPED

It looks like a process running under supervisord was in the UNKNOWN state at the time this command was sent. Running supervisorctl status should have shown UNKNOWN. A process should never enter the UNKNOWN state.
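A simplified model of the check that failed: spawning only proceeds if the process is in one of the four states named in the assertion message, so a process left in UNKNOWN is refused. That is consistent with the workaround of deleting the program and adding it back. The State enum and the function names are illustrative and are not Supervisor's own code.

```python
import enum


class State(enum.Enum):
    # Subset of Supervisor's process state names, enough for this example.
    STOPPED = "STOPPED"
    RUNNING = "RUNNING"
    BACKOFF = "BACKOFF"
    EXITED = "EXITED"
    FATAL = "FATAL"
    UNKNOWN = "UNKNOWN"


# States from which the assertion in the traceback allows a spawn, in order.
SPAWNABLE = (State.EXITED, State.FATAL, State.BACKOFF, State.STOPPED)


def assert_in_state(name, current, *allowed):
    # Raise explicitly (not with `assert`) so the check survives python -O.
    if current not in allowed:
        raise AssertionError("Assertion failed for %s: %s not in %s" % (
            name, current.value, " ".join(s.value for s in allowed)))


def spawn(name, current):
    assert_in_state(name, current, *SPAWNABLE)
    return "spawned %s" % name  # simplified: no fork/exec here


try:
    spawn("daemon-755623_00", State.UNKNOWN)
except AssertionError as exc:
    print(exc)  # Assertion failed for daemon-755623_00: UNKNOWN not in EXITED FATAL BACKOFF STOPPED
```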

As far as I can tell, there are only two places in Supervisor 4.2.1 where a process can enter the UNKNOWN state. Both occur when sending a signal to a process (either with supervisorctl stop or supervisorctl signal).

https://github.com/Supervisor/supervisor/blob/b7ed14e100904a78c2912d39da46ef956bfbb56f/supervisor/process.py#L465-L472

https://github.com/Supervisor/supervisor/blob/b7ed14e100904a78c2912d39da46ef956bfbb56f/supervisor/process.py#L504-L511

You should see the messages logged by the code linked above in the log file.
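A simplified sketch of the pattern described here: if sending the signal raises anything, the failure is logged and the process is moved to UNKNOWN. For comparison, the second function treats ESRCH ("No such process") as "the process already exited" and leaves the state for normal child reaping to update. Both functions, their injectable kill parameter, and the returned state strings are hypothetical; this is not Supervisor's actual code or the exact change made in a later release.

```python
import errno
import os
import signal


def kill_catch_all(pid, sig, kill=os.kill):
    """Illustrative catch-all: any failure to signal -> UNKNOWN."""
    try:
        kill(pid, sig)
    except Exception:
        return "UNKNOWN"
    return "STOPPING"


def kill_tolerant(pid, sig, kill=os.kill):
    """Illustrative variant: ESRCH means the process is already gone."""
    try:
        kill(pid, sig)
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            # Leave the state alone; the exit is picked up when the
            # child is reaped.
            return "UNCHANGED"
        return "UNKNOWN"
    return "STOPPING"
```

Injecting the kill function keeps the example testable without needing a real process that disappears at exactly the wrong moment.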

If this is what is happening, it could be a configuration error: supervisord started a process via fork/exec, but now the OS won't allow supervisord to send a signal to that process. One way this could happen is if supervisord is running as non-root and there is something like command=sudo /bin/cat in the config file.
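At the Python level, the two ways os.kill() fails in this thread surface as distinct exception classes: EPERM (not allowed to signal, as in the sudo case) becomes PermissionError, and ESRCH becomes ProcessLookupError. A small sketch of that mapping; explain_kill_error is a hypothetical helper and the wording of its messages is illustrative.

```python
import errno


def explain_kill_error(exc):
    """Hypothetical helper: translate an os.kill() failure into plain words."""
    if isinstance(exc, PermissionError):      # errno.EPERM
        return "not permitted to signal this process (e.g. it runs as another user)"
    if isinstance(exc, ProcessLookupError):   # errno.ESRCH
        return "no such process (it has exited and been reaped, or never existed)"
    return "unexpected error: %s" % exc


# OSError picks the matching subclass from the errno value.
print(type(OSError(errno.EPERM, "Operation not permitted")).__name__)  # PermissionError
print(type(OSError(errno.ESRCH, "No such process")).__name__)          # ProcessLookupError
```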

SudoGetBeer commented 10 months ago

Hey @mnaberez, thanks for your reply. Now I found this error in the log:

2023-08-31 11:38:23,545 CRIT unknown problem killing daemon-755623_00 (2992376):Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/supervisor/process.py", line 466, in kill
    options.kill(pid, sig)
  File "/usr/lib/python3/dist-packages/supervisor/options.py", line 1302, in kill
    os.kill(pid, signal)
ProcessLookupError: [Errno 3] No such process

To be honest, I don't really know why this is happening.
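The [Errno 3] in that traceback is ESRCH, which the kernel returns when the target PID no longer exists. A minimal sketch that reproduces the same exception on a Unix-like OS: start a child that exits immediately, reap it, then signal it with os.kill(), the same call shown at the bottom of the traceback. Assumption: in rare cases the PID could be reused by an unrelated process between wait() and kill(), which the function reports instead of raising.

```python
import os
import signal
import subprocess
import sys


def signal_after_exit():
    # Start a child that exits immediately, then reap it so its PID is freed.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    try:
        # The process is already gone, so the kernel answers with ESRCH.
        os.kill(child.pid, signal.SIGTERM)
    except OSError as exc:
        return "%s: [Errno %d] %s" % (type(exc).__name__, exc.errno, exc.strerror)
    return "signal delivered (PID was reused by another process)"


print(signal_after_exit())  # ProcessLookupError: [Errno 3] No such process
```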

The config is:

[program:daemon-755623]
directory=/home/xxxxx/xxxxxxxxxxxxxxx/
command=php8.2 artisan horizon

process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
user=forge
numprocs=1
startsecs=1
redirect_stderr=true
stdout_logfile=/home/xxxx/.xxxx/daemon-755623.log
stopwaitsecs=10
stopsignal=SIGTERM
stopasgroup=true
killasgroup=true
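
With stopasgroup=true and killasgroup=true, the stop signal (and any final SIGKILL) is meant for the whole process group rather than only the child PID, which matters when a command spawns child processes of its own. A minimal sketch of group signaling with the standard library. Putting the child in its own session with start_new_session=True is an assumption made here so the group can be signaled safely; it is not a claim about how supervisord prepares its children. start_group_leader and stop_group are hypothetical names.

```python
import os
import signal
import subprocess
import sys


def start_group_leader():
    # New session => the child leads a new process group (pgid == pid).
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        start_new_session=True,
    )


def stop_group(proc, sig=signal.SIGTERM):
    # Signal every process in the child's group; os.kill(-pgid, sig)
    # would be equivalent.
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, sig)
    return proc.wait(timeout=10)


if __name__ == "__main__":
    child = start_group_leader()
    print(stop_group(child))  # -15: the child was terminated by SIGTERM
```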

mnaberez commented 10 months ago

Now I found this error in the log:

2023-08-31 11:38:23,545 CRIT unknown problem killing daemon-755623_00 (2992376):Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/supervisor/process.py", line 466, in kill
    options.kill(pid, sig)
  File "/usr/lib/python3/dist-packages/supervisor/options.py", line 1302, in kill
    os.kill(pid, signal)
ProcessLookupError: [Errno 3] No such process

The process exited immediately before supervisord tried to kill it, and supervisord did not handle the error from the OS correctly. Your Supervisor version is 4.2.1. This is a bug that was fixed in 4.2.2 (changelog, diff). Please upgrade to a newer version.
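To check whether an installation predates the fix, the version string can be compared against 4.2.2. A minimal sketch, assuming a plain dotted numeric version like the 4.2.1 reported here; the running daemon's version can be read with supervisorctl version or the supervisor.getSupervisorVersion XML-RPC method. parse_version and has_esrch_fix are hypothetical helper names.

```python
def parse_version(text):
    """Turn '4.2.1' into (4, 2, 1); assumes purely numeric components."""
    return tuple(int(part) for part in text.strip().split("."))


# Release that, per this thread, fixed the handling of the "No such process" error.
FIXED_IN = (4, 2, 2)


def has_esrch_fix(version_text):
    # Tuples compare element by element, so (4, 2, 1) < (4, 2, 2) < (4, 3).
    return parse_version(version_text) >= FIXED_IN


print(has_esrch_fix("4.2.1"))  # False
print(has_esrch_fix("4.2.5"))  # True
```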