Supervisor / supervisor

Supervisor process control system for Unix (supervisord)
http://supervisord.org

EADDRINUSE during a reload #1596

Closed julien6387 closed 1 year ago

julien6387 commented 1 year ago

I'm using Supervisor 4.2.4 and I have a use case where I can reproduce the following failure almost 100% of the time when restarting Supervisor:

Error: Another program is already listening on a port that one of our HTTP servers is configured to use.  Shut this program down first before starting supervisord.
For help, use /usr/local/bin/supervisord -h
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)

I suspect it happens when the HTTP socket is still in use when close() is called on it. As long as another handle on the socket remains open, the operating system does not actually deallocate the "closed" socket.
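To illustrate the point about lingering handles, here is a minimal sketch (not Supervisor code). The function name peer_sees_eof_after_close is hypothetical, and the extra handle is simulated with socket.dup(), which is an assumption about how a second reference might arise. It checks whether the peer observes end-of-stream after close() alone.

```python
import socket


def peer_sees_eof_after_close(keep_duplicate: bool) -> bool:
    """Return True if the client observes EOF after the server side calls close()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()

    # Optionally keep a second handle on the same underlying socket.
    duplicate = conn.dup() if keep_duplicate else None
    conn.close()  # close only; no shutdown

    client.settimeout(0.5)
    try:
        eof = client.recv(1) == b""
    except socket.timeout:
        eof = False  # nothing arrived: the connection is still open
    finally:
        client.close()
        if duplicate is not None:
            duplicate.close()
        listener.close()
    return eof
```

With keep_duplicate=False the client sees EOF right away; with keep_duplicate=True the close() has no visible effect on the peer until the last handle goes away.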

In my case, I send a lot of REMOTE_COMMUNICATION events, which keep the HTTP socket fairly busy. When the Supervisor restart is requested, the socket is usually not deallocated even though it has been closed. Because the restart is quick, the new bind cannot reuse the port (still in TIME_WAIT) and fails.

The socket has to be shut down before it is closed, so that the FIN is sent to the peers. The subsequent close then triggers the deallocation almost immediately. Since applying this change, I have never seen the issue again, and it's good practice anyway.
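The shutdown-then-close pattern can be sketched as follows. This is an illustration under assumptions, not the actual change made in Supervisor: the helper names shutdown_and_close and demo_shutdown_with_lingering_handle are hypothetical, and the lingering handle is simulated with socket.dup().

```python
import socket


def shutdown_and_close(sock: socket.socket) -> None:
    """Shut the socket down in both directions, then release the descriptor."""
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        # Typically ENOTCONN: the socket was never connected
        # or the peer has already gone away.
        pass
    finally:
        sock.close()


def demo_shutdown_with_lingering_handle() -> bool:
    """Return True if the peer sees EOF even though another handle stays open."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()

    lingering = conn.dup()  # simulated leftover handle on the same socket
    shutdown_and_close(conn)

    client.settimeout(0.5)
    try:
        eof = client.recv(1) == b""
    except socket.timeout:
        eof = False
    finally:
        client.close()
        lingering.close()
        listener.close()
    return eof
```

Unlike close(), shutdown() acts on the underlying connection rather than on a single descriptor, which is why the peer is notified here despite the leftover handle.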

julien6387 commented 1 year ago

There is a comment in ServerOptions.close_httpservers stating that:

            # For unknown reasons, sometimes an http_channel
            # dispatcher in the socket map related to servers
            # remains open *during a reload*.

It might be related somehow.