SWI-Prolog / packages-clib

Assorted external libraries: processes, sockets, MIME, CGI, etc.

Use the winsock threads queue for deallocating sockets #3

Closed thetrime closed 7 years ago

thetrime commented 9 years ago

Recently I noticed that http_stop_server/2 does not do what you'd expect on Windows. It succeeds and closes the listening socket, but does not call closesocket() on it. The consequence is that you can start a new server, but Windows will give you two sockets both listening on the same port (?!). All the connection requests go to the defunct socket.

The original cause of this was code that I wrote (or helped write) many years ago to work around a problem where a server opening and closing a lot of connections rapidly could end up getting close messages for the wrong socket. Essentially what happened was:

1) Client and server are happily chatting away. Server has socket #1.
2) Server hangs up, and closes socket #1 via closesocket().
3) Client also hangs up at about the same time. This triggers a FIN, which manifests as an FD_CLOSE ending up on the winsock window's message queue, but that's not the thread currently executing.
4) Server asks for a new socket, and OS gives it the first available socket: Socket #1. Server tries to connect, or listen, or whatever, generating a waitRequest() and yields.
5) Winsock window wakes up and processes its events in order. First one is FD_CLOSE for socket #1. The logical action to take here is to close the socket, which it does.
6) Server wakes up and wonders why its socket is closed.
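The sequence above can be reproduced in a toy model. This is only a sketch, not the clib code: `HandleTable`, `simulate_race` and the event tuples are hypothetical names, and the "lowest free handle first" allocation policy is an assumption taken from step 4.

```python
from collections import deque


class HandleTable:
    """Toy stand-in for the OS socket table (hypothetical, not a real API).

    Assumption: the lowest free handle is always handed out first.
    """

    def __init__(self):
        self.open = set()

    def allocate(self):
        handle = 1
        while handle in self.open:
            handle += 1
        self.open.add(handle)
        return handle

    def close(self, handle):
        self.open.discard(handle)


def simulate_race():
    table = HandleTable()
    events = deque()  # stands in for the winsock window's message queue

    sock = table.allocate()            # 1) server holds socket #1
    table.close(sock)                  # 2) server thread calls closesocket()
    events.append(("FD_CLOSE", sock))  # 3) peer's FIN is queued, not yet processed
    new_sock = table.allocate()        # 4) the handle is reused for a new socket

    # 5) the winsock thread finally runs and handles its events in order
    while events:
        kind, handle = events.popleft()
        if kind == "FD_CLOSE":
            table.close(handle)        # closes the *new* socket by mistake

    # 6) the server's new socket is gone
    return sock, new_sock, new_sock in table.open
```

In this model `simulate_race()` returns the same handle number for both sockets and reports that the new one is no longer open, which is the symptom described in step 6.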

To try to fix that, we added a timeout mechanism: when you close a socket, we just call shutdown() on it and set a timeout. If we later ask for a socket and get one marked as 'in the process of closing down', we don't give it to the process unless the timeout has passed.
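As a rough illustration of that timeout idea (a sketch only, not the code in the repository), the bookkeeping might look like the following. `ClosingTable`, `mark_closing` and `may_hand_out` are hypothetical names, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
class ClosingTable:
    """Hypothetical bookkeeping for sockets 'in the process of closing down'."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.closing = {}  # handle -> earliest time the handle may be reused

    def mark_closing(self, handle, now):
        # Called instead of closesocket(): the socket has only been shut down.
        self.closing[handle] = now + self.timeout

    def may_hand_out(self, handle, now):
        # A handle that was never marked is always fine to use.
        deadline = self.closing.get(handle)
        if deadline is None:
            return True
        # A marked handle is only released once its timeout has passed.
        if now >= deadline:
            del self.closing[handle]
            return True
        return False
```

For example, with a 2 second timeout, a handle marked at time 10.0 would be refused at 11.0 and accepted at 12.0. Note that nothing in this scheme ever calls closesocket() promptly, which is where it goes wrong for bound sockets.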

This kind of works for sockets we want to use for connecting, but works very poorly for sockets which are bound, since when we say we want to close the socket, we really mean it. I also don't think you get an FD_CLOSE for such a socket - one of us (probably Jan) alluded to this in a comment added to the source.

This pull request would get rid of that, and instead deallocate sockets (call closesocket()) only on the winsock thread. When we elect to close a connection, the process would be:

1) call shutdown() on the socket.
2) post a message to the winsock queue asking it to call closesocket()
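The same kind of toy model can show why deferring closesocket() to the winsock thread helps. This is a sketch under assumptions, not the patch itself: `simulate_deferred_close` and the `CLOSE_REQUEST` message name are hypothetical, handles are again assumed to be allocated lowest-first, and an FD_CLOSE for a socket that has already been shut down is assumed to be ignored.

```python
from collections import deque


def simulate_deferred_close():
    open_handles = set()
    shut_down = set()
    queue = deque()  # stands in for the winsock thread's message queue

    def allocate():
        # Assumption: the lowest free handle is handed out first.
        handle = 1
        while handle in open_handles:
            handle += 1
        open_handles.add(handle)
        return handle

    sock = allocate()                      # server holds socket #1
    queue.append(("FD_CLOSE", sock))       # peer's FIN is queued, not yet processed

    # Server thread closes the connection the proposed way:
    shut_down.add(sock)                    # 1) shutdown(); handle stays allocated
    queue.append(("CLOSE_REQUEST", sock))  # 2) ask the winsock thread to close it

    new_sock = allocate()                  # cannot reuse handle 1 yet

    # The winsock thread drains its queue in order.
    while queue:
        kind, handle = queue.popleft()
        if kind == "FD_CLOSE":
            if handle not in shut_down:
                open_handles.discard(handle)
            # otherwise: stale event for a socket already shut down, ignore it
        elif kind == "CLOSE_REQUEST":
            open_handles.discard(handle)   # the only place a handle is freed
            shut_down.discard(handle)

    return sock, new_sock, sorted(open_handles)
```

Because the handle is only freed when the winsock thread reaches the close request, any FD_CLOSE queued earlier has already been dealt with by then, and the new socket gets a different handle that survives.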

I assume that any FD_CLOSE messages for the socket will be delivered before shutdown() and discarded after it. The problem we had before was that FD_CLOSE could be received on one thread, and we would then do the shutdown() and closesocket() on another thread before the FD_CLOSE message was actually processed. The socket would be reallocated (after all, it was closed), and then many time slices later, the winsock thread would reawaken, see the FD_CLOSE on its queue, and close the connection prematurely. I'm hoping that RSTs/FINs received by the OS after shutdown() will be discarded and not turned into FD_CLOSEs, but it's very hard to get concrete information.

Sorry for the excessively long message, but I wanted to try and be as clear as possible, since one day I'm sure we will be referring back to this :)