No keep-alive timeout causes issues with multiple concurrent clients using modern browsers

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Enable keep-alive & set threads to a low number (1 has the most obvious 
issues)
2. Visit the site with a browser which has a long keep-aplive timeout (e.g. 
google chrome)
3. Try to visit the site with a second browser instance, the browsers will spin 
waiting for data to arrive. This is because the single thread is locked to the 
first browser instance via keep alive until the first browser instance decides 
to close the connection. The same issues arise with more threads and more 
browser instances.

What is the expected output? What do you see instead?
The http server closes the keep-alive connection after a short period. 
Typically in the order of seconds.

What version of the product are you using? On what operating system?
V3 on Win7 X64

Please provide any additional information below.

There is a simple workaround but not sure if this code is portable to other 
operating systems:

In the "pull" function after the line
   nread = recv(sock, buf, (size_t) len, 0);

Add the following:

    if ( nread == SOCKET_ERROR )
    {
        int err = WSAGetLastError();
        // A read timeout means the keep-alive timed out...
        if (err == WSAECONNABORTED)
        {
            // Behave as if the remote side closed to the rest of the code...
            return 0;
        }
    }

Then in the "process_new_connection" function after the line
  keep_alive_enabled = !strcmp(conn->ctx->config[ENABLE_KEEP_ALIVE], "yes");

Add the following:

  if ( keep_alive_enabled )
  {
      // Enable read timeouts on the socket
      DWORD tv = 15 * 1000; //15 sec, probably should go in the config file...
      if (setsockopt(conn->client.sock, SOL_SOCKET, SO_RCVTIMEO, (char *)&tv,  sizeof(tv)))
      {
          cry(conn, "setsockopt failed");
      } 
  }

Original issue reported on code.google.com by CHolleme...@gmail.com on 11 Aug 2011 at 2:43

Merged into: #268

GoogleCodeExporter commented 9 years ago

I also have been seeing this issue but in my case it manifests it self 
differently. I set "enable_keep_alive" to yes and num_threads to 2 and hit the 
server from a single client browser.
The first socket that is created eventually causes that worker thread to wait 
indefinitely on a recv() call. Since its blocked waiting for recv() to return, 
it never gets the signal to wake up and eventually exit when I call mg_stop(). 
So that thread is hung indefinitely waiting in pull().
I tried the above changes on windows and it fixes the problem. Per the apple 
docs 
(http://developer.apple.com/library/mac/#documentation/darwin/reference/manpages
/man2/recv.2.html) I think the modified change to pull() from the post above 
should work on win and mac but am not sure about other platforms.

#if defined(_WIN32)
if ( nread == SOCKET_ERROR && WSAGetLastError() == WSAECONNABORTED)
#else   // Mac/Unix/Linux??
if(nread == -1)
#endif
{
  nread = 0; // Behave as if the remote side closed to the rest of the code...
}

Original comment by jcmay...@gmail.com on 12 Aug 2011 at 1:40

GoogleCodeExporter commented 9 years ago

This issue has been resolved as part of the work on issue349; 'final solution' 
was to detect 'inactive connections' i.e. connections which don't have request 
(read) data pending and push those back onto the queue. The connections waiting 
on the queue are scanned to find the first one in there which has read data 
available and the thread which initiated the scan will then pop that one and 
serve the request (or series of requests as long as data keeps on pouring in 
quickly enough).

See http://code.google.com/p/mongoose/issues/detail?id=349#c50 (comments 
~47..50 show the progression towards the mentioned solution)

N.B.: this solves the matter when you're stuck in the blocking recv() waiting 
for request data; a malicious sender could still occupy the thread until 
SO_RCVTIMEO fires (set up via the set_timeout() call in that clone; the github 
repo also has code to adjust that value via config) when that malicious sender 
only transmits a /partial/ (i.e. incomplete, no CRLF+CRLF header termination) 
request: get_request() will then still block in recv() after looping. 
Additional select()+check-stop logic would be needed inside pull() to thwart 
malicious senders like that. 

And then there's the variant of malicious user over SSL, i.e. via blocking 
SSL_read() ...

Original comment by ger.hobbelt on 21 Jun 2012 at 8:21

GoogleCodeExporter commented 9 years ago

Ger, 
I was working on issue 268 and implemented the additional select inside the 
pull function you requested :-) 
See http://code.google.com/p/mongoose/issues/detail?id=268#c5.

Original comment by borkh...@gmail.com on 21 Jun 2012 at 10:06

GoogleCodeExporter commented 9 years ago

Also see http://code.google.com/r/borkhuis-issue268/source/browse for my 
version with the fix for issue 268.

Original comment by borkh...@gmail.com on 21 Jun 2012 at 1:42

GoogleCodeExporter commented 9 years ago

Issue 260 has been merged into this issue.

Original comment by valenok on 23 Sep 2012 at 12:56

GoogleCodeExporter commented 9 years ago

Original comment by valenok on 23 Sep 2012 at 12:58

Changed state: Duplicate

ancruna / mongoose

No keep-alive timeout causes issues with multiple concurrent clients using modern browsers #273