reqs_per_event handling (-R) is incorrect leading to client lockups

GoogleCodeExporter commented 9 years ago

From code inspection, I think there's a bug in drive_machine().  Sorry
I don't have a test for this, so this is just from eyeballing the
code.

With the new -R command-line flag, you can configure the
settings.reqs_per_event value, which defaults to 20.
(By the way, if you set -R to 1, memcached locks the 
client connection on its first message.)

First off, I think the -R flag should limit the number of commands
processed per connection, where a command is a GET, SET, DELETE, etc.
As it's currently implemented, though, the code seems instead to try to
limit the number of consecutive try_read_network()'s per connection.

And, even at that, the code might not work right.  When drive_machine()
checks its counter (nreqs), I think it's missing a "stop = true;"
on an else clause.

That is, when nreqs goes to 0, the main drive_machine() while-loop just
skips one loop and then just keeps on going, so nreqs goes
negative...

         case READ_DATA_RECEIVED:
         /* Only process nreqs at a time to avoid starving other
             connections */
             if (--nreqs)
                 conn_set_state(c, conn_parse_cmd);
             break;

I think the right code should instead test nreqs during
conn_parse_cmd and also set 'stop' to true.  Here's a patch...

diff --git a/memcached.c b/memcached.c
index 7f8af34..4dd8db9 100644
--- a/memcached.c
+++ b/memcached.c
@@ -3168,10 +3168,7 @@ static void drive_machine(conn *c) {
                 conn_set_state(c, conn_waiting);
                 break;
             case READ_DATA_RECEIVED:
-             /* Only process nreqs at a time to avoid starving other
-                connections */
-                if (--nreqs)
-                    conn_set_state(c, conn_parse_cmd);
+                conn_set_state(c, conn_parse_cmd);
                 break;
             case READ_ERROR:
                 conn_set_state(c, conn_closing);
@@ -3182,12 +3179,17 @@ static void drive_machine(conn *c) {
             }
             break;

-        case conn_parse_cmd :
-            if (try_read_command(c) == 0) {
-                /* wee need more data! */
-                conn_set_state(c, conn_waiting);
-            }
-
+        case conn_parse_cmd:
+            /* Only process nreqs at a time to avoid starving other
+               connections */
+            nreqs--;
+            if (nreqs >= 0) {
+                if (try_read_command(c) == 0) {
+                    /* we need more data! */
+                    conn_set_state(c, conn_waiting);
+                }
+            } else
+                stop = true;
             break;

         case conn_new_cmd:

Original issue reported on code.google.com by steve....@gmail.com on 22 Jun 2009 at 11:29

GoogleCodeExporter commented 9 years ago

Original comment by dsalli...@gmail.com on 23 Jun 2009 at 1:39

Changed title: reqs_per_event handling (-R) is incorrect leading to client lockups

GoogleCodeExporter commented 9 years ago

I believe that another problem might be that we need to ensure that we have 
updated the libevent event flag for 
the socket before we can return (otherwise libevent will not trigger the 
callback for the filedescriptor again). I'll 
do a code review to verify this..

Original comment by trond.no...@gmail.com on 26 Jun 2009 at 1:17

GoogleCodeExporter commented 9 years ago

(and it may block if we have the next command in our input read buffer)

Original comment by trond.no...@gmail.com on 26 Jun 2009 at 1:23

GoogleCodeExporter commented 9 years ago

I believe that a better place to put this logic is in the conn_new_cmd state. 
We will
always iterate through that state between each command.

All we need now is a good test case...

diff --git a/memcached.c b/memcached.c
index 6ad3231..440eb49 100644
--- a/memcached.c
+++ b/memcached.c
@@ -3199,10 +3199,7 @@ static void drive_machine(conn *c) {
                 conn_set_state(c, conn_waiting);
                 break;
             case READ_DATA_RECEIVED:
-             /* Only process nreqs at a time to avoid starving other
-                connections */
-                if (--nreqs)
-                    conn_set_state(c, conn_parse_cmd);
+                conn_set_state(c, conn_parse_cmd);
                 break;
             case READ_ERROR:
                 conn_set_state(c, conn_closing);
@@ -3222,6 +3219,12 @@ static void drive_machine(conn *c) {
             break;

         case conn_new_cmd:
+            /* Only process nreqs at a time to avoid starving other
+               connections */
+            --nreqs;
+            if (nreqs == 0) {
+                stop = true;
+            }
             reset_cmd_handler(c);
             break;

Original comment by trond.no...@gmail.com on 30 Jun 2009 at 6:44

GoogleCodeExporter commented 9 years ago

Pushed a better fix and a test case:
http://github.com/trondn/memcached/commit/3ea9838fdbfac07cc50edafc4f85dc49a57b51
68

Original comment by trond.no...@gmail.com on 30 Jun 2009 at 8:06

Changed state: Started

GoogleCodeExporter commented 9 years ago

Original comment by trond.no...@gmail.com on 30 Jun 2009 at 8:07

Added labels: OpSys-All, Usability

GoogleCodeExporter commented 9 years ago

I pushed up a doc change for this in dustin/issue_61

Original comment by dsalli...@gmail.com on 2 Jul 2009 at 3:12

GoogleCodeExporter commented 9 years ago

Pushed in 1.4

Original comment by dsalli...@gmail.com on 10 Jul 2009 at 4:23

Changed state: Fixed

Jdesk / memcached

reqs_per_event handling (-R) is incorrect leading to client lockups #61