Jdesk / memcached

Automatically exported from code.google.com/p/memcached

binary protocol parsing can cause memcached server lockup #106

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. run a memcached server on localhost port 10000
2. compile and run the attached C code (a rough reconstruction is sketched below)
3. change the len from 37 to 38 on line 69, recompile, and run the code again
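
The attachment did not survive the export, so here is a minimal sketch of such a client, assuming a 5-byte key ("hello") and localhost:10000 as above; only the final sendto() length (37 vs. 38) differs between steps 2 and 3.

/* Hypothetical reconstruction of the attached test program: sends a
 * binary-protocol GET over UDP whose datagram is one byte longer than
 * its header declares. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(10000);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    unsigned char pkt[38] = {0};
    pkt[5]  = 1;     /* UDP frame header: request spans 1 datagram   */
    pkt[8]  = 0x80;  /* binary header: request magic                 */
    pkt[9]  = 0x00;  /* opcode: GET                                  */
    pkt[11] = 5;     /* key length: 5                                */
    pkt[19] = 5;     /* total body length: 5 -> 8 + 24 + 5 = 37      */
    memcpy(pkt + 32, "hello", 5);

    /* 37 is the well-formed length (step 2); 38 sends one stray byte
     * beyond what the header accounts for (step 3). */
    size_t len = 38;
    for (int i = 0; i < 1000; i++)
        sendto(fd, pkt, len, 0, (struct sockaddr *)&addr, sizeof(addr));

    close(fd);
    return 0;
}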

What is the expected output? What do you see instead?

Use top and watch the CPU stats for steps 2 and 3. Even after stopping the
program from step 3, the memcached server keeps using 100% CPU. For both
steps I expect the load to drop to 0 after stopping the program.

What version of the product are you using? On what operating system?
memcached v1.4.3 on Linux 2.6, with libevent compiled to use epoll.

Please provide any additional information below.

The attached code can currently only be compiled under Linux 2.6 with epoll
support.

Please note that by changing the length from 37 to 38 bytes I intentionally
broke the memcached binary protocol spec. My point is that memcached should
have better protocol handling to protect itself against faulty packets.

Original issue reported on code.google.com by marlon%h...@gtempaccount.com on 12 Nov 2009 at 4:05

Attachments:

GoogleCodeExporter commented 9 years ago
The root cause of this issue is that try_read_udp() never resets c->rbytes.

The conn_nread processing decrements c->rbytes by c->rlbytes, so if extra
bytes exist, they are 'left' in c->rbytes. However, each recvfrom() on the
UDP socket overwrites the whole read buffer, so 'c->rbytes += res' sets
c->rbytes incorrectly.
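
A sketch of the mismatch between the two code paths (my own paraphrase, not the verbatim memcached source):

/* conn_nread path: whatever exceeds the declared body length stays
 * counted in rbytes. */
struct conn_sketch { char *rbuf, *rcurr; int rbytes, rlbytes; };

static void consume_body(struct conn_sketch *c) {
    int tocopy = c->rbytes < c->rlbytes ? c->rbytes : c->rlbytes;
    c->rcurr   += tocopy;
    c->rbytes  -= tocopy;   /* leftover bytes remain counted */
    c->rlbytes -= tocopy;
}

/* try_read_udp path: recvfrom() rewrites rbuf from offset 0, so the
 * counted leftovers no longer exist in the buffer. */
static void after_recvfrom(struct conn_sketch *c, int res) {
    c->rbytes += res;       /* the bug: must be c->rbytes = res */
    c->rcurr = c->rbuf;
}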

After the binary 'GET' command fails, the connection state is transferred to
'conn_new_cmd' in write_bin_error(), then to 'conn_parse_cmd' in
reset_cmd_handler(). While c->rbytes is less than sizeof(c->binary_header)
(24 bytes), the server believes there are not enough bytes to hold a binary
header, so processing continues normally. However, while processing the
attached test program, c->rbytes increases by 1 each time a request is
completed. Therefore, after 24 requests, the server (wrongly) thinks there
is an extra binary header, and it finally gets a wrong magic byte in
try_read_command().
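
A worked illustration of that drift, assuming each 38-byte request carries one byte more than its header declares (30 bytes after the 8-byte UDP frame is stripped, versus a declared 24-byte header plus 5-byte body):

#include <stdio.h>

int main(void) {
    int rbytes = 0;
    for (int req = 1; req <= 24; req++) {
        rbytes += 30;   /* try_read_udp: buggy accumulation of res  */
        rbytes -= 29;   /* header parse + conn_nread consume 24 + 5 */
        printf("after request %2d: rbytes = %d\n", req, rbytes);
    }
    /* rbytes is now 24 == sizeof(binary header), so the server parses
     * 24 stale bytes as a "header" and trips on the bad magic byte. */
    return 0;
}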

The deadlock is caused by the error-processing code. If the UDP connection
is in the conn_closing state, it never gets closed; conn_cleanup() is merely
called on it. Therefore, incoming packets on the UDP socket are never
consumed, so epoll_wait() (in libevent) triggers input events over and over
again for a closing connection.
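
A self-contained demonstration of that spin (my own example, independent of memcached): one unread datagram keeps a level-triggered epoll loop firing forever until the packet is drained.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/socket.h>

int main(void) {
    /* Bind a UDP socket on an ephemeral port and queue one datagram. */
    int srv = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    bind(srv, (struct sockaddr *)&addr, sizeof(addr));
    socklen_t alen = sizeof(addr);
    getsockname(srv, (struct sockaddr *)&addr, &alen);

    int cli = socket(AF_INET, SOCK_DGRAM, 0);
    sendto(cli, "x", 1, 0, (struct sockaddr *)&addr, sizeof(addr));

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = srv };
    epoll_ctl(ep, EPOLL_CTL_ADD, srv, &ev);

    struct epoll_event out;
    for (int i = 0; i < 5; i++)       /* fires on every iteration */
        printf("iteration %d: %d event(s)\n", i,
               epoll_wait(ep, &out, 1, 1000));

    recvfrom(srv, NULL, 0, 0, NULL, NULL); /* drain: socket goes quiet */
    printf("after drain: %d event(s)\n", epoll_wait(ep, &out, 1, 100));
    return 0;
}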

To set c->rbytes correctly:
--- ./memcached.ori.c   2010-08-23 22:59:12.000000000 +0800
+++ ./memcached.c   2010-08-23 22:59:17.000000000 +0800
@@ -3223,7 +3223,7 @@
         res -= 8;
         memmove(c->rbuf, c->rbuf + 8, res);

-        c->rbytes += res;
+        c->rbytes = res;
         c->rcurr = c->rbuf;
         return READ_DATA_RECEIVED;
     }

To consume incoming packets when cleanup:

--- ./memcached.ori.c   2010-08-23 22:59:12.000000000 +0800
+++ ./memcached.c   2010-08-23 23:27:38.000000000 +0800
@@ -471,6 +471,9 @@
         sasl_dispose(&c->sasl_conn);
         c->sasl_conn = NULL;
     }
+
+    /* consume incoming packets */
+    recvfrom(c->sfd, NULL, 0, 0, NULL, NULL);
 }

 /*

Original comment by pi3or...@gmail.com on 23 Aug 2010 at 3:35

GoogleCodeExporter commented 9 years ago
The issue was studied and solved with the help of Snitchaser:
http://gitorious.org/snitchaser

Original comment by pi3or...@gmail.com on 23 Aug 2010 at 3:36

GoogleCodeExporter commented 9 years ago
This looks similar to the problem in issue 158, so I'll take a look at this.

Original comment by eric.d.l...@gmail.com on 11 Nov 2010 at 7:31

GoogleCodeExporter commented 9 years ago
Maybe this specific issue is solved by the provided patch (I haven't tried),
but in general the problem remains.
Even after applying the suggested patch, the following packet sent over UDP
hangs memcached's UDP handling:
"\x00\x00\x00\x00\x00\x01\x00\x00\x80\x05\x00\x01\x08\x00\x00\x00\x00\x00\x00\x0a\xef\xbe\xad\xde\x00\x00\x00\x00\x00\x00\x00\x00"

(in fact, any subsequent packet causes 100% CPU usage)

It is a storage_command (in terms of testapp.c), but the cmd (\x05,
increment) is of type 'arithmetic_command' (in terms of testapp.c).
All of this packet's fields are 'correct' except for the cmd type.
The attachment contains a script that reproduces the livelock.
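
For reference, my own field-by-field reading of that datagram (annotation mine, offsets per the binary protocol):

/* UDP frame header (8 bytes): request id 0, seq 0, datagram count 1.
 * Binary header (24 bytes):
 *   0x80                 magic (request)
 *   0x05                 opcode: INCREMENT
 *   0x00 0x01            key length: 1
 *   0x08                 extras length: 8 (storage-style; arithmetic
 *                        commands expect 20)
 *   0x00                 data type
 *   0x00 0x00            vbucket id
 *   0x00 0x00 0x00 0x0a  total body length: 10 (never actually sent)
 *   0xef 0xbe 0xad 0xde  opaque
 *   0x00 x 8             cas
 * The declared 10-byte body is absent, so the server sits waiting for
 * bytes that never arrive on the shared UDP "connection". */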

Original comment by airat.ha...@gmail.com on 24 Jan 2011 at 5:16

Attachments:

GoogleCodeExporter commented 9 years ago
Hello again! I (the author of http://gitorious.org/snitchaser) have been
away for nearly 10 months, and have now come back to continue working on
this bug. Snitchaser has since been split into 2 projects: Snitchaser
(http://code.google.com/p/snitchaser/) for single-threaded programs and
ReBranch (http://code.google.com/p/rebranch/) for multi-threaded programs. I
used ReBranch to analyze this bug and solved it in 2 hours. In fact the root
cause is simple; I should have discovered it 5 months ago.

The root cause is that, when memcached sets a UDP connection to the
conn_closing state, the connection never comes back. Unlike TCP connections,
a memcached server has only 1 UDP connection to serve all UDP requests.
Hence, for an erroring connection, it only 'cleans up' the connection rather
than freeing it. However, memcached never reverts the state of a cleaned-up
connection.

Here we suggest a patch for memcached 1.3.5:

$ diff -u ./memcached.ori.c ./memcached.c 
--- ./memcached.ori.c   2011-06-27 22:16:22.401000079 +0800
+++ ./memcached.c   2011-06-27 22:17:26.102000078 +0800
@@ -471,6 +471,11 @@
         sasl_dispose(&c->sasl_conn);
         c->sasl_conn = NULL;
     }
+
+    if (IS_UDP(c->transport)) {
+        recvfrom(c->sfd, NULL, 0, 0, NULL, NULL);
+        conn_set_state(c, conn_new_cmd);
+    }
 }

 /*
@@ -3223,7 +3228,7 @@
         res -= 8;
         memmove(c->rbuf, c->rbuf + 8, res);

-        c->rbytes += res;
+        c->rbytes = res;
         c->rcurr = c->rbuf;
         return READ_DATA_RECEIVED;
     }

For a UDP connection, when closing, this consumes the packets in the socket
and then resets the state to conn_new_cmd.

Original comment by pi3or...@gmail.com on 27 Jun 2011 at 2:20

GoogleCodeExporter commented 9 years ago
Can you issue a patch against 1.6.0-beta1? Or were you talking about 1.4.5, not 
1.3.5?

Original comment by dorma...@rydia.net on 27 Jun 2011 at 7:07

GoogleCodeExporter commented 9 years ago
OK, 1.6.0-beta1 still suffers from this problem; the principle is similar.
The following patch solves it (I think).

--- ./daemon/memcached.ori.c    2011-06-28 09:43:45.120000072 +0800
+++ ./daemon/memcached.c    2011-06-28 09:57:18.390999971 +0800
@@ -689,13 +689,19 @@
         c->sasl_conn = NULL;
     }

-    c->engine_storage = NULL;
-    c->tap_iterator = NULL;
-    c->thread = NULL;
-    assert(c->next == NULL);
-    c->ascii_cmd = NULL;
-    c->sfd = INVALID_SOCKET;
-    c->tap_nack_mode = false;
+    if (IS_UDP(c->transport)) {
+        recvfrom(c->sfd, NULL, 0, 0, NULL, NULL);
+        conn_set_state(c, conn_new_cmd);
+    } else {
+        c->sfd = INVALID_SOCKET;
+        c->engine_storage = NULL;
+        c->tap_iterator = NULL;
+        c->thread = NULL;
+        assert(c->next == NULL);
+        c->ascii_cmd = NULL;
+        c->tap_nack_mode = false;
+    }
+
 }

 void conn_close(conn *c) {
@@ -4826,7 +4832,7 @@
         res -= 8;
         memmove(c->rbuf, c->rbuf + 8, res);

-        c->rbytes += res;
+        c->rbytes = res;
         c->rcurr = c->rbuf;
         return READ_DATA_RECEIVED;
     }

Original comment by pi3or...@gmail.com on 28 Jun 2011 at 2:00

GoogleCodeExporter commented 9 years ago
I just pulled a patch similar to this into my 1.4.7 tree.

This was confusing me a bit... the extra recvfrom seemed pointless and it'd 
hang while running. The test from #158 would spin out of control without the 
recvfrom.

But the initial state of a UDP connection is actually conn_read, not 
conn_new_cmd. Using the correct state makes it find the next UDP packet and 
throw it away properly.
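
In patch form, the approach described here would look roughly like this (my paraphrase; see the commit linked in a later comment for the real change):

    /* On cleanup, a UDP "connection" is the shared listener, so reset
     * it to its initial state instead of closing it. */
    if (IS_UDP(c->transport)) {
        conn_set_state(c, conn_read);   /* UDP listeners start in
                                           conn_read, not conn_new_cmd;
                                           from there the next datagram
                                           is found and thrown away */
    }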

Still iffy on the rbytes bit, but so long as multiple command parsing works 
(which I don't think it does now anyway), that should be correct.

Punting to trond in case he wants to port this properly to 1.6.0, or take a 
look at least. I'll be pushing my for_147 branch to github at some point.

Original comment by dorma...@rydia.net on 8 Aug 2011 at 5:44

GoogleCodeExporter commented 9 years ago
The recvfrom() call uses 0 as SIZE and NULL as BUFFER. According to the
manual, that call will still consume the incoming packet. Without it,
epoll_wait() always returns immediately because the UDP socket remains in a
readable state.

 -- Function: int recvfrom (int SOCKET, void *BUFFER, size_t SIZE, int
          FLAGS, struct sockaddr *ADDR, socklen_t *LENGTH-PTR)
     The `recvfrom' function reads one packet from the socket SOCKET
     into the buffer BUFFER.  The SIZE argument specifies the maximum
     number of bytes to be read.

     If the packet is longer than SIZE bytes, then you get the first
     SIZE bytes of the packet and the rest of the packet is lost.
     There's no way to read the rest of the packet.  Thus, when you use
     a packet protocol, you must always know how long a packet to
     expect.

Original comment by pi3or...@gmail.com on 8 Aug 2011 at 6:06

GoogleCodeExporter commented 9 years ago
The incoming packet is consumed in try_read_udp(), so if you get into a 
conn_close state after reading one packet, calling an empty recvfrom will 
consume the *next*, unseen packet.

Unless there is some case you're talking about which causes the next packet on 
the wire to be an error?

I did push my tree though, so take a look at what I ended up with:
https://github.com/dormando/memcached/commit/954f6dddc41c80d2e625bce63744df5556a425bb

... I can't break it with your tests anymore, and it doesn't hang before 
responding. If you look at how TCP connections are initialized, you'll see 
they're set to conn_new_cmd, but the UDP listeners are initialized to 
conn_read. The latter makes sense in the flow.

Either way, can you try the patch and see if you can still break it? :)

Original comment by dorma...@rydia.net on 8 Aug 2011 at 6:44

GoogleCodeExporter commented 9 years ago
I think you are right.

In my first patch (comment 1), the recvfrom() call is critical because in
that situation, once a UDP socket is in the conn_closing state it will never
be recovered, so all incoming packets have to be dropped. If we instead
recover its state to conn_read, the recvfrom() is no longer needed.

Original comment by pi3or...@gmail.com on 8 Aug 2011 at 7:22

GoogleCodeExporter commented 9 years ago
I think this was merged up. Closing.

Original comment by dorma...@rydia.net on 28 Sep 2011 at 4:45