Extremely Large Get Requests Fail

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Issue an extremely large get request that is delivered in multiple TCP 
packets

What is the expected output? What do you see instead?
The expected output is that of a normal (multi) get request. Instead the server 
closes the connection when epoll gives a 0 result.

What version of the product are you using? On what operating system?
1.4.10 / Ubuntu Lucid 2.6 kernel

Please provide any additional information below.
You can see from this strace output 
(https://gist.github.com/3950bdee3a984a1af2dd) that the server reads 
increasingly large chunks of the request and then closes the connection when 
epoll gives a 0 result. From what we can see from the source code it looks like 
it does this 4 times in increasingly large chunks and then just stops reading. 
(https://github.com/memcached/memcached/commit/75cc83685e103bc8ba380a57468c8f04413033f9#L0R3233)
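
To make the behavior described above concrete, here is a toy model of a reader that gives up after a fixed number of rounds. It is only a sketch of what the strace appears to show, not memcached's actual code: the function name, the 2048-byte starting buffer and the doubling policy are all assumptions for illustration.

```python
def bytes_consumed(request_len, initial_buf=2048, max_rounds=4):
    """Toy model: read into a buffer that doubles each round, for at most
    max_rounds rounds. Returns how many bytes of the request were consumed.

    initial_buf and the doubling policy are assumptions, not values taken
    from the memcached source.
    """
    buf_size = initial_buf
    consumed = 0
    for _ in range(max_rounds):
        # Each round can hold at most buf_size bytes of the request.
        consumed = min(request_len, buf_size)
        if consumed == request_len:
            break  # the whole request fit, nothing left to read
        buf_size *= 2  # grow the buffer and try again
    return consumed
```

Under these assumptions, any request longer than the final buffer size is never read in full, which would match a connection that stalls on a very large multiget.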

Original issue reported on code.google.com by tay...@37signals.com on 29 Dec 2011 at 10:21

GoogleCodeExporter commented 9 years ago

This is an ascii connection (looks like so from the strace)? Exactly how large 
of a multiget is this?

That change was only supposed to nuke the connection if you were flooding the 
conn with junk data. Multigets would/should've started parsing at some point.

If not, I guess that's a bug.

Original comment by dorma...@rydia.net on 29 Dec 2011 at 10:29

GoogleCodeExporter commented 9 years ago

Yes ascii. Not using binary protocol. Using a tcp socket.

Here's a sample get that triggers this: 
https://gist.github.com/f764d48251ceda134f28

Original comment by tay...@37signals.com on 29 Dec 2011 at 10:32

GoogleCodeExporter commented 9 years ago

If I'm understanding things correctly, one thing that might "help" is 
increasing the initial buffer size. Also, since the request is terminated with 
\r\n, it seems like it should read until \r\n, or until some other user 
configurable threshold. (To prevent someone from sending an endless stream of 
junk data.)
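
A minimal sketch of that idea, in Python rather than memcached's C: keep reading until \r\n shows up, and bail out once a configurable limit is exceeded. The names `read_command`, `LineTooLong` and `max_line` are hypothetical, and `recv` stands in for any socket-style `recv(n)` callable.

```python
class LineTooLong(Exception):
    """Raised when no \\r\\n arrives within the configured limit."""


def read_command(recv, max_line=1 << 20, chunk=4096):
    """Read from recv(n) until a full \\r\\n-terminated line is buffered.

    max_line is the hypothetical user-configurable threshold that guards
    against an endless stream of junk data.
    """
    buf = bytearray()
    scanned = 0  # bytes already searched for the terminator
    while True:
        end = buf.find(b"\r\n", scanned)
        if end != -1:
            # A real server would keep buf[end + 2:] for the next command.
            return bytes(buf[:end])
        # The last byte may be a lone "\r", so rescan it next time.
        scanned = max(0, len(buf) - 1)
        if len(buf) > max_line:
            raise LineTooLong(f"no terminator within {max_line} bytes")
        data = recv(chunk)
        if not data:
            raise ConnectionError("peer closed before sending \\r\\n")
        buf += data
```

The limit is checked against the buffered length rather than the number of reads, so a large but legitimate request delivered in many small packets is still accepted.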

Original comment by tay...@37signals.com on 30 Dec 2011 at 5:15

GoogleCodeExporter commented 9 years ago

I was under the impression it only booted you if the connection wasn't trying 
to do a multiget.. so I'll have to go test it or wait for trond to see this and 
respond himself.

I'll need a few days before being able to test it though. Thanks for your report.

Original comment by dorma...@rydia.net on 30 Dec 2011 at 5:24

GoogleCodeExporter commented 9 years ago

We've only seen it when doing a multiget. We are happy to help test in any way 
we can. Unfortunately this is causing lots of errors for us in production. (In 
the interim, we are attempting to work around them now that we've identified 
this as the cause.)

Thanks again!

Original comment by tay...@37signals.com on 30 Dec 2011 at 5:29

GoogleCodeExporter commented 9 years ago

there aren't any other massive commands which happen sans a \r\n :)

that must be a pretty huge multiget though. usually folks have a few servers 
and the command is split up.

It shouldn't be *too* long here... I've slacked a week on 1.4.11 because the 
holidays call to me, but that fixes some other bugs and I need to wrap it up 
first.

Original comment by dorma...@rydia.net on 30 Dec 2011 at 5:50

GoogleCodeExporter commented 9 years ago

I can't reproduce this, even with a 500,000 key multiget or your provided one. 
It'll work (but a little slow in the 500k case). It's not disconnecting me.

Can you provide a script that reproduces your error, along with the exact 
versions of the client and any other utilities involved? Ideally, if you start a 
fresh memcached instance, the script will fill and fetch all the necessary data 
before causing the disconnection.
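
For reference, a script of that shape could look roughly like the sketch below. It is an illustration only: the key names, the 1400-byte chunk size, the default host and port, and the function names are all assumptions, and writing in small chunks with TCP_NODELAY set does not guarantee how the kernel splits the data into packets.

```python
import socket


def build_multiget(n_keys, prefix="key"):
    """Build one ASCII-protocol get line for n_keys hypothetical keys."""
    keys = " ".join(f"{prefix}{i}" for i in range(n_keys))
    return f"get {keys}\r\n".encode("ascii")


def send_in_chunks(sock, payload, chunk_size=1400):
    """Write the payload in small pieces so it tends to span several packets."""
    for i in range(0, len(payload), chunk_size):
        sock.sendall(payload[i:i + chunk_size])


def reproduce(host="127.0.0.1", port=11211, n_keys=50000):
    """Fill a fresh instance, then fetch everything with one large multiget.

    Returns True if the reply ended with END, False if the server closed
    the connection first.
    """
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for i in range(n_keys):
            # One-byte value; noreply keeps the fill phase one-directional.
            sock.sendall(f"set key{i} 0 0 1 noreply\r\nx\r\n".encode("ascii"))
        send_in_chunks(sock, build_multiget(n_keys))
        reply = bytearray()
        while not reply.endswith(b"END\r\n"):
            data = sock.recv(65536)
            if not data:
                return False  # closed before the multiget completed
            reply += data
        return True
```

Calling `reproduce()` against a freshly started instance would return False if the early close described in this ticket occurs.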

Thanks!

Original comment by dorma...@rydia.net on 11 Jan 2012 at 12:34

GoogleCodeExporter commented 9 years ago

ping? Anyone reading this ticket?

I wasn't able to reproduce the server early close from your test input or from 
mc-crusher's 500,000 key multiget. Do you have more information?

Original comment by dorma...@rydia.net on 25 Jan 2012 at 8:17

GoogleCodeExporter commented 9 years ago

I just took a quick stab at reproducing it and I couldn't either... which 
leaves me pretty confused. The traces before were easily repeatable ... we did 
it a few times before submitting. When I have a few minutes I'll give it 
another go. Between now and then I'm thinking it might be a bug in Ruby / 
Rails... although there's nothing obvious.

Original comment by tay...@37signals.com on 25 Jan 2012 at 8:31

GoogleCodeExporter commented 9 years ago

Ok, I'll leave the bug open for a few more days just in case, but please let us 
know!

Original comment by dorma...@rydia.net on 25 Jan 2012 at 8:32

GoogleCodeExporter commented 9 years ago

Any update? :)

Original comment by dorma...@rydia.net on 1 Feb 2012 at 7:09

GoogleCodeExporter commented 9 years ago

Could never reproduce this. mc-crusher would run with 500,000 key multigets 
just fine (though that did point out an issue in the ascii parser...)

Original comment by dorma...@rydia.net on 14 Jul 2012 at 11:43

Changed state: Invalid
