akwei / memcached

Automatically exported from code.google.com/p/memcached

intermittent failure on incr of numbers larger than 2^63-1, ubuntu 10.04 #297

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
The problem is usually intermittent, but occasionally we can repeat it on a
given host as follows:

set asdf 0 200000 19
9223372036854775807
STORED
incr asdf 1
9223372036854775808
incr asdf 1
CLIENT_ERROR cannot increment or decrement non-numeric value
version
VERSION 1.4.13
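To make the expected behaviour concrete, here is a minimal Python model of what `incr` should do, assuming the semantics described in memcached's protocol.txt: the item data is the decimal representation of a 64-bit unsigned integer, and overflow in `incr` wraps around at 2^64. The names `parse_uint64` and `incr` are hypothetical and this is not memcached's C implementation. Under this model 9223372036854775808 is a valid stored value, so the second `incr` in the transcript should have returned 9223372036854775809 instead of a CLIENT_ERROR.

```python
from typing import Optional

# Hypothetical model of the documented incr semantics, NOT memcached's C code:
# item data must be the decimal form of a 64-bit unsigned integer, and
# incr wraps around at 2**64.
UINT64_MAX = 2**64 - 1
NON_NUMERIC = "CLIENT_ERROR cannot increment or decrement non-numeric value"


def parse_uint64(data: str) -> Optional[int]:
    """Return the value if `data` is plain ASCII decimal within uint64 range, else None."""
    if not (data.isascii() and data.isdigit()):  # rejects "", "-1", "+1", "1.5"
        return None
    value = int(data)
    return value if value <= UINT64_MAX else None


def incr(data: str, delta: int) -> str:
    """Return the reply line that `incr <key> <delta>` should produce for `data`."""
    value = parse_uint64(data)
    if value is None:
        return NON_NUMERIC
    # Values above 2**63 - 1 are still valid uint64s; only results >= 2**64 wrap.
    return str((value + delta) % 2**64)
```

For example, `incr("9223372036854775807", 1)` gives `"9223372036854775808"`, and feeding that result back in gives `"9223372036854775809"`.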

What is the expected output? What do you see instead?
I am expecting it to be able to increment numbers larger than 2^63-1 
consistently.

What version of the product are you using? On what operating system?
VERSION 1.4.13

We are running a unit test that does the following:
set foo 0 0 20
18446744073709551615
STORED
incr foo 2

The test expects a result of 1. This test is run on dozens of virtual machines
many times a day, and it has about a 1% failure rate. When it fails, repeating
the command "incr foo 2" fails again. If you repeat the whole test, starting
again with the "set foo 0 0 20" / "18446744073709551615", it will usually pass.
Occasionally a memcached server will get into a state where setting any large
integer and trying to increment or decrement it fails, as shown in the
reproduction steps above.
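The expected result of 1 is just 64-bit wraparound, assuming the unsigned 64-bit `incr` semantics described in memcached's protocol.txt:

```latex
\begin{aligned}
2^{64} - 1 &= 18446744073709551615 \\
(2^{64} - 1) + 2 &= 2^{64} + 1 \equiv 1 \pmod{2^{64}}
\end{aligned}
```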

We are running VMware virtual machines with ubuntu 10.04 for all of the above.

Original issue reported on code.google.com by poffenwa...@imvu.com on 8 Nov 2012 at 12:39

GoogleCodeExporter commented 9 years ago
This never happens with smaller numbers?

Is this test running against instances that are loaded with traffic? Is there 
any chance the key is getting changed in the meantime?

Also:

Any chance you could attempt to reproduce with 1.4.15? Think I closed a race 
condition in there.
Any chance your test could do a "get asdf" if it receives a CLIENT_ERROR to see 
what's inside the key?
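A minimal sketch of that diagnostic, assuming a plain TCP connection speaking the memcached text protocol with only one command in flight at a time and numeric stored data. The helper names `recv_until`, `parse_get_reply` and `incr_with_diagnostics` are hypothetical and not part of any memcached client library.

```python
import socket
from typing import Optional


def recv_until(sock: socket.socket, suffix: bytes) -> bytes:
    """Read from `sock` until the accumulated bytes end with `suffix`."""
    buf = b""
    while not buf.endswith(suffix):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("server closed the connection")
        buf += chunk
    return buf


def parse_get_reply(reply: bytes) -> Optional[bytes]:
    """Return the data block of a single-key `get` reply, or None on a miss."""
    if reply == b"END\r\n":
        return None
    header, rest = reply.split(b"\r\n", 1)
    nbytes = int(header.split()[3])  # header is: VALUE <key> <flags> <bytes>
    return rest[:nbytes]


def incr_with_diagnostics(sock: socket.socket, key: str, delta: int) -> bytes:
    """Send `incr`; if it fails with CLIENT_ERROR, `get` the key and print its data.

    Hypothetical helper. Assumes no pipelining and numeric stored data, so the
    get reply can safely be read up to its terminating END line.
    """
    sock.sendall(f"incr {key} {delta}\r\n".encode("ascii"))
    reply = recv_until(sock, b"\r\n")  # incr replies with a single line
    if reply.startswith(b"CLIENT_ERROR"):
        sock.sendall(f"get {key}\r\n".encode("ascii"))
        stored = parse_get_reply(recv_until(sock, b"END\r\n"))
        print(f"incr {key} {delta} -> {reply!r}; stored data: {stored!r}")
    return reply
```

Against a local server on memcached's default port, this would be called as `incr_with_diagnostics(socket.create_connection(("127.0.0.1", 11211)), "asdf", 1)`.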

sorry for the long wait :/

Original comment by dorma...@rydia.net on 18 Jan 2013 at 9:08

GoogleCodeExporter commented 9 years ago
Answers to your questions:
We have never had it occur with smaller numbers. Once a server gets into the
bad state, the repro case I showed can be repeated with freshly set values.

We have detected it on our testing builders, which are likely serving only one
test at a time. There could be simultaneous requests in a test, but the memcached
server could not have more than 2 or 3 requests to fulfill at a time, and far
more likely only 1. When we are manually reproducing the "glitch" with telnet,
it is definitely only 1.

I have not yet tried it with 1.4.15, but I can. It will take some time to
upgrade enough builders to catch the intermittent failure in a reasonable time.

I have on occasion done a get after the error, and the value
"9223372036854775808" is returned, as you would expect if the increment failed.

Original comment by poffenwa...@imvu.com on 18 Jan 2013 at 4:35

GoogleCodeExporter commented 9 years ago
I spent some time looking at this and couldn't figure it out, but we've fixed a 
few bugs related to incr/decr in the last two versions. I'm going to 
tentatively close this as fixed, but if you can grab 1.4.17 (or newer by the 
time you see this) and try it out, I'd be curious to see if it still happens.

Thanks, and sorry for the long delay.

Original comment by dorma...@rydia.net on 21 Dec 2013 at 6:23