Jdesk / memcached

Automatically exported from code.google.com/p/memcached

incr/decr operations are not thread safe. #127

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
incr/decr operations are not thread safe.

- What steps will reproduce the problem?
Please use this code to reproduce the problem.
http://gist.github.com/348889

$ memcached -t 16

$ ./test_ascii
expected: 100000
result: 99939

Change this line to use binary protocol and test again.
bool g_binary = true;

$ ./test_binary
expected: 100000
result: 99927

- What version of the product are you using? On what operating system?
memcached 1.4.4
Red Hat Enterprise Linux 5.4 x86_64

- Please provide any additional information below.
Here is a sample patch to fix the problem.

*** memcached.c_org     2009-11-27 14:45:13.000000000 +0900
--- memcached.c 2010-03-31 01:29:22.000000000 +0900
***************
*** 54,59 ****
--- 54,60 ----
  #endif
  #endif

+ pthread_mutex_t incr_lock = PTHREAD_MUTEX_INITIALIZER;
  /*
   * forward declarations
   */
***************
*** 1017,1022 ****
--- 1018,1024 ----
                  req->message.body.expiration);
      }

+     pthread_mutex_lock(&incr_lock);
      it = item_get(key, nkey);
      if (it && (c->binary_header.request.cas == 0 ||
                 c->binary_header.request.cas == ITEM_get_cas(it))) {
***************
*** 1082,1087 ****
--- 1084,1090 ----

          write_bin_error(c, PROTOCOL_BINARY_RESPONSE_KEY_ENOENT, 0);
      }
+     pthread_mutex_unlock(&incr_lock);
  }

  static void complete_update_bin(conn *c) {
***************
*** 2780,2785 ****
--- 2783,2789 ----
          return;
      }

+     pthread_mutex_lock(&incr_lock);
      it = item_get(key, nkey);
      if (!it) {
          pthread_mutex_lock(&c->thread->stats.mutex);
***************
*** 2791,2796 ****
--- 2795,2801 ----
          pthread_mutex_unlock(&c->thread->stats.mutex);

          out_string(c, "NOT_FOUND");
+         pthread_mutex_unlock(&incr_lock);
          return;
      }

***************
*** 2806,2811 ****
--- 2811,2817 ----
          break;
      }
      item_remove(it);         /* release our reference */
+     pthread_mutex_unlock(&incr_lock);
  }

  /*

Thanks.

Original issue reported on code.google.com by sadao.hi...@gmail.com on 30 Mar 2010 at 4:34

GoogleCodeExporter commented 9 years ago
It seems to me the correct fix is to eliminate in-place incr/decr rather than
add new locks (which could hurt scalability).

Also see the report, http://bit.ly/9Dj9gI

This has already been fixed in the engine branch:

http://github.com/trondn/memcached/commit/8e59e16a94fcf5f1e5fe1271a71a7958357853f8

I think it should be backported.

Original comment by kai...@ak.jp.nec.com on 1 Apr 2010 at 9:01

GoogleCodeExporter commented 9 years ago
dustin and I talked about this briefly last night. That looks like the fix, but it's likely to be a performance regression (if slight).

I'll verify the report with the fix a bit later and will see about qualifying the performance change in the future. Not sure what else can be realistically done here easily.

We'd have to end up handling counters more specifically in the send code or something... like the way we send back suffixes. So again it's worth testing to see if it actually does regress before doing more code.

Original comment by dorma...@rydia.net on 2 Apr 2010 at 12:45

GoogleCodeExporter commented 9 years ago
I just tried backporting the patch but I failed at it, and it caused a 7% performance regression in running that test alone.

So we're going to punt on this for 1.4.5 and try harder for 1.4.6. Sorry :/ Thanks for the test, it's very helpful.

Original comment by dorma...@rydia.net on 3 Apr 2010 at 6:54

GoogleCodeExporter commented 9 years ago
Err, I figured it out, but it's still slower and not entirely correct, so we'll have to figure it out more later.

do_add_delta has the global cache_lock, so it should've never had a race condition? But the item that's passed into it is fetched from outside of the big lock, so an incremented item could end up being junked by another thread. Added a little bit to trond's patch to re-fetch the item once inside do_add_delta, and the test came back 100% every time. It's still 7% slower, and I'm not sure if trond had that fixed elsewhere in the engine branch; I couldn't find it on a cursory look.

To fix it properly, do_add_delta needs to be passed a key/keylen tuple instead of an item and return NOT_FOUND if the internal fetch doesn't find the item. But again, we'll revisit it later :/

Original comment by dorma...@rydia.net on 3 Apr 2010 at 7:12

GoogleCodeExporter commented 9 years ago
I'll have to follow this up with another patch as mine's still incomplete, but I had a thought...

When items are being sent back to a client, they have their reference counters incr/decr'ed. If we move the item_get down into do_add_delta (and use do_item_get), what we can do is then also test whether refcount > 0 on the item. If it is, another thread is copying the data to the client or modifying it in some way, so we allocate a new item and do the replace.

But if refcount == 0, we can modify the item in place. This *should* be safe since item_get and add_delta operate under the same lock.

Bonus points would be optimizing counters to use thread-local write buffers on return. I can think of mildly complicated things to shorten the amount of time a counter is reflocked and decrease the number of expensive re-allocations:

- Use thread-local write buffers for returning counters, flatly allocated to INCR_MAX_STORAGE_LEN.
- In a counter, store a uint64_t in the first 8 bytes of the value, and the string form of the value in the free bytes after that.
- In do_add_delta, remove the safe_strtoull call and simply do the math.
- Do the refcount check described above.
- Then flatten the value back out via snprintf into the end of the value.
- When returning a counter, we'd have to use some bit somewhere to determine that it's a counter (haven't looked at the item struct for ideas yet).
- Then memcpy the non-uint64_t part of the counter into the send buffer for the client.
- Only hold the reflock on the counter for the time it takes to fetch the item, determine it's a counter, toss a buffer into the return chain, and memcpy into it.

That should greatly reduce the amount of time counter items stay locked, and also the amount of time we hold the global lock while processing incr/decr.

Extra bonus points for doing the incr-or-decr detection outside of do_add_delta and passing in a positive or negative value to be added to the counter.

That's on the memory-for-cpu tradeoff edge. We could also test just storing 8 bytes into counter values, doing most of the above, and calling snprintf from the value into the client buffer on every return.

Low priority compared to getting the engine work finished, but worth testing I think :) In the short term I have a good feeling that the refcount trick will restore it to roughly the same speed it was before.

Original comment by dorma...@rydia.net on 4 Apr 2010 at 9:21

GoogleCodeExporter commented 9 years ago
Hey everyone, any update on this issue after 5 months? Is the increment/decrement family (with_initial, by_key...) thread safe now?

Original comment by tylerzh...@gmail.com on 9 Sep 2010 at 3:06

GoogleCodeExporter commented 9 years ago
This has finally been fixed and merged into 1.4.6-rc1; please reopen if it's broken.

1.6 has it fixed via another approach, but I'm not sure if that work is completed as of now.

Original comment by dorma...@rydia.net on 12 Jul 2011 at 11:16

GoogleCodeExporter commented 9 years ago
Trying to close this again.

Original comment by dorma...@rydia.net on 13 Jul 2011 at 7:20