bigdata4u / spymemcached

Automatically exported from code.google.com/p/spymemcached
0 stars 0 forks source link

Operation class is used in un-threadsafe, unsynchronized manner. #195

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What version of the product are you using? On what operating system?
2.7 on Ubuntu.

Tell me more...

Short version: patch attached, but I'm willing to send this through your gerrit 
too if you'll let my review.couchbase.org account (dtmGoogle) upload to the 
spymemcached project.

Long version: We discovered that having one thread call get() on futures with a 
timeout much shorter than the timeout built into the client could (after a 
while) reliably kill off the IO thread with a NullPointerException. After 
restarting with assertions enabled, we found many other ways to make the IO 
thread die relating mostly to operations that were canceled or timed out in one 
thread while they were being used in another thread.

The solution to this is to make "timing out" an operation something similar to 
"cancel" - that is, not a state of the operation's state machine, but just a 
flag. Then, since this flag (as with the cancelled flag) can be read and set 
from multiple threads, synchronize the methods that do that.

Ideally, transitionState and getState would never be called outside the IO 
thread; however, since at least getState is currently, both getState and 
transitionState should then also be synchronized.

With these changes, we don't end up killing off the IO thread anymore. (There 
may still be unthreadsafe stuff going on aro, but this is now stable)

Original issue reported on code.google.com by dtm@google.com on 22 Aug 2011 at 1:57

Attachments:

GoogleCodeExporter commented 8 years ago
Again, thanks for the patch.  This may explain an issue we've been 
investigating recently

Original comment by ingen...@gmail.com on 23 Aug 2011 at 5:36

GoogleCodeExporter commented 8 years ago
Added to couchbase Gerrit as http://review.couchbase.org/9402

Original comment by dtm@google.com on 6 Sep 2011 at 2:15

GoogleCodeExporter commented 8 years ago
This has been merged into the 2.7.2 and 2.8 branch. Attached is a 2.7.2 jar 
with the fix.

Original comment by mikewie...@gmail.com on 6 Sep 2011 at 11:46

Attachments:

GoogleCodeExporter commented 8 years ago
Hello!

I am also running into this issue, specifically getting the assertion error "No 
output buffer" thrown by BaseOperationImpl.java:79. The error occurs whenever 
memcached is losing a connection and attempts to reconnect 
(MemcachedConnection.queueReconnect). However, I am only experiencing this the 
moment I enabled the binary communication mode. 

I looked into the patch above which seems to tackle this problem, but also 
wraps literally every method of the Operation object with synchronized locks, 
and I am a bit skeptical about the impact on performance.

Do you have any further information on this issue, especially why it 
exclusively happens with binary connections, while ASCII mode works fine?

Thanks!

Sebastian

Original comment by buzz...@gmail.com on 10 Oct 2011 at 6:59