Open GoogleCodeExporter opened 8 years ago
Hi, are you using a connection pool?
If you don't, you can try it. Is the data you store in the cache very large? If so, it
may spend too much time on I/O transfer.
Original comment by killme2...@gmail.com
on 31 May 2013 at 6:05
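To make the connection-pool suggestion concrete, here is a minimal sketch of what a client-side pool does, in plain JDK code with hypothetical names (this is not the xmemcached API; xmemcached configures its pool size on the client builder): each request borrows a free connection and returns it when done, so up to pool-size requests can be on the wire at once instead of queuing behind a single connection.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PoolSketch {
    // Hypothetical, simplified pool: real clients also handle broken
    // connections, timeouts, and fairness.
    static class ConnectionPool<T> {
        private final BlockingQueue<T> idle;

        ConnectionPool(List<T> conns) {
            idle = new ArrayBlockingQueue<>(conns.size(), false, conns);
        }

        // Blocks until a connection is free -- this wait is where
        // contention shows up when the pool is too small.
        T borrow() throws InterruptedException { return idle.take(); }

        void release(T conn) { idle.offer(conn); }

        int available() { return idle.size(); }
    }

    public static void main(String[] args) throws Exception {
        ConnectionPool<String> pool =
            new ConnectionPool<>(List.of("conn-1", "conn-2", "conn-3"));
        String c = pool.borrow();             // a request takes a free connection...
        System.out.println(pool.available()); // 2
        pool.release(c);                      // ...and returns it when done
        System.out.println(pool.available()); // 3
    }
}
```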
The JDO cache plug-in seems to use default settings, which looks like it'd
create a pool of size 1 by default?
Is there a strategy about sizing this connection pool? Or just trial & error?
Is that pool the number of TCP connections to each memcached server?
Is the theory that only one request can go through a particular TCP connection
to a memcached server at a time, so the concurrent requests to the same server
are serialized and create a bottleneck under load?
Our JDOs are simple POJOs, so not really big. Our memcached servers store about
2,000 of them under load, which doesn't feel like much either.
Original comment by cyril.bo...@koinzmedia.com
on 31 May 2013 at 5:51
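The serialization theory above can be put into back-of-envelope numbers. Assuming (and this is a simplified model, ignoring pipelining and batch-get merging) that each request holds one TCP connection for its full round trip, the worst-case wall-clock time grows with the ratio of concurrent requests to connections:

```java
public class SerializationMath {
    // Worst-case wall-clock time when `requests` equal-cost requests share
    // `connections` TCP connections, each holding a connection for
    // `perRequestMs`. A deliberately crude model of head-of-line blocking.
    static long worstCaseMs(int requests, int connections, long perRequestMs) {
        long rounds = (requests + connections - 1) / connections; // ceiling division
        return rounds * perRequestMs;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseMs(100, 1, 5));  // 500 ms on a single connection
        System.out.println(worstCaseMs(100, 10, 5)); // 50 ms across a pool of 10
    }
}
```

Under this model, a bigger pool should have helped noticeably; the later comments reporting that it didn't suggest the bottleneck was elsewhere.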
Hello, we tried pool sizes of 5 and 10 and neither seemed to help at all. If
anything, it made things a bit worse.
The following log seems to suggest the change did take effect:
[#|2013-05-31T11:02:10.019-0700|WARNING|glassfish3.1.2|net.rubyeye.xmemcached.XMemcachedClient|_ThreadID=107;_ThreadName=Thread-6;|#XMemcachedClient.connect: You are using connection pool for xmemcached client,it's not recommended unless you have test it that it could increase performance in your app.|#]
Unless you recommend we go higher it looks like this is not the bottleneck...
Any other ideas?
Original comment by cyril.bo...@koinzmedia.com
on 31 May 2013 at 6:44
By the way, in case it's helpful: the lock wait all results from get/get0/fetch0
call stacks.
Original comment by cyril.bo...@koinzmedia.com
on 31 May 2013 at 7:13
We also tried increasing xmemcached.selector.pool.size to 20, which seemed to
double the number of threads to 16 reactors and 16 read threads, but it didn't
help either...
Original comment by cyril.bo...@koinzmedia.com
on 1 Jun 2013 at 1:28
Maybe your cache item size is too large; please get the item statistics from
memcached.
Another option is reducing the merge factor for get/gets. xmemcached will merge
at most 150 get/gets commands into one batch-get command to increase
throughput, but this increases latency. You can decrease the merge factor
with:
memcachedClient.setMergeFactor(50); // merge gets into one multi-get (default is 150)
memcachedClient.setOptimizeMergeBuffer(false); // merge requests up to the socket
// buffer size before sending (default is true)
For more info, you can visit this page:
http://www.javawhat.com/showWebsiteContent.do?id=528195
Original comment by killme2...@gmail.com
on 2 Jun 2013 at 8:00
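The trade-off behind the merge factor can be sketched with simple arithmetic (an illustrative model, not xmemcached's actual scheduling code): merging pending gets into batches of at most `mergeFactor` keys reduces the number of round trips, but each batch's reply is larger and the first key in a batch waits for the whole batch.

```java
public class MergeFactorMath {
    // Number of batched multi-get round trips needed for `pendingGets`
    // queued gets when at most `mergeFactor` are merged per batch.
    static int batches(int pendingGets, int mergeFactor) {
        return (pendingGets + mergeFactor - 1) / mergeFactor; // ceiling division
    }

    public static void main(String[] args) {
        // Fewer, bigger batches -> fewer round trips, higher per-get latency;
        // smaller batches -> the reverse.
        System.out.println(batches(150, 150)); // 1 round trip
        System.out.println(batches(150, 50));  // 3 round trips
        System.out.println(batches(150, 10));  // 15 round trips
    }
}
```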
Please see sample stats from one of our servers below (each STAT line is a size
bucket in bytes followed by the item count in that bucket).
All objects are < 4 KB. How do you define "too large"?
Also, FYI, most of the contention seems to happen on GET queries that are
misses, so I'm not sure how item sizes would be involved here...
stats sizes
STAT 672 70
STAT 896 2
STAT 928 25
STAT 960 28
STAT 1120 29
STAT 1184 2
STAT 1632 4
STAT 1696 27
STAT 1792 1
STAT 2144 28
STAT 2400 31
STAT 2432 34
STAT 3328 1
STAT 3712 35
END
As for the merge factor, it's difficult for me to imagine that our current load
would actually generate full batch sizes... Even then, it shouldn't take 10s to
fetch 150 objects in one network query. Is there a way to get statistics on the
use of this batching mechanism?
Original comment by cyril.bo...@koinzmedia.com
on 3 Jun 2013 at 7:34
Client statistics: https://code.google.com/p/xmemcached/wiki/Statistics
You may also want to monitor your JVM GC; maybe it GCs too frequently.
Original comment by killme2...@gmail.com
on 4 Jun 2013 at 3:33
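One quick way to check the "GC too frequently" theory without external tooling is the JDK's own management beans, which expose cumulative collection counts and times per collector (this is standard `java.lang.management` API, unrelated to xmemcached):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCheck {
    public static void main(String[] args) {
        // Prints cumulative GC counts and accumulated pause time per collector.
        // Sampling this periodically and diffing the numbers shows whether
        // collections are frequent or long enough to explain multi-second waits.
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d timeMs=%d%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```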
And there's an old benchmark result here:
http://xmemcached.googlecode.com/svn/trunk/benchmark/benchmark.html
Original comment by killme2...@gmail.com
on 4 Jun 2013 at 3:33
We load-tested the suggested socketOptions and mergeFactor, but neither helped;
lock wait got worse.
We use G1GC, which has very low pauses rarely over 1 second, whereas we see
latchWait for up to 10 seconds.
Where does -Dxmemcached.statistics.enable=true expose the data? Do you have to
fetch it with JMX? Is there any option to make it log samples to a file?
Original comment by cyril.bo...@koinzmedia.com
on 5 Jun 2013 at 10:29
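For the JMX question: reading an MBean attribute in-process takes only a few lines of standard `javax.management` code. The sketch below reads a platform MBean to show the mechanism; the ObjectName that xmemcached registers its statistics under is not shown in this thread, so you would substitute the name you find by browsing the MBean tree in jconsole:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxRead {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Demonstrated with a built-in platform MBean; replace the ObjectName
        // and attribute with the ones xmemcached registers (see jconsole).
        ObjectName name = new ObjectName("java.lang:type=Runtime");
        Object uptime = server.getAttribute(name, "Uptime");
        System.out.println("Uptime: " + uptime + " ms");
    }
}
```

Wrapping a loop around this and appending the values to a file would give the "log samples to a file" behavior asked about above, without any client support.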
The default merge factor is 50, not 150, so you can decrease it to 10 or even
less.
Original comment by killme2...@gmail.com
on 9 Jun 2013 at 7:37
Hi!
We have a similar issue in our app (a backend app with a big thread pool of
about 150 threads). It works fine from startup (it may run for a day before the
issue shows up), but at a certain point in time (which looks arbitrary) almost
all of the app's activity is locked up in the XMemcached client. Most of the
threads are waiting inside the latchWait method.
I've updated the client to 1.4.1 and set up a connection pool (of 100
connections). We'll see if it helps.
Does anyone have any ideas why this could be happening?
Thanks,
Ilya
Original comment by Ilya.Sor...@muranosoft.com
on 10 Jul 2013 at 9:10
Original issue reported on code.google.com by
cyril.bo...@koinzmedia.com
on 31 May 2013 at 1:24