gchm2010 / simple-spring-memcached

Automatically exported from code.google.com/p/simple-spring-memcached
MIT License
0 stars 0 forks source link

ReadThroughMultiCache returns incomplete results if memcached client times out #20

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
--------------------------------------
1. Use the Spy client, configured with a short (but not completely 
unreasonable) timeout value of 50ms.
2. Clear the cache.
3. Make a request through a method annotated with ReadThroughMultiCache, asking 
for a largish number of objects by ID (200 in my case).  Relevant method 
annotation:
    @ReadThroughMultiCache(
            namespace = "UserSvc/users",
            expiration = 3600,
            option = @ReadThroughMultiCacheOption(generateKeysFromResult=true, addNullsToCache = true, skipNullsInResult = true))
    public List<User> getObjectsByIds(@ParameterValueKeyProvider List<String> objectIds) {...}

What is the expected output?  What do you see instead?
------------------------------------------------------
When the underlying method returns all 200 objects, I would expect the SSM 
results to match the results of the annotated method.  However, if the memcache 
client throws a CheckedOperationTimeoutException, only some of the results are 
returned.  The number of returned objects depends on when an error is thrown.  
See additional logged details below.

What version of the product are you using? On what operating system?
--------------------------------------------------------------------
I've reproduced this issue using v-3.1.0 and v-3.2.0 (spymemcached-provider), 
on both RHEL-6.4 and OSX-10.7.5.  I haven't managed to catch this with a unit 
test yet.  I can reliably reproduce this in a multiple-machine environment 
though.

Please provide any additional information below.
------------------------------------------------

What appears to be happening is that after the error is thrown, SSM calls the 
annotated method a second time (with a subset of the original IDs [I assume 
these are the ones not found in cache on the first attempt]).  However, rather 
than merging the results of the two calls, SSM appears to be returning only the 
results of the second method call.  

Relevant logs below
===================

# Manually clear the cache.
18:53:40.330 [pool-1-thread-3] INFO  
com.dreamworks.dwadigital.management.spymemcached.MemcachedStats - Memcached 
cache flush(): succeeded

# Implied:
#  SSM intercepts a call to getObjectsByIds(...)
#  Cache miss, so SSM calls our method with all 200 IDs.

# Our DAO is asked to look up all 200 of the objects by their ID, which it does.
18:53:45.146 [http-/127.5.78.1:8080-1] DEBUG 
com.dreamworks.dwadigital.mongo.GenericDao - getObjectsByIds: request count=200
18:53:45.537 [http-/127.5.78.1:8080-1] DEBUG 
com.dreamworks.dwadigital.mongo.GenericDao - getObjectsByIds: response count=200

# SSM tries to store the 200 objects into cache, but fails for some, because of 
timeouts.
18:53:45.635 [http-/127.5.78.1:8080-1] WARN  com.google.code.ssm.CacheImpl - 
Cannot set on key UserSvc/users:518a1d9ce4b0963488cbc2e6
com.google.code.ssm.providers.CacheException: 
java.util.concurrent.ExecutionException: 
net.spy.memcached.internal.CheckedOperationTimeoutException: Operation timed 
out. - failing node: 
memcache01-dev-igo.las.virtualdreamworks.com/10.205.48.181:11211
    at com.google.code.ssm.providers.spymemcached.MemcacheClientWrapper.set(MemcacheClientWrapper.java:295) ~[spymemcached-provider-3.2.0.jar:na]
    at com.google.code.ssm.CacheImpl.set(CacheImpl.java:279) ~[simple-spring-memcached-3.2.0.jar:na]
    at com.google.code.ssm.CacheImpl.set(CacheImpl.java:120) ~[simple-spring-memcached-3.2.0.jar:na]
    at com.google.code.ssm.CacheImpl.setSilently(CacheImpl.java:136) ~[simple-spring-memcached-3.2.0.jar:na]
    at com.google.code.ssm.aop.ReadThroughMultiCacheAdvice.generateByKeysFromResult(ReadThroughMultiCacheAdvice.java:149) ~[simple-spring-memcached-3.2.0.jar:na]
    at com.google.code.ssm.aop.ReadThroughMultiCacheAdvice.cacheMulti(ReadThroughMultiCacheAdvice.java:123) ~[simple-spring-memcached-3.2.0.jar:na]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_25]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_25]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_25]
    at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_25]
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621) ~[spring-aop-3.2.0.RELEASE.jar:3.2.0.RELEASE]
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610) ~[spring-aop-3.2.0.RELEASE.jar:3.2.0.RELEASE]
    at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:65) ~[spring-aop-3.2.0.RELEASE.jar:3.2.0.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161) ~[spring-aop-3.2.0.RELEASE.jar:3.2.0.RELEASE]
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:90) ~[spring-aop-3.2.0.RELEASE.jar:3.2.0.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) ~[spring-aop-3.2.0.RELEASE.jar:3.2.0.RELEASE]
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:631) ~[spring-aop-3.2.0.RELEASE.jar:3.2.0.RELEASE]
    at com.dreamworks.dwadigital.user.persistence.UserCachingDao$$EnhancerByCGLIB$$d3743a80.getObjectsByIds(<generated>) ~[spring-core-3.2.0.RELEASE.jar:na]
    [snip]

# Same error as above, but different cache key
18:53:45.834 [http-/127.5.78.1:8080-1] WARN  com.google.code.ssm.CacheImpl - 
Cannot set on key UserSvc/users:518be9dfe4b02ee6e55a5e25
[snip]

# Same error as above, but different cache key
18:53:45.936 [http-/127.5.78.1:8080-1] WARN  com.google.code.ssm.CacheImpl - 
Cannot set on key UserSvc/users:518d37e9e4b0f21cc2e8d2dd
[snip]

# SSM sees that not all objects were successfully pulled from cache, so it 
calls the annotated method again.
18:53:46.031 [http-/127.5.78.1:8080-1] WARN  
com.google.code.ssm.aop.ReadThroughMultiCacheAdvice - Caching on 
execution(UserCachingDao.getObjectsByIds(..)) aborted due to an error. The 
underlying method will be called twice.
java.lang.RuntimeException: Timed out waiting for operation
    at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:81) ~[spymemcached-2.8.1.jar:2.8.1]
    at com.google.code.ssm.providers.spymemcached.MemcacheClientWrapper.set(MemcacheClientWrapper.java:289) ~[spymemcached-provider-3.2.0.jar:na]
    at com.google.code.ssm.CacheImpl.set(CacheImpl.java:279) ~[simple-spring-memcached-3.2.0.jar:na]
    at com.google.code.ssm.CacheImpl.set(CacheImpl.java:120) ~[simple-spring-memcached-3.2.0.jar:na]
    at com.google.code.ssm.CacheImpl.setSilently(CacheImpl.java:136) ~[simple-spring-memcached-3.2.0.jar:na]
    at com.google.code.ssm.aop.ReadThroughMultiCacheAdvice.generateByKeysFromResult(ReadThroughMultiCacheAdvice.java:149) ~[simple-spring-memcached-3.2.0.jar:na]
    at com.google.code.ssm.aop.ReadThroughMultiCacheAdvice.cacheMulti(ReadThroughMultiCacheAdvice.java:123) ~[simple-spring-memcached-3.2.0.jar:na]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_25]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_25]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_25]
    at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_25]
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621) ~[spring-aop-3.2.0.RELEASE.jar:3.2.0.RELEASE]
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610) ~[spring-aop-3.2.0.RELEASE.jar:3.2.0.RELEASE]
    at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:65) ~[spring-aop-3.2.0.RELEASE.jar:3.2.0.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161) ~[spring-aop-3.2.0.RELEASE.jar:3.2.0.RELEASE]
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:90) ~[spring-aop-3.2.0.RELEASE.jar:3.2.0.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) ~[spring-aop-3.2.0.RELEASE.jar:3.2.0.RELEASE]
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:631) ~[spring-aop-3.2.0.RELEASE.jar:3.2.0.RELEASE]
    at com.dreamworks.dwadigital.user.persistence.UserCachingDao$$EnhancerByCGLIB$$d3743a80.getObjectsByIds(<generated>) ~[spring-core-3.2.0.RELEASE.jar:na]
    [snip]
    at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
Caused by: net.spy.memcached.internal.CheckedOperationTimeoutException: Timed 
out waiting for operation - failing node: 
memcache01-dev-igo.las.virtualdreamworks.com/10.205.48.181:11211
    at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:93) ~[spymemcached-2.8.1.jar:2.8.1]
    at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:79) ~[spymemcached-2.8.1.jar:2.8.1]
    ... 53 common frames omitted

# The annotated method gets called again.  It returns all of the results it was 
asked for.
18:53:46.032 [http-/127.5.78.1:8080-1] DEBUG 
com.dreamworks.dwadigital.mongo.GenericDao - getObjectsByIds: request count=179
18:53:46.338 [http-/127.5.78.1:8080-1] DEBUG 
com.dreamworks.dwadigital.mongo.GenericDao - getObjectsByIds: response count=179

# BUG: The caller of the cached method gets just some of the results, rather 
than all of them.
18:53:46.339 [http-/127.5.78.1:8080-1] DEBUG 
com.dreamworks.dwadigital.user.service.UserService - queryUsers: response 
count=179

Original issue reported on code.google.com by null.poi...@gmail.com on 23 Sep 2013 at 7:13

GoogleCodeExporter commented 8 years ago
Thank you for the report. I'll try to reproduce the error.
Could you please check if the error also occurs with xmemcached provider?

Original comment by ragno...@gmail.com on 24 Sep 2013 at 9:44

GoogleCodeExporter commented 8 years ago
Thanks.  We're actually using some Spy-specific functionality (JMX monitoring), 
so I can't easily swap one for the other right now.  I'm pretty swamped right 
now with something else, but when I get some time, I'll do what's needed to 
verify with xmemcached.

Original comment by null.poi...@gmail.com on 24 Sep 2013 at 9:34

GoogleCodeExporter commented 8 years ago
I was able to reproduce the issue

Original comment by ragno...@gmail.com on 25 Sep 2013 at 7:57

GoogleCodeExporter commented 8 years ago
Fix is available in trunk. 
If it possible please build SSM from trunk and test it.
I'm also considering modifying spymemcached-provider to wrap all 
RuntimeExceptions into TimeoutException or CacheException.

Original comment by ragno...@gmail.com on 25 Sep 2013 at 9:53

GoogleCodeExporter commented 8 years ago
Thank you so much!  I've built SSM trunk locally, and can confirm the fix.

Catching all runtime errors during the cache operations makes sense to me.  
Caching should always be optional, and I can't think of any scenarios when that 
should be violated, regardless of how loudly the cache client says otherwise.

For now, I've shoved the build into our Nexus repository.  How long might I 
expect before a fixed version is available from Maven central?

I'm glad you were able to get a repro so quickly.  It was easy for me to repro 
in one particular environment for me, but I still can't get the timing right on 
my local machine.

BTW, I spent some time today trying to get a repro with the xmemcached client.  
Doing that made me realize how many contortions I've gone to in my Spring 
wiring, to get access to the actual spy client (MemcachedClientIF).  I need 
access to it for some bulk CASMutator operations outside of SSM.  If you find 
an easy way to expose that client (either through the CacheClientFactory, or 
via a Spring @Resource), that would be great.  Alternatively, maybe it makes 
more sense to let people inject their own pre-built client into SSM somehow.

Original comment by null.poi...@gmail.com on 26 Sep 2013 at 12:04

GoogleCodeExporter commented 8 years ago
I'm glad that it works.
Frankly speaking I haven't even tried to reproduce the issue in the same way as 
you noticed it because timeout exceptions are non-deterministic and depend on 
environment. Instead I've temporary modified SSM code to throw RuntimeException 
from generateByKeysFromResult method to simulate your issue. It was enough to 
find a reason of the issue.

Both ideas to expose the client in easy way or let people inject their own 
pre-build client are good. Unfortunately the second one won't work with one of 
SSM features: runtime node switching in this case SSM factory need to create a 
new client using a new IP address. Instead I can provide a hook that will allow 
to register listener on post initialize phase of memcached client. What do you 
think?

Regarding the SSM release time. If above features are not something that you 
must have then I think I'll find a time to release SSM at the weekend. The 
release will contain only the fix.

Original comment by ragno...@gmail.com on 26 Sep 2013 at 5:24

GoogleCodeExporter commented 8 years ago
Personally, I'd say this bug warrants a release fairly soon.  Everything else 
was just feature requests, and I'll be happy if you get to them in a couple 
months, if at all.  If you can manage a new version this weekend, that would be 
fantastic, but otherwise I can work from trunk for now.

Ironically, one of the big reasons I was interested in being able to inject my 
own client was so I could mock it to throw timeout errors like you did.  I 
don't think I'd ever want to inject a client outside of a test situation 
though.  Maybe you can provide a way to inject test/mock clients, with the 
documented caveat that it won't deal with node switching in that case?

Getting at the actual Spy/X-client instance via your factory would be super 
useful though.  As it stands, I had to copy your factory, and put it in a 
com.google package, so I could access a sealed/package-protected constructor 
for one of the client wrapper classes.

Original comment by null.poi...@gmail.com on 26 Sep 2013 at 7:17

GoogleCodeExporter commented 8 years ago
The fix has been released in 3.2.1. The release is available in central maven 
repository. 

Original comment by ragno...@gmail.com on 30 Sep 2013 at 5:02