ehcache / ehcache3

Ehcache 3.x line
http://www.ehcache.org
Apache License 2.0
2.01k stars 579 forks source link

100% CPU utilisation when used as a JSR-107 compliant cache for Spring framework 4.1.4 #387

Closed drekka closed 9 years ago

drekka commented 9 years ago

I have the following Java config setup in my Spring container:

@Configuration
@EnableCaching
public class FXCacheConfig {

@Value("${" + FXConstants.CACHE_RATES_TTL + "}")
private int ratesCacheTTL;

@Bean
public JCacheManagerFactoryBean cacheFactory() {
    return new JCacheManagerFactoryBean();
}

@Bean
public CacheManager jCacheManager(javax.cache.CacheManager jsr107CacheManager) {

    MutableConfiguration<Object, ExchangeRates> config = new MutableConfiguration<>();
    config.setStoreByValue(false);
    config.setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.MILLISECONDS,
            this.ratesCacheTTL)));
    jsr107CacheManager.createCache(FXConstants.CACHE_RATES_NAME, config);

    return new JCacheCacheManager(jsr107CacheManager);
}

} 

I've then run this setup on Jetty 9 and JBoss 8.2. On both servers the CPU goes to 100% (and beyond). If I take this out and replace it with the equivalent Guava or ConcurrentMapCache implementations as per the Spring documents, the server uses <5%.

I've tried to profile the server but have not been able to isolate the problem. All I can tell so far is that with EHCache 3, my CPUs max out.

I've also tried removing the cache setup from the above code, and removing all the @Cache... annotations so there is nothing left but the raw setup. CPUs still 100%+.

Any ideas what to try next?

Derek

lorban commented 9 years ago

Did you take some thread dumps when your app was running with ehcache and maxing out the CPU? JTop (https://code.google.com/p/hatter-source-code/wiki/jtop) can also be helpful figuring out what threads are consuming the CPU.

drekka commented 9 years ago

Hi, I've run jProfiler and YourKit. This is where it gets weird, I cannot find any threads that are chewing CPU. Everything appears ok and EHCache doesn't appear in any of the dumps I've taken. But the presence of EHCache is all that is required to max out the CPUs. If I swap to JDK or Guava caches, the system stops chewing CPU. I'll give JTop a try. Thanks.

alexsnaps commented 9 years ago

Maybe you could share the strip down project? See if it can be reproduced?! Could be environmental... On Sun, Apr 26, 2015 at 8:25 PM Derek Clarkson notifications@github.com wrote:

Hi, I've run jProfiler and YourKit. This is where it gets weird, I cannot find any threads that are chewing CPU. Everything appears ok and EHCache doesn't appear in any of the dumps I've taken. But the presence of EHCache is all that is required to max out the CPUs. If I swap to JDK or Guava caches, the system stops chewing CPU. I'll give JTop a try. Thanks.

— Reply to this email directly or view it on GitHub https://github.com/ehcache/ehcache3/issues/387#issuecomment-96451511.

drekka commented 9 years ago

Tried JTop and I'm still rather confused. My system is showing me two different stories. Using normal Top, my system is showing massive CPU usage by the java process:

Processes: 274 total, 3 running, 13 stuck, 258 sleeping, 1402 threads                                                                                                                        10:49:49
Load Avg: 2.37, 2.08, 2.00  CPU usage: 13.76% user, 11.71% sys, 74.51% idle  SharedLibs: 141M resident, 0B data, 15M linkedit. MemRegions: 82423 total, 3578M resident, 87M private, 932M shared.
PhysMem: 8113M used (1352M wired), 70M unused. VM: 732G vsize, 1353M framework vsize, 250967(0) swapins, 394040(0) swapouts. Networks: packets: 9753285/4027M in, 9342538/2382M out.
Disks: 2247821/37G read, 2374601/64G written.

PID    COMMAND      %CPU      TIME     #TH   #WQ  #PORT MEM    PURG   CMPRS  PGRP  PPID  STATE    BOOSTS           %CPU_ME %CPU_OTHRS UID  FAULTS    COW     MSGSENT    MSGRECV    SYSBSD
32721  java         169.4     11:49.28 38/1  1    207-  522M+  16K    0B     32418 32418 running  *0[86]           0.00000 0.00000    501  19055832+ 568     8872886+   4434023+   70116760+
0      kernel_task  4.3       60:55.07 116/8 0    2     1035M  0B     0B     0     0     running   0[0]            0.00000 0.00000    0    81889     0       451839037+ 432287466+ 0
32969  top          3.2       00:07.84 1/1   0    146+  4428K  0B     0B     32969 32911 running  *0[1]            0.00000 0.00000    0    432856+   92      1647541+   823469+    234068+ 3

Then she I run Jtop, it shows me almost no usage (Same result from JProfiler and YourKit):

Heap Memory: INIT=128.00M  USED=259.05M  COMMITED=486.50M  MAX=910.50M
NonHeap Memory: INIT=23.44M  USED=44.71M  COMMITED=77.56M  MAX=560.00M
GC PS Scavenge  VALID  [PS Eden Space, PS Survivor Space]  GC=35702  GCT=50667
GC PS MarkSweep  VALID  [PS Eden Space, PS Survivor Space, PS Old Gen, PS Perm Gen]  GC=2  GCT=201
ClassLoading LOADED=7770  TOTAL_LOADED=7775  UNLOADED=5
Total threads: 20  CPU=4 (0.23%)  USER=3 (0.20%)
NEW=0  RUNNABLE=13  BLOCKED=0  WAITING=2  TIMED_WAITING=5  TERMINATED=0

RMI TCP Connection(1)-192.168.1.101  TID=291397  STATE=RUNNABLE  CPU_TIME=4 (0.20%)  USER_TIME=3 (0.18%) Allocted: 253.65K
sun.management.ThreadImpl.dumpThreads0(Native Method)
sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:446)
me.hatter.tools.jtop.rmi.RmiServer.listThreadInfos(RmiServer.java:59)
me.hatter.tools.jtop.management.JTopImpl.listThreadInfos(JTopImpl.java:48)

Abandoned connection cleanup thread  TID=8  STATE=TIMED_WAITING  CPU_TIME=0 (0.02%)  USER_TIME=0 (0.01%) Allocted: 0
java.lang.Object.wait(Native Method)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
com.mysql.jdbc.AbandonedConnectionCleanupThread.run(AbandonedConnectionCleanupThread.java:43)

JMX server connection timeout 291571  TID=291571  STATE=TIMED_WAITING  CPU_TIME=0 (0.00%)  USER_TIME=0 (0.00%) Allocted: 1.59K
java.lang.Object.wait(Native Method)
com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:168)
java.lang.Thread.run(Thread.java:744)

 qtp1265909690-3210073  TID=3210073  STATE=TIMED_WAITING  CPU_TIME=0 (0.00%)  USER_TIME=0 (0.00%) Allocted: 0
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:389)

It appears that the Java tooling is not aware of something that's chewing CPU, but OS X is.

drekka commented 9 years ago

I've now created a vanilla project. Unfortunately due to my client's rules regarding IP issues, security etc, I cannot upload a project. However I've stripped all the dependencies, and removed all the code and other files until all that exists are two files. A application loader and the cache config. Here is the app loader:

public class TestApplicationLoader implements WebApplicationInitializer {

@Override
public void onStartup(ServletContext container) {

    // Create the dispatcher servlet's Spring application context
    AnnotationConfigWebApplicationContext dispatcherContext = new AnnotationConfigWebApplicationContext();
    dispatcherContext.register(TestConfig.class);

    // Register and map the dispatcher servlet
    ServletRegistration.Dynamic dispatcher = container.addServlet("test", new DispatcherServlet(
            dispatcherContext));
    dispatcher.setLoadOnStartup(1);

    // Add our mappings.
    dispatcher.addMapping("/*");

}

}

And here is the cache test config:

@Configuration
@EnableWebMvc
public class TestConfig {

@Bean
public javax.cache.CacheManager ehCache() {
    return Caching.getCachingProvider().getCacheManager();
}

@Bean
public CacheManager jCacheManager(javax.cache.CacheManager jsr107CacheManager) {
    MutableConfiguration<Object, Object> config = new MutableConfiguration<>();
    jsr107CacheManager.createCache("test", config);
    return new JCacheCacheManager(jsr107CacheManager);
}

}

There are no other files. The gradle dependencies are:

compile "org.springframework:spring-webmvc:4.1.4.RELEASE",
        "org.springframework:spring-context-support:4.1.4.RELEASE", 
        "org.ehcache:ehcache:3.0.0.m1", 
        "javax.cache:cache-api:1.0.0", 
        "javax.servlet:javax.servlet-api:3.1.0" 

Notice that @EnableCaching is not present either. So now all we have is the presence of two beans. Still using at least 200% CPU for no reason.

I just did so more testing. I've discovered that if I remove the declaration of the cache then the CPU goes back to normal. i.e. the second bean becomes:

@Bean
public CacheManager jCacheManager(javax.cache.CacheManager jsr107CacheManager) {
    //MutableConfiguration<Object, Object> config = new MutableConfiguration<>();
    //jsr107CacheManager.createCache("test", config);
    return new JCacheCacheManager(jsr107CacheManager);
    }

So this is something to do with the adding of a cache declaration. Is there anything wrong with the way I've done it?

ciao Derek

drekka commented 9 years ago

More news, we have just tried the code on a JBoss/Linux box and it is doing the same thing. 100% CPU as measure by Top, next to no CPU as measured by JTop. This leads me to the conclusion that something in the new EHCache code is triggering CPU usage underneath Java where Java profilers cannot see it.

lorban commented 9 years ago

I managed to reproduce the problem and indeed, there seems to be a bug in ehcache 3 that is the cause of this excessive CPU consumption.

It seems that the statistics thread pool mistakenly was created in a way that it never keeps its threads around, hence every single statistic work ehcache does internally ends up creating a new thread each time. It is that thread creation that consumes unordinate amounts of CPU time, and since those threads only end up doing very minimal work this explains why they don't show up on the jtop radar: most CPU actually is consumed by the OS to create the thread, not to execute any java code.

We're going to fix this the for the next release, but in the meantime, there fortunately is a workaround: just set the org.ehcache.statisticsExecutor.poolSize system property to 1 and the problem should go away.

Thanks for reporting this bug!

drekka commented 9 years ago

Cool. Thanks for looking into that. I'll try the workaround.

drekka commented 9 years ago

Excellent. Work around fixes the issue.

nikhil-sahu commented 6 years ago

Hi, I am using spring boot 1.5.x with jcache as caching interface. I too was facing the issue of close to 100% CPU and set the org.ehcache.statisticsExecutor.poolSize property to 1 in my application.properties. Unfortunately, the issue still persists. Now I think I am missing something here. maybe properties in the said file is not a system property. I even tried setting the environment variable but nothing. Help appreciated. Thanks

henri-tremblay commented 6 years ago

@nikhil-sahu Are you sure it is the same issue? This has been fixed for a while (3.0.0m2 if I'm not mistaken).

nikhil-sahu commented 6 years ago

@henri-tremblay I upgraded spring-boot from 1.4.0 to 1.5.8. The only major change is Ehcache 2 to 3. The CPU spiked since day 1 of deployment. Also one of the upstream servers saw a spike in requests(the response is cached). I'll try to pinpoint what exactly is causing the issue and see what comes up. Am using Ehcache 3.4.0 with jcache api 1.0.0

henri-tremblay commented 6 years ago

@nikhil-sahu It will be really nice if you can do a profiling or are able to provide a stripped down application for us to reproduce the problem. Otherwise it is pretty hard for us to help you. The cache configuration might also give us a hint. You probably should put that in another issue since I'm pretty sure it is not related to this one. Or better, on the mailing list and we will tell you if it should be turned into an issue.

nikhil-sahu commented 6 years ago

Ok sure. Thanks