ehcache / ehcache3

Ehcache 3.x line
http://www.ehcache.org
Apache License 2.0
2.02k stars 581 forks source link

EHCache stalled during statistics accumulation #3126

Closed nikowitt closed 1 year ago

nikowitt commented 1 year ago

Hi there, we just encountered the case that an application stalled using EHcache and LOTS of thread (50-100 user threads) stuck at

at java.util.concurrent.atomic.LongAccumulator.accumulate(java.base@11.0.16/LongAccumulator.java:107)
    at org.ehcache.shadow.org.terracotta.statistics.derived.latency.LatencyAccumulator.accumulate(LatencyAccumulator.java:51)
    at org.ehcache.shadow.org.terracotta.statistics.derived.latency.Jsr107LatencyMonitor.lambda$new$0(Jsr107LatencyMonitor.java:34)
    at org.ehcache.shadow.org.terracotta.statistics.derived.latency.Jsr107LatencyMonitor$$Lambda$4778/0x0000000802ae8840.event(Unknown Source)
    at org.ehcache.shadow.org.terracotta.statistics.derived.OperationResultSampler.end(OperationResultSampler.java:42)
    at org.ehcache.shadow.org.terracotta.statistics.derived.latency.Jsr107LatencyMonitor.end(Jsr107LatencyMonitor.java:44)
    at org.ehcache.jsr107.internal.Jsr107LatencyMonitor.end(Jsr107LatencyMonitor.java:45)
    at org.ehcache.core.internal.statistics.DelegatingOperationStatistic$2.end(DelegatingOperationStatistic.java:102)
    at org.ehcache.shadow.org.terracotta.statistics.AbstractOperationStatistic.end(AbstractOperationStatistic.java:82)
    at org.ehcache.shadow.org.terracotta.statistics.GeneralOperationStatistic.end(GeneralOperationStatistic.java:73)
    at org.ehcache.core.EhcacheBase.put(EhcacheBase.java:190)
    at org.ehcache.jsr107.Eh107Cache.put(Eh107Cache.java:175)
    at org.hibernate.cache.jcache.internal.JCacheAccessImpl.putIntoCache(JCacheAccessImpl.java:43)
    at org.hibernate.cache.spi.support.DomainDataStorageAccess.putFromLoad(DomainDataStorageAccess.java:25)
    at org.hibernate.cache.spi.support.AbstractCachedDomainDataAccess.putFromLoad(AbstractCachedDomainDataAccess.java:62)
    at org.hibernate.cache.spi.support.AbstractCachedDomainDataAccess.putFromLoad(AbstractCachedDomainDataAccess.java:73)
    at org.hibernate.engine.internal.TwoPhaseLoad.initializeEntityFromEntityEntryLoadedState(TwoPhaseLoad.java:290)
    at org.hibernate.engine.internal.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:156)
    at org.hibernate.engine.internal.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:126)
    at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:1193)
    at org.hibernate.loader.Loader.processResultSet(Loader.java:1001)

Just from this trace, is there a good guess where I could have a look at? If not, which information is important? It is a cache that is not using offheap, but is stored only in memory. The heap dump shows that the sum of all on-heap caches (~350) is not bigger than ~80-100 MB in memory.

Thanks in advance!

chrisdennis commented 1 year ago

That is very odd looking. I would be doubtful that this is truly "stalled", and maybe speculate that this is really just a very hot path? I would probably start with trying to get a series of thread dumps from the machine to see if you can catch threads making progress, or whether things are truly stalled. If you can catch a thread moving then I would immediately shift my focus to what Hibernate is doing and whether this is just a heavily loaded system (maybe with some poor caching behaviors).

nikowitt commented 1 year ago

At the time where it happened, there was no significant load, compared to other usage. As I also was able to take a heap dump at this, I will also further investigate. So this means that this stacktrace does not ring any bells on your end?

Also checking the heap dump, I do not find any unusual load.

nikowitt commented 1 year ago

Closing for now as I don't find anything suspicous, will reopen if I can add more details that could point to an EHCache issue.