OpenHFT / Chronicle-Map

Replicate your Key Value Store across your network, with consistency, persistence and performance.
http://chronicle.software/products/chronicle-map/
Apache License 2.0

Chronicle-Map Lock Contention #556

Open awinstan opened 2 months ago

awinstan commented 2 months ago

Hi,

We're experimenting with Chronicle-Map and we're seeing some odd, inconsistent behavior on service startup (where Chronicle-Map is instantiated). Most of our "important" threads appear to be experiencing heavy contention on the segment header lock, as shown in the following stack trace extract:

"XX-thread-31" #264 daemon prio=5 os_prio=0 cpu=656390.17ms elapsed=1830.23s tid=0x0000ffee18044000 nid=0x1f9 runnable  [0x0000ffedc4842000]
   java.lang.Thread.State: RUNNABLE
    at net.openhft.chronicle.hash.impl.BigSegmentHeader.tryUpdateLockMillis(BigSegmentHeader.java:214)
    - parking to wait for  <0x0000fff526044fa8> (a java.util.concurrent.ForkJoinPool)
    at net.openhft.chronicle.hash.impl.BigSegmentHeader.tryUpdateLock0(BigSegmentHeader.java:178)
    at net.openhft.chronicle.hash.impl.BigSegmentHeader.innerTryUpdateLock(BigSegmentHeader.java:166)
    at net.openhft.chronicle.hash.impl.BigSegmentHeader.updateLock(BigSegmentHeader.java:500)
    at net.openhft.chronicle.map.impl.CompiledMapQueryContext$UpdateLock.lock(CompiledMapQueryContext.java:1122)
    at net.openhft.chronicle.map.MapMethods.put(MapMethods.java:80)
    at net.openhft.chronicle.map.VanillaChronicleMap.put(VanillaChronicleMap.java:910)
        ...

JFR traces confirm virtually all CPU time is burned in BigSegmentHeader.updateLock.
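
For anyone who wants to cross-check this on a live host without JFR, the following is a minimal sketch that samples in-process thread stacks and lists the threads whose top frames are in `BigSegmentHeader`. It uses only the standard `java.lang.management` API; the class name `SegmentLockSpinners`, the helper names, and the frame depth of 4 are hypothetical choices for illustration, and the lock class name is copied from the stack trace above. A single sample is only a rough indicator, not a replacement for a profiler.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class SegmentLockSpinners {

    // Class name copied from the stack trace in this report.
    public static final String LOCK_CLASS =
            "net.openhft.chronicle.hash.impl.BigSegmentHeader";

    /** Returns true if any of the top `depth` frames belongs to `className`. */
    public static boolean isInClass(StackTraceElement[] stack, String className, int depth) {
        int limit = Math.min(depth, stack.length);
        for (int i = 0; i < limit; i++) {
            if (stack[i].getClassName().equals(className)) {
                return true;
            }
        }
        return false;
    }

    /** Takes one thread dump and returns the names of threads currently inside `className`. */
    public static List<String> threadsIn(String className, int depth) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> names = new ArrayList<>();
        // No monitor or synchronizer details are needed, only the stack frames.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && isInClass(info.getStackTrace(), className, depth)) {
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Depth 4 is an arbitrary choice: the lock frames sit at the top of the trace above.
        List<String> spinning = threadsIn(LOCK_CLASS, 4);
        System.out.println(spinning.size() + " thread(s) currently in " + LOCK_CLASS + ": " + spinning);
    }
}
```

Calling `threadsIn` periodically from inside the affected service (for example from a scheduled task) and logging the count would show whether the number of threads stuck in the lock path differs between healthy and pegged hosts.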

The issue is inconsistent: it affects some hosts and not others. In the following CPU graph, a subset of the hosts have their CPU pegged after deployment:

[Image: CPU graph of the hosts after deployment]

Restarting the process temporarily resolves the issue, but it comes back during our next deployment (I'm not sure what triggers this, as the deploy process should be virtually identical to my manual process restarts).

Environment: Corretto Java 17 on Graviton3 instances, Chronicle-Map version 3.26ea3.

The Chronicle-Map is instantiated as follows:

ChronicleMapBuilder.of(ByteString.class, valueClass)
    .name(name)
    .entries(capacity)
    .averageKeySize(averageKeySizeBytes)
    .averageValueSize(averageValueSizeBytes)
    .keyMarshaller(new ByteStringMarshaller(name))
    .valueMarshaller(valueMarshaller)
    .createPersistedTo(persistenceFile.get());

JerryShea commented 2 months ago

Hello @awinstan - this is not something we have come across before. Investigating it would take a bit of time, and we are busy supporting customers and their projects. For us to give this prioritised support, can I suggest you review the offerings at https://chronicle.software/support/