eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Performance difference between 0.24.0 and 0.26.0 in containers running AcmeAirMS benchmark on Open Liberty. #13548

Closed · jdmcclur closed this 3 years ago

jdmcclur commented 3 years ago

I am seeing a throughput regression of around 5% when running the AcmeAir MS benchmark (which has 5 different services) in a docker environment, with Open Liberty built on top of adoptopenjdk:8-jre-openj9 (0.26.0), compared to the same version of Open Liberty running on adoptopenjdk/openjdk8-openj9:jre8u282-b08_openj9-0.24.0.

This has been tricky to debug: if I take this out of the docker environment, the regression goes away or is much smaller. I hacked in Health Center and have some hcd files, but they haven't been very enlightening.

Any advice on how to debug? Is there a newer image to try?

mpirvu commented 2 years ago

Disabling scorching, or even hot, compilations does not make the problem go away. These are the methods that were compiled when the JVM started to consume all the CPU:

#JITSTATE:  t=639658 JIT changed state from IDLE      to STEADY    cSmpl=  1 iSmpl=  0 comp=  0 recomp=  0, Q_SZ=  0 CLP=OFF jvmCPU=15%
 (warm) Compiling org/bson/BsonDocument.getArray(Ljava/lang/Object;Lorg/bson/BsonArray;)Lorg/bson/BsonArray;  OrdinaryMethod j9m=00000000016046B8 t=670686 compThreadID=6 memLimit=262144 KB freePhysicalMemory=297 MB
+ (warm) org/bson/BsonDocument.getArray(Ljava/lang/Object;Lorg/bson/BsonArray;)Lorg/bson/BsonArray; @ 00007FBE6F154428-00007FBE6F154A27 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=6 j9m=00000000016046B8 bcsz=19 OSR time=8356us mem=[region=2688 system=16384]KB compThreadID=6 CpuLoad=724%(90%avg) JvmCpu=117%
 (warm) Compiling java/util/Collections$SynchronizedCollection.remove(Ljava/lang/Object;)Z  OrdinaryMethod j9m=00000000000A6478 t=684816 compThreadID=6 memLimit=262144 KB freePhysicalMemory=297 MB
+ (warm) java/util/Collections$SynchronizedCollection.remove(Ljava/lang/Object;)Z @ 00007FBE6F154A80-00007FBE6F154F9B OrdinaryMethod - Q_SZ=2 Q_SZI=0 QW=14 j9m=00000000000A6478 bcsz=25 OSR time=7019us mem=[region=1536 system=16384]KB compThreadID=6 CpuLoad=711%(88%avg) JvmCpu=115%
 (warm) Compiling java/util/AbstractList.iterator()Ljava/util/Iterator;  OrdinaryMethod j9m=0000000000091D88 t=684826 compThreadID=6 memLimit=262144 KB freePhysicalMemory=297 MB
+ (warm) java/util/AbstractList.iterator()Ljava/util/Iterator; @ 00007FBE6F154FE0-00007FBE6F155146 OrdinaryMethod G Q_SZ=1 Q_SZI=0 QW=8 j9m=0000000000091D88 bcsz=10 OSR time=2721us mem=[region=2176 system=16384]KB compThreadID=6 CpuLoad=711%(88%avg) JvmCpu=115%
 (warm) Compiling java/util/AbstractList$Itr.<init>(Ljava/util/AbstractList;Ljava/util/AbstractList$1;)V  OrdinaryMethod j9m=000000000007D608 t=684826 compThreadID=6 memLimit=262144 KB freePhysicalMemory=297 MB
+ (warm) java/util/AbstractList$Itr.<init>(Ljava/util/AbstractList;Ljava/util/AbstractList$1;)V @ 00007FBE6F1551A8-00007FBE6F155299 OrdinaryMethod G Q_SZ=0 Q_SZI=0 QW=2 j9m=000000000007D608 bcsz=6 OSR time=1453us mem=[region=2048 system=16384]KB compThreadID=6 CpuLoad=711%(88%avg) JvmCpu=115%
 (warm) Compiling sun/nio/ch/SocketAdaptor.getLocalPort()I  OrdinaryMethod j9m=00000000013C4320 t=694879 compThreadID=6 memLimit=262144 KB freePhysicalMemory=297 MB
+ (warm) sun/nio/ch/SocketAdaptor.getLocalPort()I @ 00007FBE6F1552E0-00007FBE6F15571E OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=6 j9m=00000000013C4320 bcsz=22 OSR time=6262us mem=[region=1792 system=16384]KB compThreadID=6 CpuLoad=735%(91%avg) JvmCpu=110%
 (warm) Compiling java/lang/InterruptedException.<init>(Ljava/lang/String;)V  OrdinaryMethod j9m=00000000001A8B50 t=696817 compThreadID=6 memLimit=262144 KB freePhysicalMemory=297 MB
+ (warm) java/lang/InterruptedException.<init>(Ljava/lang/String;)V @ 00007FBE6F155760-00007FBE6F155861 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=2 j9m=00000000001A8B50 bcsz=6 OSR time=4209us mem=[region=2176 system=16384]KB compThreadID=6 CpuLoad=708%(88%avg) JvmCpu=114%
 (warm) Compiling java/util/Collections$UnmodifiableCollection.iterator()Ljava/util/Iterator;  OrdinaryMethod j9m=0000000000097238 t=700692 compThreadID=6 memLimit=262144 KB freePhysicalMemory=297 MB
+ (warm) java/util/Collections$UnmodifiableCollection.iterator()Ljava/util/Iterator; @ 00007FBE6F1558C0-00007FBE6F155ABF OrdinaryMethod G Q_SZ=0 Q_SZI=0 QW=6 j9m=0000000000097238 bcsz=9 OSR time=2237us mem=[region=2176 system=16384]KB compThreadID=6 CpuLoad=717%(89%avg) JvmCpu=116%
 (warm) Compiling com/ibm/ws/http/channel/internal/HttpBaseMessageImpl.destroy()V  OrdinaryMethod j9m=00000000014A3E20 t=705485 compThreadID=6 memLimit=262144 KB freePhysicalMemory=297 MB
+ (warm) com/ibm/ws/http/channel/internal/HttpBaseMessageImpl.destroy()V @ 00007FBE6F155B00-00007FBE6F156158 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=6 j9m=00000000014A3E20 bcsz=83 OSR time=12119us mem=[region=2112 system=16384]KB compThreadID=6 CpuLoad=716%(89%avg) JvmCpu=113%
#INFO:  t=765178 GCR disabled; GCR queued=0
#JITSTATE:  t=765178 JIT changed state from STEADY    to DEEPSTEADY cSmpl=1079 iSmpl=  0 comp=  0 recomp=  0, Q_SZ=  0 CLP=OFF jvmCPU=791%
mpirvu commented 2 years ago

I ran the flight service with just cold compilations. Just before my third batch of JMeter load finished, it started to show errors. The flight service remained active for another 20-30 seconds after JMeter finished, consuming 150-200% CPU. After that, everything went back to normal and I started the 4th JMeter batch. Midway through the 10-minute run, the booking service started to misbehave, being the only one consuming ~200% CPU. As usual, JMeter started to show errors.

In another run, where both the flight service and the booking service were using only cold compilations, the flight service was again "hung", consuming 800% CPU. The methods compiled during the period when the problem appeared are:

 (cold) Compiling java/nio/channels/SelectionKey.attachment()Ljava/lang/Object;  OrdinaryMethod j9m=00000000014490C8 t=887633 compThreadID=0 memLimit=262144 KB freePhysicalMemory=303 MB
+ (cold) java/nio/channels/SelectionKey.attachment()Ljava/lang/Object; @ 00007FA30048A2C0-00007FA30048A2F0 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=2 j9m=00000000014490C8 bcsz=5 time=2923us mem=[region=832 system=16384]KB compThreadID=0 CpuLoad=699%(87%avg) JvmCpu=169%
 (cold) Compiling java/text/FieldPosition.<init>(I)V  OrdinaryMethod j9m=00000000001EF200 t=900744 compThreadID=0 memLimit=262144 KB freePhysicalMemory=303 MB
+ (cold) java/text/FieldPosition.<init>(I)V @ 00007FA30048A340-00007FA30048A3C8 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=2 j9m=00000000001EF200 bcsz=25 time=1007us mem=[region=1088 system=16384]KB compThreadID=0 CpuLoad=700%(87%avg) JvmCpu=160%
 (cold) Compiling com/ibm/wsspi/channelfw/objectpool/LocalThreadObjectPool.<init>(ILcom/ibm/wsspi/channelfw/objectpool/ObjectFactory;Lcom/ibm/wsspi/channelfw/objectpool/ObjectDestroyer;)V  OrdinaryMethod j9m=0000000001488410 t=901199 compThreadID=0 memLimit=262144 KB freePhysicalMemory=303 MB
+ (cold) com/ibm/wsspi/channelfw/objectpool/LocalThreadObjectPool.<init>(ILcom/ibm/wsspi/channelfw/objectpool/ObjectFactory;Lcom/ibm/wsspi/channelfw/objectpool/ObjectDestroyer;)V @ 00007FA30048A428-00007FA30048AB66 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=2 j9m=0000000001488410 bcsz=320 time=2743us mem=[region=1408 system=16384]KB compThreadID=0 CpuLoad=712%(89%avg) JvmCpu=171%
 (cold) Compiling java/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V  OrdinaryMethod j9m=000000000036D680 t=910648 compThreadID=0 memLimit=262144 KB freePhysicalMemory=303 MB
+ (cold) java/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V @ 00007FA30048ABC0-00007FA30048AE3E OrdinaryMethod - Q_SZ=1 Q_SZI=1 QW=4 j9m=000000000036D680 bcsz=59 time=1815us mem=[region=1280 system=16384]KB compThreadID=0 CpuLoad=776%(97%avg) JvmCpu=580%
 (cold) Compiling java/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask.isPeriodic()Z  OrdinaryMethod j9m=000000000036D620 t=910648 compThreadID=0 memLimit=262144 KB freePhysicalMemory=303 MB
+ (cold) java/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask.isPeriodic()Z @ 00007FA30048AE80-00007FA30048AEC5 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=2 j9m=000000000036D620 bcsz=15 time=291us mem=[region=960 system=16384]KB compThreadID=0 CpuLoad=776%(97%avg) JvmCpu=580%
#JITSTATE:  t=921150 JIT changed state from STEADY    to DEEPSTEADY cSmpl=1138 iSmpl=  0 comp=  0 recomp=  0, Q_SZ=  0 CLP=OFF jvmCPU=794%

I don't think it's one of these at fault, though.

mpirvu commented 2 years ago

I profiled the flight service while it was consuming the CPU of the entire machine. The only methods being executed are these two:

  96.44%      18076720  [.] java/util/HashMap$TreeNode.putTreeVal(Ljava/util/HashMap;[Ljava/util/HashMap$Node;ILjava/lang/Object;Ljava/lang/Object;)Ljava/util/HashMap$TreeNode;_cold
   3.38%        632790  [.] java/util/HashMap$TreeNode.find(ILjava/lang/Object;Ljava/lang/Class;)Ljava/util/HashMap$TreeNode;_cold

putTreeVal calls find. The problem could be in putTreeVal itself, but it's also possible that the tree is malformed in a way that creates an infinite loop.
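
(For context: the classic way a HashMap's tree bins become malformed is unsynchronized concurrent modification. Below is a minimal, self-contained Java sketch of that general hazard, not the benchmark or Liberty code; the class and key names are made up for illustration. All keys share one hash code so the bucket treeifies and putTreeVal/find are exercised, as in the profile above. Corruption is nondeterministic, so the spin may take a while to appear.)

import java.util.HashMap;
import java.util.Map;

public class HashMapCorruptionSketch {
    // All keys collide into a single bucket, forcing HashMap to treeify
    // it once it holds enough entries (8 by default, table capacity >= 64).
    static final class CollidingKey implements Comparable<CollidingKey> {
        final int id;
        CollidingKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // deliberate collisions
        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }
        @Override public int compareTo(CollidingKey o) { return Integer.compare(id, o.id); }
    }

    public static void main(String[] args) {
        Map<CollidingKey, Object> map = new HashMap<>(); // not thread-safe
        Runnable writer = () -> {
            for (int i = 0; ; i++) {
                // Racing put/remove calls on a plain HashMap can leave the
                // red-black tree in a bucket malformed; a later putTreeVal
                // or find may then chase a cycle forever, burning CPU.
                map.put(new CollidingKey(i % 64), Boolean.TRUE);
                map.remove(new CollidingKey((i + 32) % 64));
            }
        };
        for (int t = 0; t < 4; t++) {
            new Thread(writer, "writer-" + t).start();
        }
    }
}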

jdmcclur commented 2 years ago

Interesting. So the original problem seems to go away with the new JMeter script, which is good, but there are new problems.

Here is where the Liberty code does the put in the stack above. https://github.com/OpenLiberty/open-liberty/blob/5c946ad0ee55c8adfbf54cb56bfa1558e6c28a96/dev/com.ibm.ws.jaxrs.2.0.cdi/src/com/ibm/ws/jaxrs20/cdi/component/JaxRsFactoryImplicitBeanCDICustomizer.java#L245

mpirvu commented 2 years ago

The most conservative setup that still reproduces this livelock scenario is cold compilations with *TreeNode.putTreeVal* included. If I use noOpt level for the flight service, then I cannot reproduce the livelock, but JMeter still throws errors after a while (the system is mostly idle and the throughput drops to 0.1). The same behavior is obtained by using cold optLevel but disabling the compilation of *TreeNode.putTreeVal*. For what it's worth, I am attaching the compilation log for java/util/HashMap$TreeNode.putTreeVal when compiled at cold with inlining disabled: log.txt.4.1.87859.20211108.161059.1.zip

I have also tried switching to HotSpot for the booking, customer, flight and auth services. The livelock is not observed with HotSpot, but performance is not stable: after 5-10 minutes the throughput drops from ~3400 to about 800 and stays there for a long time, after which it may go up to 1700 and then drop again. If I interrupt JMeter for a while and start it again, the system starts at a good level of throughput but drops again after a while. In yet another experiment I used HotSpot for the booking, customer and auth services, and OpenJ9 for the flight service. The livelock condition did not manifest, so it's unclear whether the compilation of java/util/HashMap$TreeNode.putTreeVal on the flight service is actually at fault.

vijaysun-omr commented 2 years ago

I looked at the attached log, and the only non-trivial optimization done seems to be from local CSE. Can you please try adding disableLocalCSE to the (most conservative) options for the flight service and see if the livelock is avoided, at least?

mpirvu commented 2 years ago

I reproduced the issue with OpenJ9 for the flight service and HotSpot for the auth/booking/customer services. The flight service had the following options: "-Xjit:verbose={compilePerformance},vlog=/tmp/vlog.txt,initialOptLevel=cold,inhibitRecompilation,disableInlining,{*TreeNode.putTreeVal*}(disableLocalCSE,traceFull,log=/tmp/log.txt)". Here's the compilation log: log.txt.6.1.71914.20211109.153154.1.zip

I had to wait through 1.5 hours of load to reproduce it. The culprit method was compiled about 20 minutes into the run. It's strange that the error shows up such a long time after the compilation of the method.

vijaysun-omr commented 2 years ago

How about lastOptIndex=0 for the method in question, while still keeping the opt level at cold, i.e. the rest of your command line unchanged?

mpirvu commented 2 years ago

I managed to reproduce the problem even at noOpt. Options used: -Xjit:verbose={compilePerformance},vlog=/tmp/vlog.txt,initialOptLevel=noOpt,inhibitRecompilation,disableInlining,{*TreeNode.putTreeVal*}(traceFull,log=/tmp/log.txt) Compilation log: log.txt.3.1.82666.20211109.183106.1.zip

vijaysun-omr commented 2 years ago

And are you 100% sure that this issue does not occur when putTreeVal is not compiled?

mpirvu commented 2 years ago

Ran again, excluding java/util/HashMap$TreeNode.putTreeVal, and the problem still reproduces. These are the methods compiled around the time the CPU jumps to 800%:

 (no-opt) Compiling java/util/HashMap$TreeNode.find(ILjava/lang/Object;Ljava/lang/Class;)Ljava/util/HashMap$TreeNode;  OrdinaryMethod j9m=0000000000093C80 t=1521472 compThreadID=4 memLimit=262144 KB freePhysicalMemory=829 MB
+ (no-opt) java/util/HashMap$TreeNode.find(ILjava/lang/Object;Ljava/lang/Class;)Ljava/util/HashMap$TreeNode; @ 00007FE12CC88368-00007FE12CC885A8 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=0000000000093C80 bcsz=172 time=1256us mem=[region=1088 system=16384]KB compThreadID=4 CpuLoad=682%(85%avg) JvmCpu=207%
 (no-opt) Compiling java/util/HashMap$TreeNode.removeTreeNode(Ljava/util/HashMap;[Ljava/util/HashMap$Node;Z)V  OrdinaryMethod j9m=0000000000093D40 t=1524511 compThreadID=4 memLimit=262144 KB freePhysicalMemory=829 MB
+ (no-opt) java/util/HashMap$TreeNode.removeTreeNode(Ljava/util/HashMap;[Ljava/util/HashMap$Node;Z)V @ 00007FE12CC88608-00007FE12CC89CE6 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=0000000000093D40 bcsz=632 time=7946us mem=[region=1984 system=16384]KB compThreadID=4 CpuLoad=693%(86%avg) JvmCpu=205%
 (no-opt) Compiling java/util/HashMap$TreeNode.getTreeNode(ILjava/lang/Object;)Ljava/util/HashMap$TreeNode;  OrdinaryMethod j9m=0000000000093CA0 t=1531473 compThreadID=4 memLimit=262144 KB freePhysicalMemory=829 MB
+ (no-opt) java/util/HashMap$TreeNode.getTreeNode(ILjava/lang/Object;)Ljava/util/HashMap$TreeNode; @ 00007FE12CC89D48-00007FE12CC89DE6 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=0000000000093CA0 bcsz=22 time=579us mem=[region=768 system=16384]KB compThreadID=4 CpuLoad=669%(83%avg) JvmCpu=203%
 (no-opt) Compiling java/util/HashMap$TreeNode.moveRootToFront([Ljava/util/HashMap$Node;Ljava/util/HashMap$TreeNode;)V  OrdinaryMethod j9m=0000000000093C60 t=1532563 compThreadID=4 memLimit=262144 KB freePhysicalMemory=829 MB
+ (no-opt) java/util/HashMap$TreeNode.moveRootToFront([Ljava/util/HashMap$Node;Ljava/util/HashMap$TreeNode;)V @ 00007FE12CC89E40-00007FE12CC8A379 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=0000000000093C60 bcsz=124 time=1761us mem=[region=1216 system=16384]KB compThreadID=4 CpuLoad=698%(87%avg) JvmCpu=212%
 (no-opt) Compiling java/util/HashMap$TreeNode.<init>(ILjava/lang/Object;Ljava/lang/Object;Ljava/util/HashMap$Node;)V  OrdinaryMethod j9m=0000000000093C20 t=1537495 compThreadID=4 memLimit=262144 KB freePhysicalMemory=827 MB
+ (no-opt) java/util/HashMap$TreeNode.<init>(ILjava/lang/Object;Ljava/lang/Object;Ljava/util/HashMap$Node;)V @ 00007FE12CC8A3C8-00007FE12CC8A417 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=0000000000093C20 bcsz=10 time=447us mem=[region=768 system=16384]KB compThreadID=4 CpuLoad=698%(87%avg) JvmCpu=224%
 (no-opt) Compiling java/util/concurrent/atomic/AtomicLong.addAndGet(J)J  OrdinaryMethod j9m=00000000000BFD40 t=1537540 compThreadID=4 memLimit=262144 KB freePhysicalMemory=827 MB
+ (no-opt) java/util/concurrent/atomic/AtomicLong.addAndGet(J)J @ 00007FE12CC8A460-00007FE12CC8A4AE OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=00000000000BFD40 bcsz=14 time=388us mem=[region=768 system=16384]KB compThreadID=4 CpuLoad=698%(87%avg) JvmCpu=224%
 (no-opt) Compiling java/util/HashMap.newTreeNode(ILjava/lang/Object;Ljava/lang/Object;Ljava/util/HashMap$Node;)Ljava/util/HashMap$TreeNode;  OrdinaryMethod j9m=0000000000055E98 t=1549531 compThreadID=4 memLimit=262144 KB freePhysicalMemory=826 MB
+ (no-opt) java/util/HashMap.newTreeNode(ILjava/lang/Object;Ljava/lang/Object;Ljava/util/HashMap$Node;)Ljava/util/HashMap$TreeNode; @ 00007FE12CC8A508-00007FE12CC8A5E4 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=0000000000055E98 bcsz=13 time=709us mem=[region=768 system=16384]KB compThreadID=4 CpuLoad=684%(85%avg) JvmCpu=204%
 (no-opt) Compiling java/lang/Thread$ThreadLock.<init>(Ljava/lang/Thread$1;)V  OrdinaryMethod j9m=000000000009C6D8 t=1570539 compThreadID=4 memLimit=262144 KB freePhysicalMemory=826 MB
+ (no-opt) java/lang/Thread$ThreadLock.<init>(Ljava/lang/Thread$1;)V @ 00007FE12CC8A640-00007FE12CC8A676 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=000000000009C6D8 bcsz=5 time=406us mem=[region=640 system=16384]KB compThreadID=4 CpuLoad=276%(34%avg) JvmCpu=56%
 (no-opt) Compiling java/lang/Math.sqrt(D)D  OrdinaryMethod j9m=000000000005EEB0 t=1593042 compThreadID=4 memLimit=262144 KB freePhysicalMemory=826 MB
+ (no-opt) java/lang/Math.sqrt(D)D @ 00007FE12CC8A6C0-00007FE12CC8A6F4 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=000000000005EEB0 bcsz=5 time=380us mem=[region=640 system=16384]KB compThreadID=4 CpuLoad=309%(38%avg) JvmCpu=71%
 (no-opt) Compiling com/ibm/ws/cache/BatchUpdateDaemon.wakeUp(JJ)V  OrdinaryMethod j9m=00000000008AF2B8 t=1595186 compThreadID=4 memLimit=262144 KB freePhysicalMemory=826 MB
+ (no-opt) com/ibm/ws/cache/BatchUpdateDaemon.wakeUp(JJ)V @ 00007FE12CC8A748-00007FE12CC8E2A3 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=00000000008AF2B8 bcsz=1154 time=12128us mem=[region=3136 system=16384]KB compThreadID=4 CpuLoad=291%(36%avg) JvmCpu=63%
 (no-opt) Compiling com/ibm/ws/webcontainer/util/UnsynchronizedStack.<init>()V  OrdinaryMethod j9m=00000000014E5E10 t=1599039 compThreadID=4 memLimit=262144 KB freePhysicalMemory=826 MB
+ (no-opt) com/ibm/ws/webcontainer/util/UnsynchronizedStack.<init>()V @ 00007FE12CC8E300-00007FE12CC8E32A OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=00000000014E5E10 bcsz=5 time=2145us mem=[region=640 system=16384]KB compThreadID=4 CpuLoad=298%(37%avg) JvmCpu=70%
 (no-opt) Compiling java/util/TreeMap.keyOrNull(Ljava/util/TreeMap$Entry;)Ljava/lang/Object;  OrdinaryMethod j9m=00000000001AE198 t=1623042 compThreadID=4 memLimit=262144 KB freePhysicalMemory=826 MB
+ (no-opt) java/util/TreeMap.keyOrNull(Ljava/util/TreeMap$Entry;)Ljava/lang/Object; @ 00007FE12CC8E380-00007FE12CC8E3D3 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=00000000001AE198 bcsz=13 time=509us mem=[region=960 system=16384]KB compThreadID=4 CpuLoad=284%(35%avg) JvmCpu=62%
 (no-opt) Compiling java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.unlinkCancelledWaiters()V  OrdinaryMethod j9m=00000000002A7CA0 t=1800867 compThreadID=4 memLimit=262144 KB freePhysicalMemory=826 MB
+ (no-opt) java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.unlinkCancelledWaiters()V @ 00007FE12CC8E420-00007FE12CC8E6FD OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=00000000002A7CA0 bcsz=67 time=972us mem=[region=1088 system=16384]KB compThreadID=4 CpuLoad=799%(99%avg) JvmCpu=794%
 (no-opt) Compiling com/ibm/websphere/ras/TraceComponent.isEventEnabled()Z  OrdinaryMethod j9m=0000000000194238 t=1897576 compThreadID=4 memLimit=262144 KB freePhysicalMemory=826 MB
+ (no-opt) com/ibm/websphere/ras/TraceComponent.isEventEnabled()Z @ 00007FE12CC8E740-00007FE12CC8E78F OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=0000000000194238 bcsz=23 time=437us mem=[region=960 system=16384]KB compThreadID=4 CpuLoad=801%(100%avg) JvmCpu=800%

I am going to exclude all TreeNode methods next.

vijaysun-omr commented 2 years ago

java/util/concurrent/atomic/AtomicLong.addAndGet is the sort of method that could, in theory, lead to a livelock, I believe: it keeps attempting the update (inside a loop, presumably) for as long as it takes to update the value atomically. That is, if there is either a bug or some sort of race condition where the lock cmpxchg keeps failing, it has no option but to keep retrying in a tight loop, which could burn CPU without advancing throughput (see the sketch below).
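
(As an illustration of that retry structure, here is a simplified Java sketch of the compare-and-swap loop behind addAndGet; the real JDK implementation uses Unsafe/VarHandle intrinsics, but the retry shape is the same. The class name is made up.)

import java.util.concurrent.atomic.AtomicLong;

final class AddAndGetSketch {
    static long addAndGet(AtomicLong value, long delta) {
        for (;;) {
            long current = value.get();   // read the current value
            long next = current + delta;  // compute the updated value
            if (value.compareAndSet(current, next)) {
                return next;              // CAS succeeded; done
            }
            // CAS failed: another thread updated the value first.
            // If the CAS kept failing (a codegen bug, or pathological
            // contention), this loop would spin and burn CPU without
            // making forward progress.
        }
    }
}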

Could you please try a run without compiling that method? We may even be interested in disabling the inlining of that method, to be even more paranoid.

mpirvu commented 2 years ago

The experiment above was done with initialOptLevel=noOpt (disableInlining also remained on the command line). I'll run without compiling that method.

mpirvu commented 2 years ago

The livelock happened even after excluding all {*TreeNode.*} methods.

The real kicker is that after excluding {java/util/concurrent/atomic*} I still see a livelock, but on the customer service, which is running a HotSpot JVM.

vijaysun-omr commented 2 years ago

Sorry, are you saying that after excluding {java/util/concurrent/atomic*} the flight service no longer showed a livelock? Are you planning to specifically try excluding java/util/concurrent/atomic/AtomicLong.addAndGet as well, since that was the method seen before the livelock happened? If this method is at issue, it would also be good to check which AtomicLong we are updating in the program (perhaps by inspecting the stack using breakOnEntry). For example, maybe we get into some state where exceptions are being thrown on all threads and there is a lot of contention updating a particular AtomicLong that keeps track of the number of failed transactions; i.e., perhaps it is still a manifestation of the original problem of exceptions occurring.

Given past results, I should also ask if the behavior that you observed in this regard was consistent or not.

Maybe we should get a javacore from all the services when the livelock happens and check how many threads are active (since the Liberty thread pool can change the number of threads, it would be good to check whether the thread count is unexpectedly high, for example).

How many CPUs are in use in these runs for the different services, respectively?

mpirvu commented 2 years ago

I excluded just java/util/concurrent/atomic/AtomicLong.addAndGet and I can still reproduce the issue:

CONTAINER ID        NAME                           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
cae418e2b265        acmeair-nginx1                 0.00%               71.88MiB / 15.58GiB   0.45%               10.7GB / 11GB       11.3MB / 8.19kB     9
0eff6ebc3498        acmeair-bookingservice-java    0.11%               588.4MiB / 1GiB       57.46%              3.12GB / 2.58GB     8.75MB / 2.75MB     60
ab0a31a43975        acmeair-flightservice-java     797.43%             192.9MiB / 1GiB       18.84%              3.84GB / 2.55GB     32.2MB / 2.24MB     67
d60dbd3c5c17        acmeair-customerservice-java   0.13%               585.3MiB / 1GiB       57.16%              1.7GB / 1.41GB      21.5MB / 2.79MB     57
7dbbefb35d4d        acmeair-mainservice-java       0.47%               142.5MiB / 1GiB       13.92%              6.47kB / 244B       110MB / 1.14MB      63
213d8bf5d84a        acmeair-authservice-java       0.11%               400MiB / 1GiB         39.06%              179MB / 206MB       85MB / 2.63MB       53
09d3ba21ad74        acmeair-booking-db             0.17%               89MiB / 15.58GiB      0.56%               596MB / 980MB       1.79MB / 310MB      44
697018bc7472        acmeair-flight-db              0.19%               67.55MiB / 15.58GiB   0.42%               1.94MB / 1.22MB     30.5MB / 15.7MB     42
87bb25c74417        acmeair-customer-db            0.19%               82.77MiB / 15.58GiB   0.52%               970MB / 890MB       44.1MB / 325MB      55

There are 16 Executor Threads and all are burning CPU with the same stack trace:

3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/util/HashMap$TreeNode.putTreeVal(HashMap.java:2041(Compiled Code))
4XESTACKTRACE                at java/util/HashMap.putVal(HashMap.java:639(Compiled Code))
4XESTACKTRACE                at java/util/HashMap.put(HashMap.java:613(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/jaxrs20/cdi/component/JaxRsFactoryImplicitBeanCDICustomizer.getClassFromCDI(JaxRsFactoryImplicitBeanCDICustomizer.java:245(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/jaxrs20/cdi/component/JaxRsFactoryImplicitBeanCDICustomizer.getInstanceFromManagedObject(JaxRsFactoryImplicitBeanCDICustomizer.java:257(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/jaxrs20/cdi/component/JaxRsFactoryImplicitBeanCDICustomizer.beforeServiceInvoke(JaxRsFactoryImplicitBeanCDICustomizer.java:219(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/jaxrs20/server/LibertyJaxRsInvoker.invoke(LibertyJaxRsInvoker.java:212(Compiled Code))
4XESTACKTRACE                at org/apache/cxf/jaxrs/JAXRSInvoker.invoke(JAXRSInvoker.java:213(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/jaxrs20/server/LibertyJaxRsInvoker.invoke(LibertyJaxRsInvoker.java:444(Compiled Code))
4XESTACKTRACE                at org/apache/cxf/jaxrs/JAXRSInvoker.invoke(JAXRSInvoker.java:112(Compiled Code))
4XESTACKTRACE                at org/apache/cxf/interceptor/ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59(Compiled Code))
4XESTACKTRACE                at org/apache/cxf/interceptor/ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96(Compiled Code))
4XESTACKTRACE                at org/apache/cxf/phase/PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308(Compiled Code))
4XESTACKTRACE                at org/apache/cxf/transport/ChainInitiationObserver.onMessage(ChainInitiationObserver.java:123(Compiled Code))
4XESTACKTRACE                at org/apache/cxf/transport/http/AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:277(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/jaxrs20/endpoint/AbstractJaxRsWebEndpoint.invoke(AbstractJaxRsWebEndpoint.java:137(Compiled Code))
4XESTACKTRACE                at com/ibm/websphere/jaxrs/server/IBMRestServlet.handleRequest(IBMRestServlet.java:146(Compiled Code))
4XESTACKTRACE                at com/ibm/websphere/jaxrs/server/IBMRestServlet.doPost(IBMRestServlet.java:104(Compiled Code))
4XESTACKTRACE                at javax/servlet/http/HttpServlet.service(HttpServlet.java:706(Compiled Code))
4XESTACKTRACE                at com/ibm/websphere/jaxrs/server/IBMRestServlet.service(IBMRestServlet.java:96(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/webcontainer/servlet/ServletWrapper.service(ServletWrapper.java:1258(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/webcontainer/servlet/ServletWrapper.handleRequest(ServletWrapper.java:746(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/webcontainer/servlet/ServletWrapper.handleRequest(ServletWrapper.java:443(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/webcontainer/filter/WebAppFilterChain.invokeTarget(WebAppFilterChain.java:183(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/webcontainer/filter/WebAppFilterChain.doFilter(WebAppFilterChain.java:94(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/security/jaspi/JaspiServletFilter.doFilter(JaspiServletFilter.java:56(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/webcontainer/filter/FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:201(Compiled Code))
...

I use my entire machine (4 cores / 8 HW threads) for these experiments, and there are no CPU limits set on the docker containers.

Given that I saw a livelock scenario with HotSpot as well (on the customer service), I doubt this is an issue stemming from OpenJ9 JIT compilation.

vijaysun-omr commented 2 years ago

I agree with you that this is unlikely to be an OpenJ9 issue, but I am wondering if there is still an indication of a Liberty issue.

The thread/CPU counts you mentioned do not seem all that extreme. The stack trace that you showed also does not seem exceptional in any way, so I am wondering why this code path consumes so much CPU after N minutes when that was not the case previously. @jdmcclur any thoughts on that code path, and maybe we can instrument it in some way?

jdmcclur commented 2 years ago

From what I can tell, it is populating a HashMap with CreationalContexts on every request, which get released/removed later. Maybe the HashMap is getting too big?

// The key is Object to match afterServiceInvoke.
private final Map<Object, CreationalContext<?>> creationalContextsToRelease = new HashMap<>();

private Object getClassFromCDI(Class<?> clazz) {
    BeanManager manager = getBeanManager();
    Bean<?> bean = getBeanFromCDI(clazz);
    Object obj = null;
    if (bean != null && manager != null) {
        CreationalContext<?> cc = manager.createCreationalContext(bean);
        obj = manager.getReference(bean, clazz, cc);
        if (cc != null && obj != null) {
            creationalContextsToRelease.put(obj, cc);
        }
    }
    return obj;
}

@Override
public void afterServiceInvoke(Object serviceObject, boolean isSingleton, Object context) {
    if (creationalContextsToRelease.containsKey(serviceObject)) {
        CreationalContext<?> cc = creationalContextsToRelease.remove(serviceObject);
        if (cc != null) {
            cc.release();
        }
    }
}
mpirvu commented 2 years ago

It's possible that the hashtable gets big, given that the issue happens only after some time; with the latest configuration I have to wait for more than an hour to reproduce it. However, based on the memory used by the container (193 MiB), the map size should still be manageable.

mpirvu commented 2 years ago

I don't see the livelock problem anymore with the new com.ibm.ws.jaxrs.2.0.cdi_1.0.57.jar provided by @jdmcclur.