eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Performance difference between 0.24.0 and 0.26.0 in containers running AcmeAirMS benchmark on Open Liberty. #13548

Closed jdmcclur closed 3 years ago

jdmcclur commented 3 years ago

I am seeing a throughput regression of around 5% when running the AcmeAir MS benchmark, which has 5 different services, in a docker environment with OpenLiberty built on top of adoptopenjdk:8-jre-openj9 (0.26.0) compared to the same version of Open Liberty running on adoptopenjdk/openjdk8-openj9:jre8u282-b08_openj9-0.24.0.

This has been tricky to debug - if I take this out of the docker env, the regression goes away or is a lot less. I hacked in healthcenter and have some hcd files, but they haven't been very enlightening.

Any advice on how to debug? Is there a newer image to try?

pshipton commented 3 years ago

@mpirvu @vijaysun-omr fyi

0.27 is released. For now you can get docker images as described in https://github.com/eclipse-openj9/openj9/issues/13534#issuecomment-924085997

mpirvu commented 3 years ago

@jdmcclur Did you build the Liberty containers yourself, or did you use Liberty images published on Docker Hub? At some point OpenJ9 started to embed a shared class cache into the containers and Liberty images create another SCC layer on top. I don't remember when exactly these changes took place (they happened gradually), but I can check if I know exactly which images you are using.

jdmcclur commented 3 years ago

@mpirvu - I built the Liberty images myself, but did it the same way as the official builds, so it should be the same. Both builds are using layers on top of the SCC provided by Java, but the last layer (the application layer) is bigger in 0.24.0, which is interesting.

0.26.0
-rwxrwxrwx 1 root    root 13631488 Aug 31 02:29 C290M4F1A64P_openj9_system_scc_G41L00
-rw-rw-r-- 1 root    root 40894464 Sep 10 19:49 C290M4F1A64P_openj9_system_scc_G41L01
-rw-rw-r-- 1 default root 31457280 Sep 23 18:18 C290M4F1A64P_openj9_system_scc_G41L02

0.24.0
-rwxrwxrwx 1 root    root 13631488 Apr  3 16:20 C290M4F1A64P_openj9_system_scc_G41L00
-rw-rw-r-- 1 root    root 40894464 Sep 10 19:53 C290M4F1A64P_openj9_system_scc_G41L01
-rw-rw-r-- 1 default root 32505856 Sep 23 18:16 C290M4F1A64P_openj9_system_scc_G41L02

vijaysun-omr commented 3 years ago

Is there a difference between these configs if you run with -Xshareclasses:none ?

jdmcclur commented 3 years ago

Interesting, without using a SCC, the regression almost goes away.

SCC       Tput       Relative to 0.24.0
0.24.0    6413.33
0.26.0    5849.67    91.2%

No SCC    Tput       Relative to 0.24.0
0.24.0    6655.00
0.26.0    6597.67    99.1%
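
Each percentage above is the 0.26.0 throughput divided by the 0.24.0 throughput. A minimal Python sketch of that arithmetic, using only the numbers posted above (the function name relative_throughput is made up for illustration):

```python
def relative_throughput(baseline, candidate):
    """Return candidate throughput as a percentage of baseline, to one decimal."""
    return round(100.0 * candidate / baseline, 1)

# (0.24.0 throughput, 0.26.0 throughput) as posted above.
runs = {
    "SCC": (6413.33, 5849.67),
    "No SCC": (6655.00, 6597.67),
}

for label, (v024, v026) in runs.items():
    print(f"{label}: 0.26.0 is at {relative_throughput(v024, v026)}% of 0.24.0")
```

The same helper applies to any later pair of runs in this thread, as long as both numbers come from the same environment.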

mpirvu commented 3 years ago

The regression could be related to the portable AOT feature. In docker we started to generate more generic (but also less optimized) AOT code so that the image can be used on more CPUs and more heap configurations. @jdmcclur maybe you could do an experiment outside docker, with 0.26, comparing one run with -XX:+PortableSharedCache to another run with -XX:-PortableSharedCache.

I don't have a reliable AcmeAir MS installation around. Do you know if the regression is seen with AcmeAir mono as well?

jdmcclur commented 3 years ago

@mpirvu - There is a bit of a difference outside of docker with the different options (3 runs each, all about the same). I will note that outside of docker I am not using the Java SCC layers; it is just one SCC generated the first time the server starts.

-XX:+PortableSharedCache 7729.33  
-XX:-PortableSharedCache 7854.33 101.6%

I'll look into what acmeair monolithic looks like later today.

jdmcclur commented 3 years ago

I don't see any regression with Acmeair Monolithic.

vijaysun-omr commented 3 years ago

Maybe it is worth doing a run inside container with -Xnoaot with the SCC still enabled with both builds to see how that affects the delta that is observed.

jdmcclur commented 3 years ago

No delta with -Xnoaot

No AOT    
0.24.0 6700.33  
0.26.0 6694.00 99.9%

vijaysun-omr commented 3 years ago

Okay, I wonder if it makes sense to dig into the difference that you had mentioned at the application layer for the SCC. i.e. what are the contents of the SCC and/or get a -Xjit:verbose log to see how many compilations (JIT or AOT) we get with either build to see if the difference in that regard is an unexpected one. @mpirvu any other ideas ?

jdmcclur commented 3 years ago

jitlogs attached

jitlogs.zip

mpirvu commented 3 years ago

The verbose logs above were collected with just -Xjit:verbose (as opposed to -Xjit:verbose={compilePerformance}), so they are missing some information. However, I was able to determine a possible culprit: in the 0.26.0 vlogs for Booking and for Customer I see many failures like this:

+ (warm) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007F3418C708E0-00007F3418C70C56 OrdinaryMethod - Q_SZ=1136 Q_SZI=1136 QW=6484 j9m=0000000000CB8CF0 bcsz=67 OSR compThreadID=2 CpuLoad=775%(96%avg) JvmCpu=428%
! (hot) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V time=2065611us compilationHeapLimitExceeded memLimit=262144 KB freePhysicalMemory=249895 MB
+ (warm) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007F3418736700-00007F3418736A56 OrdinaryMethod - Q_SZ=6 Q_SZI=6 QW=72 j9m=0000000000CB8CF0 bcsz=67 OSR compThreadID=5 CpuLoad=220%(27%avg) JvmCpu=137%
! (hot) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V time=2142802us compilationHeapLimitExceeded memLimit=262144 KB freePhysicalMemory=249802 MB
+ (warm) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007F3418E57760-00007F3418E57AD6 OrdinaryMethod - Q_SZ=2 Q_SZI=0 QW=42 j9m=0000000000CB8CF0 bcsz=67 OSR compThreadID=3 CpuLoad=519%(64%avg) JvmCpu=328%
! (hot) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V time=2023384us compilationHeapLimitExceeded memLimit=262144 KB freePhysicalMemory=249902 MB
+ (warm) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007F3418E6FFC0-00007F3418E70336 OrdinaryMethod - Q_SZ=3 Q_SZI=0 QW=48 j9m=0000000000CB8CF0 bcsz=67 OSR compThreadID=3 CpuLoad=103%(12%avg) JvmCpu=100%
! (hot) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V time=5640679us compilationHeapLimitExceeded memLimit=262144 KB freePhysicalMemory=249605 MB
+ (warm) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007F3418E962C0-00007F3418E96636 OrdinaryMethod - Q_SZ=6 Q_SZI=5 QW=62 j9m=0000000000CB8CF0 bcsz=67 OSR compThreadID=0 CpuLoad=762%(95%avg) JvmCpu=168%
! (hot) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V time=3935845us compilationHeapLimitExceeded memLimit=262144 KB freePhysicalMemory=249651 MB
...

The JIT compiles com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter at warm and then at hot (I cannot tell why based on the provided information). The compilation at hot fails due to insufficient memory. The JIT retries the compilation again at warm. Looking at the code, we retry a re-compilation that failed if the method body used pre-existence. That recompilation at warm succeeds. What I don't understand is why the JIT attempts to recompile the method again at hot. The code disables sampling, so it's not the sampling mechanism that tries to upgrade the method.
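
To see how often this compile-fail-retry sequence occurs, one could tally the vlog lines per method and optimization level. The Python sketch below assumes the line layout shown in the excerpt above ("+" for a successful compilation, "!" for a failure, then the level in parentheses and the method signature). The helper name summarize_vlog is made up, and the sample lines are abbreviated copies of the lines above.

```python
import re
from collections import Counter

# "+ (warm) pkg/Class.method(sig)V @ ..." or "! (hot) pkg/Class.method(sig)V time=..."
VLOG_LINE = re.compile(
    r"^(?P<status>[+!]) \((?P<level>[^)]+)\) (?P<method>\S+)(?P<rest>.*)$"
)

def summarize_vlog(lines):
    """Count successful compilations and scratch-memory failures per (method, level)."""
    compiled, out_of_memory = Counter(), Counter()
    for line in lines:
        match = VLOG_LINE.match(line.strip())
        if match is None:
            continue  # not a compilation record
        key = (match["method"], match["level"])
        if match["status"] == "+":
            compiled[key] += 1
        elif "compilationHeapLimitExceeded" in match["rest"]:
            out_of_memory[key] += 1
    return compiled, out_of_memory

METHOD = ("com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter"
          "(Ljavax/ws/rs/container/ContainerRequestContext;)V")
# Abbreviated sample lines; the real records carry more fields.
sample = [
    f"+ (warm) {METHOD} @ 00007F3418C708E0-00007F3418C70C56 OrdinaryMethod",
    f"! (hot) {METHOD} time=2065611us compilationHeapLimitExceeded memLimit=262144 KB",
    f"+ (warm) {METHOD} @ 00007F3418736700-00007F3418736A56 OrdinaryMethod",
    f"! (hot) {METHOD} time=2142802us compilationHeapLimitExceeded memLimit=262144 KB",
]

compiled, out_of_memory = summarize_vlog(sample)
for (method, level), count in out_of_memory.items():
    print(f"{count} failed {level} compilations of {method}")
```

A method with many failed hot compilations, each followed by another warm body, is a candidate for the pattern described here.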

mpirvu commented 3 years ago

If my theory is correct, then -Xjit:{com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter*}(disableInvariantArgumentPreexistence) should fix the issue. I don't understand the connection to AOT though. In 0.24.0 this method is successfully compiled at hot, so we don't have this nasty compile-fail-compile-fail behavior.

jdmcclur commented 3 years ago

@mpirvu Yes, looks like adding -Xjit:{com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter*}(disableInvariantArgumentPreexistence) fixed the regression. Would running with -Xjit:verbose={compilePerformance} be helpful?

mpirvu commented 3 years ago

"Would running with -Xjit:verbose={compilePerformance} be helpful?"

Possibly, in that it may tell us why the hot recompilations are happening. However, if the recompilation is triggered by the native code itself, we'll have to dig deeper.

jdmcclur commented 3 years ago

Here is the data from the booking service. Oddly, throughput went up a couple of percent when I added -Xjit:verbose={compilePerformance}.

booking.vlog.log

mpirvu commented 3 years ago

Unfortunately, I cannot determine why that method gets compiled again at hot (after the first failure). I ran AcmeAir mono to see if that method even shows in the vlog, but it doesn't (The intent was to artificially recompile it at hot and then fail on purpose).

jdmcclur commented 3 years ago

@mpirvu - yeah, that method won't be invoked with acmeair mono as there is no authentication done. You could try to enable the mpJwt-1.2 feature and see if it is invoked (I'll try it out).

jdmcclur commented 3 years ago

@mpirvu - looks like adding <feature>mpJwt-1.2</feature> at least loads the class. Can you try that?

mpirvu commented 3 years ago

My build of Liberty_21.0.0.3/AcmeAir does not run with <feature>mpJwt-1.2</feature>. I tried it with <feature>mpJwt-1.1</feature> and that works, but I didn't see the LibertyAuthFilter.filter method being recompiled at hot. I am going to push it artificially to hot.

mpirvu commented 3 years ago

I managed to recompile that method at hot, but it didn't reach the scratch memory limit, so I reduced the limit with an option for that method only. Now I have a hot recompilation that fails and is followed by a warm recompilation. This is exactly the behavior I was expecting. All in all, I cannot reproduce the behavior from this issue, and I may have to provide some instrumented builds.

mpirvu commented 3 years ago

Looking at the reasons why a method could trigger a recompilation of itself, I see a few possibilities:

  1. Guarded counting recompilation. Disable with -Xjit:disableGuardedCountingRecompilation
  2. OSR and inlined method redefinition. Disable with -Xjit:disableRecompDueToInlinedMethodRedefinition
  3. JProfiling. Disable with -Xjit:disableJProfiling

None of these is likely to happen for various reasons, but let's disable them all with -Xjit:disableGuardedCountingRecompilation,disableRecompDueToInlinedMethodRedefinition,disableJProfiling and see if the regression still persists.

jdmcclur commented 3 years ago

With these options, the regression is still there, actually a bit worse. -Xjit:disableGuardedCountingRecompilation,disableRecompDueToInlinedMethodRedefinition,disableJProfiling

jdmcclur commented 3 years ago

@mpirvu - Liberty is moving to using semeru images as the base image soon, which are at 0.27.0 now. So, I did some runs with this as the base, and it looks like the regression is gone. Throughput levels are back to normal and I only see this once or twice in the jitlogs.

booking

+ (warm) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007F65AF589BE0-00007F65AF58A81F OrdinaryMethod - Q_SZ=2480 Q_SZI=2474 QW=12778 j9m=0000000000CAB6F0 bcsz=67 OSR compThreadID=1 CpuLoad=750%(93%avg) JvmCpu=370%

customer

+ (warm) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007F9BAF8B4C80-00007F9BAF8B4FF6 OrdinaryMethod - Q_SZ=702 Q_SZI=702 QW=3752 j9m=0000000000CB44F0 bcsz=67 OSR compThreadID=0 CpuLoad=776%(97%avg) JvmCpu=251%
+ (hot) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007F9BAFDF7020-00007F9BAFE03504 OrdinaryMethod - Q_SZ=1 Q_SZI=0 QW=60 j9m=0000000000CB44F0 bcsz=67 OSR compThreadID=0 CpuLoad=757%(94%avg) JvmCpu=142%

I have done a handful of runs and they all look good, so will close this.

mpirvu commented 3 years ago

I am glad that the problem disappeared. Somehow we take less memory for the hot compilation which now succeeds. However, I wonder what would happen if the application needed to run in smaller containers, so small that the hot compilation would fail again. Could you please run with 0.27 with -Xjit:scratchSpaceLimit=125000 to see whether that continuous warm-->hot-->warm-->hot pattern is still present? Thanks

jdmcclur commented 3 years ago

Yes, with that option, I see the issue with the customer services (but not the booking).

+ (warm) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007EFED8E041A0-00007EFED8E04516 OrdinaryMethod - Q_SZ=628 Q_SZI=628 QW=3392 j9m=0000000000CAC6F0 bcsz=67 OSR compThreadID=5 CpuLoad=775%(96%avg) JvmCpu=245%
! (hot) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V time=2501863us compilationHeapLimitExceeded memLimit=125000 KB freePhysicalMemory=249345 MB
+ (warm) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007EFED8F74260-00007EFED8F745D6 OrdinaryMethod - Q_SZ=11 Q_SZI=11 QW=92 j9m=0000000000CAC6F0 bcsz=67 OSR compThreadID=3 CpuLoad=767%(95%avg) JvmCpu=140%
! (hot) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V time=3519207us compilationHeapLimitExceeded memLimit=125000 KB freePhysicalMemory=249334 MB
+ (warm) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007EFED8443580-00007EFED84438D6 OrdinaryMethod - Q_SZ=1 Q_SZI=0 QW=66 j9m=0000000000CAC6F0 bcsz=67 OSR compThreadID=2 CpuLoad=755%(94%avg) JvmCpu=126%
! (hot) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V time=3145852us compilationHeapLimitExceeded memLimit=125000 KB freePhysicalMemory=249564 MB
+ (warm) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007EFED859C7E0-00007EFED859CB36 OrdinaryMethod - Q_SZ=2 Q_SZI=0 QW=42 j9m=0000000000CAC6F0 bcsz=67 OSR compThreadID=2 CpuLoad=760%(95%avg) JvmCpu=128%
! (hot) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V time=2658834us compilationHeapLimitExceeded memLimit=125000 KB freePhysicalMemory=249365 MB
+ (warm) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007EFED8461800-00007EFED8461B56 OrdinaryMethod - Q_SZ=8 Q_SZI=7 QW=61 j9m=0000000000CAC6F0 bcsz=67 OSR compThreadID=2 CpuLoad=780%(97%avg) JvmCpu=111%
! (hot) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V time=2672951us compilationHeapLimitExceeded memLimit=125000 KB freePhysicalMemory=249549 MB
+ (warm) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007EFED85C2540-00007EFED85C2896 OrdinaryMethod - Q_SZ=1 Q_SZI=0 QW=60 j9m=0000000000CAC6F0 bcsz=67 OSR compThreadID=2 CpuLoad=748%(93%avg) JvmCpu=160%
! (hot) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V time=2099805us compilationHeapLimitExceeded memLimit=125000 KB freePhysicalMemory=249646 MB
+ (warm) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007EFED8E55340-00007EFED8E55696 OrdinaryMethod - Q_SZ=1 Q_SZI=1 QW=42 j9m=0000000000CAC6F0 bcsz=67 OSR compThreadID=2 CpuLoad=717%(89%avg) JvmCpu=180%
! (hot) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V time=2611141us compilationHeapLimitExceeded memLimit=125000 KB freePhysicalMemory=249530 MB
+ (warm) com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V @ 00007EFED8E55F80-00007EFED8E562D6 OrdinaryMethod - Q_SZ=3 Q_SZI=1 QW=48 j9m=0000000000CAC6F0 bcsz=67 OSR compThreadID=2 CpuLoad=725%(90%avg) JvmCpu=161%
...

mpirvu commented 3 years ago

I took the time to install AcmeAir MS and reproduce the problem. From the logs I captured, I think that the warm-->hot recompilation is triggered by EDO (exception directed optimization): the JIT profiles the execution to determine whether exceptions are frequently thrown, and if they are, it recompiles the body, attempting to transform the throw-catch into a goto.

@jdmcclur could you please try with -Xjit:scratchSpaceLimit=125000,{com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter*}(disableEDO) and see whether the problem disappears? If it does, I will deliver a fix soon. Thanks
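
The cycle discussed in this thread can be illustrated with a toy model. The Python sketch below is not OpenJ9 code: the function simulate_recompilation, the countdown threshold of 50, and the 200000 KB scratch requirement are all hypothetical, chosen only so that the hot compilation fits under the 262144 KB limit seen in the logs but not under scratchSpaceLimit=125000.

```python
def simulate_recompilation(exceptions_caught, hot_scratch_kb, scratch_limit_kb,
                           edo_enabled=True, catch_threshold=50):
    """Toy model of EDO-driven recompilation; returns vlog-style events in order."""
    events = ["+ (warm)"]
    level = "warm"
    countdown = catch_threshold
    for _ in range(exceptions_caught):
        if level != "warm" or not edo_enabled:
            continue  # no catch-block countdown in hot or EDO-disabled bodies
        countdown -= 1
        if countdown > 0:
            continue
        # Countdown expired: the body asks to be recompiled at hot.
        if hot_scratch_kb <= scratch_limit_kb:
            events.append("+ (hot)")
            level = "hot"
        else:
            events.append("! (hot) compilationHeapLimitExceeded")
            events.append("+ (warm)")  # retry at warm, which re-arms the countdown
            countdown = catch_threshold
    return events

HOT_NEEDS_KB = 200000  # hypothetical scratch memory needed by the hot compilation

# Enough scratch memory: one warm body, then one hot body.
print(simulate_recompilation(200, HOT_NEEDS_KB, 262144))
# Reduced limit: every expired countdown adds a failed hot and a new warm body.
print(len(simulate_recompilation(200, HOT_NEEDS_KB, 125000)))
# Reduced limit with EDO disabled: the warm body is never asked to recompile.
print(simulate_recompilation(200, HOT_NEEDS_KB, 125000, edo_enabled=False))
```

In this model the failure loop needs both ingredients at once: a steady stream of caught exceptions and a hot compilation that does not fit in scratch memory.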

jdmcclur commented 3 years ago

@mpirvu - it looks like the problem does go away with -Xjit:scratchSpaceLimit=125000,{com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter*}(disableEDO)

mpirvu commented 3 years ago

Thanks! I'll work on a solution

vijaysun-omr commented 3 years ago

I don't know if the fact that exception directed optimization (EDO) is used in the method in question could be a sign that something is not quite working right, since it would suggest a fair number of exceptions are happening in code that was either present in or inlined into that compiled method.

I mention this not to affect the plan for delivery for the fix for the issue on the OpenJ9 side, but more as a question for @jdmcclur to consider assuming the exception is due to something that happened in the Liberty auth logic based on the above signature. Of course it could also be related to some JCL code that was inlined into that Liberty code, which would bring that question back into the JDK realm. @mpirvu if you agree, I think we should look at the JIT trace log for the method in question and see if we can understand what the EDO related exception is, and then reason about whether that can be justified or not.

jdmcclur commented 3 years ago

Hmm, actually, it looks like there was a small number of Liberty errors in this code path when the problem originally happened, but this has been fixed in 21.0.0.11 (https://github.com/OpenLiberty/open-liberty/pull/18465), and I still see this issue there with scratchSpaceLimit=125000, although it seems to be better without that option. (Also, I do not see the problem with openj9 0.24.0 even with the small number of Liberty errors).

vijaysun-omr commented 3 years ago

I think we should still check if EDO triggers a hot compilation with 0.26 and with 21.0.0.11 and if it does, try to understand what the exceptions are that are being thrown and reassure ourselves that all is working as designed.

mpirvu commented 3 years ago

I'll generate compilation logs with 0.24

mpirvu commented 3 years ago

I reproduced the same issue even with 0.24 (though for some reason it happens much more infrequently). From the compilation logs I see the following 3 catch blocks:

 [0x7fadc71a5020]       Fence Relative [ 00007FADC7050930 ]     # FENCE BBStart <block_7> (frequency 6) (catches com/ibm/ws/security/authorization/util/UnauthenticatedException)
 [0x7fadc71a5280]       dec     dword ptr [$0x00007faedbce7ab0]         # DEC4Mem, SymRef [#407 -607225168]
 [0x7fadc71a5310]       je      Snippet Label L0081             # JE4   # (Force Recompilation Snippet)
 [0x7fadc71aaf30]       Fence Relative [ 00007FADC7050DF0 ]     # FENCE BBStart <block_14> (frequency 2) (catches org/apache/cxf/interceptor/security/AccessDeniedException) (cold)
 [0x7fadc71ab190]       dec     dword ptr [$0x00007faedbce7ab0]         # DEC4Mem, SymRef [#413 -607225168]
 [0x7fadc71ab220]       je      Snippet Label L0145             # JE4   # (Force Recompilation Snippet)
 [0x7fadc71b3250]       Fence Relative [ 00007FADC7050CC0 ]     # FENCE BBStart <block_13> (frequency 2) (catches com/ibm/ws/security/authorization/util/UnauthenticatedException) (cold)
 [0x7fadc71b34b0]       dec     dword ptr [$0x00007faedbce7ab0]         # DEC4Mem, SymRef [#421 -607225168]
 [0x7fadc71b3540]       je      Snippet Label L0193             # JE4   # (Force Recompilation Snippet)
 [0x7fadc71b35e0]       Label L0194:                    # LABEL

I cannot tell which of the 3 catch blocks is most responsible for triggering the recompilation (they all decrement the same counter).
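
Because the catch blocks decrement one shared counter, the block that happens to trip the recompilation snippet need not be the one that ran most often. A toy Python sketch of that effect follows. The class SharedCountdown, its per-block tally, and the starting value of 5 are hypothetical; the real counter is a memory word updated by the dec/je sequence in the listing above.

```python
class SharedCountdown:
    """One countdown shared by several catch blocks, as in the listing above."""

    def __init__(self, start):
        self.remaining = start
        self.hits_by_block = {}  # extra bookkeeping the real counter does not have

    def on_catch(self, block):
        """Record a caught exception; return True when recompilation would fire."""
        self.hits_by_block[block] = self.hits_by_block.get(block, 0) + 1
        self.remaining -= 1          # dec dword ptr [counter]
        return self.remaining == 0   # je (Force Recompilation Snippet)

counter = SharedCountdown(start=5)
sequence = ["block_7"] * 4 + ["block_13"]
fired_in = next(block for block in sequence if counter.on_catch(block))

print(fired_in)               # the block whose decrement reached zero
print(counter.hits_by_block)  # which blocks actually consumed the countdown
```

Here the recompilation fires in block_13 even though block_7 consumed four of the five decrements, which is why the generated code alone cannot attribute the trigger.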

Those profiling counters are not always added to the native body. The logic reads:

bool
TR_J9VMBase::shouldPerformEDO(
      TR::Block *catchBlock,
      TR::Compilation * comp)
   {
   TR_ASSERT(catchBlock->isCatchBlock(), "shouldPerformEDO expected block_%d to be a catch block", catchBlock->getNumber());

   if (comp->getOption(TR_DisableEDO))
      return false;

   if (catchBlock->isOSRCatchBlock()) // Can't currently induce recompilation from an OSR block
      return false;

   static char *disableEDORecomp = feGetEnv("TR_disableEDORecomp");
   if (disableEDORecomp)
      return false;

   TR::Recompilation *recomp = comp->getRecompilationInfo();

   if (recomp
      && comp->getOptions()->allowRecompilation()
      && recomp->useSampling()
      && recomp->shouldBeCompiledAgain()
      && comp->getMethodHotness() < hot
      && comp->getNodeCount() < TR::Options::_catchSamplingSizeThreshold)
      {
      return true;
      }
   else
      return false;
   }

TR::Options::_catchSamplingSizeThreshold is 1100 and the number of nodes is 435, so this cannot be the reason why we bail out most of the time. I cannot tell whether the if (catchBlock->isOSRCatchBlock()) test is responsible.

mpirvu commented 3 years ago

An idea based on this:

   ncount_t getNodeCount();
   ncount_t generateAccurateNodeCount();
   ncount_t getAccurateNodeCount();

I think that every time we generate a new node we bump the node count, and this is reflected by getNodeCount(). However, nodes can also be deleted, so the true number of nodes is reflected by getAccurateNodeCount(). In the compilation log I think we print the accurate number of nodes, which is 435, but the total number of nodes ever created can be over 1100. The logic based on node count introduces some non-determinism: depending on how many nodes the optimizer creates, we may or may not generate profiling instructions.
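
The hypothesis above can be pictured with a toy model of the two counts. This Python sketch is not the OMR implementation: the class ToyCompilation and the total of 1200 created nodes are made up, and only the 435 live nodes and the 1100 threshold come from this thread.

```python
class ToyCompilation:
    """Toy stand-in for a compilation object; not the OMR implementation."""

    def __init__(self):
        self._created = 0    # bumped on every node creation, never decremented
        self._live = set()   # nodes still present in the method's IL

    def create_node(self):
        self._created += 1
        self._live.add(self._created)
        return self._created

    def remove_node(self, node_id):
        self._live.discard(node_id)

    def get_node_count(self):
        return self._created

    def get_accurate_node_count(self):
        return len(self._live)

CATCH_SAMPLING_SIZE_THRESHOLD = 1100  # value quoted in this thread

comp = ToyCompilation()
nodes = [comp.create_node() for _ in range(1200)]  # 1200 is a made-up total
for node_id in nodes[:765]:                        # pretend the optimizer deletes 765
    comp.remove_node(node_id)

print(comp.get_accurate_node_count())
print(comp.get_node_count() < CATCH_SAMPLING_SIZE_THRESHOLD)
print(comp.get_accurate_node_count() < CATCH_SAMPLING_SIZE_THRESHOLD)
```

Under these assumptions the size check passes with the live count but fails with the ever-created count, so whether profiling instructions are generated would depend on how much node churn a particular compilation happens to have.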

vijaysun-omr commented 3 years ago

I don't believe the if (catchBlock->isOSRCatchBlock()) condition is doing anything unexpected. An OSR catch block is not really a catch block that can catch Java exceptions and so we should not be adding in profiling instrumentation related to EDO to such a block.

mpirvu commented 3 years ago

Problem happens even with openliberty-daily which I assume contains the fix for https://github.com/OpenLiberty/open-liberty/pull/18465 which was merged on Sep 08. Actually, that fix is in gm-21.0.0.10 according to git

vijaysun-omr commented 3 years ago

Maybe running with verbose stack walk will show which of the exceptions gets thrown. This could be one way to try and correlate back to the application logic.

mpirvu commented 3 years ago

I used

JVM_ARGS=-Xdump:stack:events=catch,filter=com/ibm/ws/security/authorization/util/UnauthenticatedException,label=/tmp/trc.%pid.%seq.txt

to generate a stack trace every time an exception of type com/ibm/ws/security/authorization/util/UnauthenticatedException is caught. There are many trace files being generated because there are many exceptions being caught. I picked 3 at random and they look very similar:

Thread=Default Executor-thread-68 (00007FD9BC01CD98) Status=Running
        at com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V (LibertyAuthFilter.java:53) (Compiled Code)
        at org/apache/cxf/jaxrs/utils/JAXRSUtils.runContainerRequestFilters(Lorg/apache/cxf/jaxrs/provider/ServerProviderFactory;Lorg/apache/cxf/message/Message;ZLjava/util/Set;)Z (JAXRSUtils.java:1929) (Compiled Code)
        at org/apache/cxf/jaxrs/interceptor/JAXRSInInterceptor.processRequest(Lorg/apache/cxf/message/Message;Lorg/apache/cxf/message/Exchange;)V (JAXRSInInterceptor.java:281) (Compiled Code)
        at org/apache/cxf/jaxrs/interceptor/JAXRSInInterceptor.handleMessage(Lorg/apache/cxf/message/Message;)V (JAXRSInInterceptor.java:96) (Compiled Code)
        at org/apache/cxf/phase/PhaseInterceptorChain.doIntercept(Lorg/apache/cxf/message/Message;)Z (PhaseInterceptorChain.java:308) (Compiled Code)
        at org/apache/cxf/transport/ChainInitiationObserver.onMessage(Lorg/apache/cxf/message/Message;)V (ChainInitiationObserver.java:123) (Compiled Code)
        at org/apache/cxf/transport/http/AbstractHTTPDestination.invoke(Ljavax/servlet/ServletConfig;Ljavax/servlet/ServletContext;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (AbstractHTTPDestination.java:277) (Compiled Code)
        at com/ibm/ws/jaxrs20/endpoint/AbstractJaxRsWebEndpoint.invoke(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (AbstractJaxRsWebEndpoint.java:137) (Compiled Code)
        at com/ibm/websphere/jaxrs/server/IBMRestServlet.handleRequest(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (IBMRestServlet.java:146) (Compiled Code)
        at com/ibm/websphere/jaxrs/server/IBMRestServlet.doGet(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (IBMRestServlet.java:112) (Compiled Code)
        at javax/servlet/http/HttpServlet.service(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (HttpServlet.java:686) (Compiled Code)
        at com/ibm/websphere/jaxrs/server/IBMRestServlet.service(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V (IBMRestServlet.java:96) (Compiled Code)
        at com/ibm/ws/webcontainer/servlet/ServletWrapper.service(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Lcom/ibm/ws/webcontainer/webapp/WebAppServletInvocationEvent;)V (ServletWrapper.java:1258) (Compiled Code)
        at com/ibm/ws/webcontainer/servlet/ServletWrapper.handleRequest(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Lcom/ibm/ws/webcontainer/webapp/WebAppDispatcherContext;)V (ServletWrapper.java:746) (Compiled Code)
        at com/ibm/ws/webcontainer/servlet/ServletWrapper.handleRequest(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V (ServletWrapper.java:443) (Compiled Code)
        at com/ibm/ws/webcontainer/filter/WebAppFilterChain.invokeTarget(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V (WebAppFilterChain.java:193) (Compiled Code)
        at com/ibm/ws/webcontainer/filter/WebAppFilterChain.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V (WebAppFilterChain.java:98) (Compiled Code)
        at com/ibm/ws/security/jaspi/JaspiServletFilter.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V (JaspiServletFilter.java:56) (Compiled Code)
        at com/ibm/ws/webcontainer/filter/FilterInstanceWrapper.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V (FilterInstanceWrapper.java:201) (Compiled Code)
        at com/ibm/ws/webcontainer/filter/WebAppFilterChain.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V (WebAppFilterChain.java:91) (Compiled Code)
        at com/ibm/ws/webcontainer/filter/WebAppFilterManager.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Lcom/ibm/wsspi/webcontainer/RequestProcessor;Lcom/ibm/ws/webcontainer/webapp/WebAppDispatcherContext;)V (WebAppFilterManager.java:1002) (Compiled Code)
        at com/ibm/ws/webcontainer/filter/WebAppFilterManager.invokeFilters(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Lcom/ibm/wsspi/webcontainer/servlet/IServletContext;Lcom/ibm/wsspi/webcontainer/RequestProcessor;Ljava/util/EnumSet;Lcom/ibm/wsspi/http/HttpInboundConnection;)Z (WebAppFilterManager.java:1140) (Compiled Code)
        at com/ibm/ws/webcontainer/filter/WebAppFilterManager.invokeFilters(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Lcom/ibm/wsspi/webcontainer/servlet/IServletContext;Lcom/ibm/wsspi/webcontainer/RequestProcessor;Ljava/util/EnumSet;)Z (WebAppFilterManager.java:1011) (Compiled Code)
        at com/ibm/ws/webcontainer/servlet/CacheServletWrapper.handleRequest(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V (CacheServletWrapper.java:75) (Compiled Code)
        at com/ibm/ws/webcontainer40/servlet/CacheServletWrapper40.handleRequest(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V (CacheServletWrapper40.java:85) (Compiled Code)
        at com/ibm/ws/webcontainer/WebContainer.handleRequest(Lcom/ibm/websphere/servlet/request/IRequest;Lcom/ibm/websphere/servlet/response/IResponse;Lcom/ibm/ws/webcontainer/VirtualHost;Lcom/ibm/wsspi/webcontainer/RequestProcessor;)V (WebContainer.java:938) (Compiled Code)
        at com/ibm/ws/webcontainer/osgi/DynamicVirtualHost$2.run()V (DynamicVirtualHost.java:279) (Compiled Code)
        at com/ibm/ws/http/dispatcher/internal/channel/HttpDispatcherLink$TaskWrapper.run()V (HttpDispatcherLink.java:1184) (Compiled Code)
        at com/ibm/ws/http/dispatcher/internal/channel/HttpDispatcherLink.wrapHandlerAndExecute(Ljava/lang/Runnable;)V (HttpDispatcherLink.java:453) (Compiled Code)
        at com/ibm/ws/http/dispatcher/internal/channel/HttpDispatcherLink.ready(Lcom/ibm/wsspi/channelfw/VirtualConnection;)V (HttpDispatcherLink.java:412) (Compiled Code)
        at com/ibm/ws/http/channel/internal/inbound/HttpInboundLink.handleDiscrimination()V (HttpInboundLink.java:566) (Compiled Code)
        at com/ibm/ws/http/channel/internal/inbound/HttpInboundLink.handleNewRequest()V (HttpInboundLink.java:500) (Compiled Code)
        at com/ibm/ws/http/channel/internal/inbound/HttpInboundLink.processRequest()V (HttpInboundLink.java:360) (Compiled Code)
        at com/ibm/ws/http/channel/internal/inbound/HttpICLReadCallback.complete(Lcom/ibm/wsspi/channelfw/VirtualConnection;Lcom/ibm/wsspi/tcpchannel/TCPReadRequestContext;)V (HttpICLReadCallback.java:70) (Compiled Code)
        at com/ibm/ws/tcpchannel/internal/WorkQueueManager.requestComplete(Lcom/ibm/ws/tcpchannel/internal/TCPBaseRequestContext;Ljava/io/IOException;)V (WorkQueueManager.java:504) (Compiled Code)
        at com/ibm/ws/tcpchannel/internal/WorkQueueManager.attemptIO(Lcom/ibm/ws/tcpchannel/internal/TCPBaseRequestContext;Z)Z (WorkQueueManager.java:574) (Compiled Code)
        at com/ibm/ws/tcpchannel/internal/WorkQueueManager.workerRun(Lcom/ibm/ws/tcpchannel/internal/TCPBaseRequestContext;Ljava/io/IOException;)V (WorkQueueManager.java:958) (Compiled Code)
        at com/ibm/ws/tcpchannel/internal/WorkQueueManager$Worker.run()V (WorkQueueManager.java:1047) (Compiled Code)
        at com/ibm/ws/threading/internal/ExecutorServiceImpl$RunnableWrapper.run()V (ExecutorServiceImpl.java:238) (Compiled Code)
        at java/util/concurrent/ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1149) (Compiled Code)
        at java/util/concurrent/ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:624) (Compiled Code)
        at java/lang/Thread.run()V (Thread.java:826) (Compiled Code)

Sometimes I see the com/ibm/websphere/jaxrs/server/IBMRestServlet.doGet frame change into com/ibm/websphere/jaxrs/server/IBMRestServlet.doPut.
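
With many trace files, a small script could tally which servlet entry point appears in each one instead of sampling a few by hand. The Python sketch below assumes every trace has the layout shown above (a Thread= line followed by "at class.method(signature)" frames). The helper names are made up, and the sample traces are abbreviated; the doPut trace is a stand-in derived from the doGet one, not real output.

```python
import re
from collections import Counter

# Matches e.g. "   at com/ibm/websphere/jaxrs/server/IBMRestServlet.doGet(Ljavax/..."
FRAME = re.compile(r"^\s*at (?P<cls>[\w/$]+)\.(?P<method>[\w$<>]+)\(")

def frames(trace_text):
    """Return (class, method) pairs for each 'at ...' line, top frame first."""
    found = []
    for line in trace_text.splitlines():
        match = FRAME.match(line)
        if match:
            found.append((match["cls"], match["method"]))
    return found

def servlet_entry_points(traces, servlet="com/ibm/websphere/jaxrs/server/IBMRestServlet"):
    """Tally which doXxx method of the servlet appears in each trace."""
    tally = Counter()
    for text in traces:
        for cls, method in frames(text):
            if cls == servlet and method.startswith("do"):
                tally[method] += 1
                break  # count one entry point per trace
    return tally

# Abbreviated from the trace above; most frames are omitted.
GET_TRACE = """Thread=Default Executor-thread-68 (00007FD9BC01CD98) Status=Running
        at com/ibm/ws/jaxrs20/security/LibertyAuthFilter.filter(Ljavax/ws/rs/container/ContainerRequestContext;)V (LibertyAuthFilter.java:53) (Compiled Code)
        at com/ibm/websphere/jaxrs/server/IBMRestServlet.handleRequest(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (IBMRestServlet.java:146) (Compiled Code)
        at com/ibm/websphere/jaxrs/server/IBMRestServlet.doGet(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V (IBMRestServlet.java:112) (Compiled Code)
"""
# Hypothetical stand-in for a doPut trace; its line number is not real.
PUT_TRACE = GET_TRACE.replace(".doGet(", ".doPut(")

print(servlet_entry_points([GET_TRACE, GET_TRACE, PUT_TRACE]))
```

In practice the strings would be read from the files written under the label given to -Xdump, for example by opening each /tmp/trc.*.txt file and passing its contents to servlet_entry_points.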

mpirvu commented 3 years ago

I don't see any exceptions of type org/apache/cxf/interceptor/security/AccessDeniedException being caught

vijaysun-omr commented 3 years ago

@jdmcclur your thoughts on the above stack trace ?

jdmcclur commented 3 years ago

@mpirvu - Do you see errors (NPEs) coming from the auth service? (This is what I was seeing, which caused the JWT to fail to create. Then JMeter would try to access the booking/customer services without a JWT, which would fail/trigger an exception). I modified the JMeter script to not do the login without a JWT (if the auth service failed). I could give you that version to see if there is any difference.

vijaysun-omr commented 2 years ago

Any update on this?

jdmcclur commented 2 years ago

I have no updates - @mpirvu, let me know if you want something from me.

mpirvu commented 2 years ago

I didn't have time to work on this. I could try your last version of JMeter though.

mpirvu commented 2 years ago

I will be using the new JMeter script/build shortly. Before doing that I had one more run with the old JMeter and looked at the output from the auth service. There is this exception being thrown:

[11/5/21 18:40:43:834 UTC] 0000003b io.jaegertracing.internal.senders.SenderResolver             W Failed to get a sender from the sender factory.
java.lang.RuntimeException: TUDPTransport cannot connect:
        at io.jaegertracing.thrift.internal.reporters.protocols.ThriftUdpTransport.newThriftUdpClient(ThriftUdpTransport.java:50)
        at io.jaegertracing.thrift.internal.senders.UdpSender.<init>(UdpSender.java:57)
        at io.jaegertracing.thrift.internal.senders.ThriftSenderFactory.getSender(ThriftSenderFactory.java:36)
        at io.jaegertracing.internal.senders.SenderResolver.getSenderFromFactory(SenderResolver.java:110)
        at io.jaegertracing.internal.senders.SenderResolver.resolve(SenderResolver.java:88)
        at io.jaegertracing.Configuration$SenderConfiguration.getSender(Configuration.java:696)
        at io.jaegertracing.Configuration$ReporterConfiguration.getReporter(Configuration.java:593)
        at io.jaegertracing.Configuration$ReporterConfiguration.access$000(Configuration.java:553)
        at io.jaegertracing.Configuration.getTracerBuilder(Configuration.java:230)
        at io.jaegertracing.Configuration.getTracer(Configuration.java:253)
        at com.ibm.ws.microprofile.opentracing.jaeger.adapter.impl.ConfigurationImpl.getTracer(ConfigurationImpl.java:29)
        at com.ibm.ws.microprofile.opentracing.jaeger.JaegerTracerFactory.createJaegerTracer(JaegerTracerFactory.java:181)
        at <unknown class>.createJaegerTracer(OpentracingTracerManager.java:78)
        at <unknown class>.access$000(OpentracingTracerManager.java:35)
        at io.openliberty.opentracing.internal.OpentracingTracerManager$TracerCreator.apply(OpentracingTracerManager.java:87)
        at io.openliberty.opentracing.internal.OpentracingTracerManager$TracerCreator.apply(OpentracingTracerManager.java:81)
        at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
        at <unknown class>.ensureTracer(OpentracingTracerManager.java:58)
        at <unknown class>.getTracer(OpentracingTracerManager.java:164)
        at <unknown class>.filter(OpentracingContainerFilter.java:76)
        at org.apache.cxf.jaxrs.utils.JAXRSUtils.runContainerRequestFilters(JAXRSUtils.java:1929)
        at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.processRequest(JAXRSInInterceptor.java:281)
        at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.handleMessage(JAXRSInInterceptor.java:96)
        at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
        at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:123)
        at <unknown class>.invoke(AbstractHTTPDestination.java:277)
        at <unknown class>.invoke(AbstractJaxRsWebEndpoint.java:137)
        at <unknown class>.handleRequest(IBMRestServlet.java:146)
        at <unknown class>.doPost(IBMRestServlet.java:104)
        at <unknown class>.service(HttpServlet.java:706)
        at <unknown class>.service(IBMRestServlet.java:96)
        at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1258)
        at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:746)
        at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:443)
        at com.ibm.ws.webcontainer.filter.WebAppFilterChain.invokeTarget(WebAppFilterChain.java:183)
        at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:94)
        at <unknown class>.doFilter(JaspiServletFilter.java:56)
        at <unknown class>.doFilter(FilterInstanceWrapper.java:201)
        at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:91)
        at <unknown class>.doFilter(WebAppFilterManager.java:1002)
        at <unknown class>.invokeFilters(WebAppFilterManager.java:1140)
        at <unknown class>.handleRequest(WebApp.java:5049)
        at com.ibm.ws.webcontainer.osgi.DynamicVirtualHost$2.handleRequest(DynamicVirtualHost.java:314)
        at <unknown class>.handleRequest(WebContainer.java:1007)
        at com.ibm.ws.webcontainer.osgi.DynamicVirtualHost$2.run(DynamicVirtualHost.java:279)
        at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink$TaskWrapper.run(HttpDispatcherLink.java:1159)
        at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.wrapHandlerAndExecute(HttpDispatcherLink.java:428)
        at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.ready(HttpDispatcherLink.java:387)
        at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:566)
        at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleNewRequest(HttpInboundLink.java:500)
        at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.processRequest(HttpInboundLink.java:360)
        at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.ready(HttpInboundLink.java:327)
        at <unknown class>.sendToDiscriminators(NewConnectionInitialReadCallback.java:167)
        at <unknown class>.complete(NewConnectionInitialReadCallback.java:75)
        at <unknown class>.requestComplete(WorkQueueManager.java:504)
        at <unknown class>.attemptIO(WorkQueueManager.java:574)
        at <unknown class>.workerRun(WorkQueueManager.java:958)
        at com.ibm.ws.tcpchannel.internal.WorkQueueManager$Worker.run(WorkQueueManager.java:1047)
        at <unknown class>.run(ExecutorServiceImpl.java:238)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:826)
Caused by: java.net.SocketException: Unresolved address
        at java.net.DatagramSocket.connect(DatagramSocket.java:493)
        at io.jaegertracing.thrift.internal.reporters.protocols.ThriftUdpTransport.newThriftUdpClient(ThriftUdpTransport.java:48)
        ... 61 more

jdmcclur commented 2 years ago

@mpirvu - TL;DR: This warning is expected; all the services will have it. You can ignore it.

This is coming out of the open-tracing implementation (Jaeger). It's warning you that it can't connect to a Jaeger service, so the open-tracing traces are not being sent anywhere (which is fine). It is an ugly warning though.

If you want it to go away, you can edit server.env in src/main/liberty/config and set the following. Note that this will affect performance.

JAEGER_SAMPLER_PARAM=0

mpirvu commented 2 years ago

I no longer see those exceptions with the new JMeter. Unfortunately I am now hitting another problem: if I run long enough, the flight service starts consuming all the CPU on the machine (even after JMeter terminates). All the executor threads seem to be stuck in this stack trace:

3XMTHREADINFO      "Default Executor-thread-350" J9VMThread:0x0000000000C3F300, omrthread_t:0x00007F8914034A28, java/lang/Thread:0x00000000F2EF1E18, state:R, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0x1A4, isDaemon:true)
3XMTHREADINFO1            (native thread ID:0x1EC, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000081)
3XMTHREADINFO2            (native stack address range from:0x00007F89795A1000, to:0x00007F89795E1000, size:0x40000)
3XMCPUTIME               CPU usage total: 58.735627824 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
1INTERNAL                    Unable to obtain lock context information
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/util/HashMap$TreeNode.putTreeVal(HashMap.java:2041(Compiled Code))
4XESTACKTRACE                at java/util/HashMap.putVal(HashMap.java:639(Compiled Code))
4XESTACKTRACE                at java/util/HashMap.put(HashMap.java:613(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/jaxrs20/cdi/component/JaxRsFactoryImplicitBeanCDICustomizer.getClassFromCDI(JaxRsFactoryImplicitBeanCDICustomizer.java:245(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/jaxrs20/cdi/component/JaxRsFactoryImplicitBeanCDICustomizer.getInstanceFromManagedObject(JaxRsFactoryImplicitBeanCDICustomizer.java:257(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/jaxrs20/cdi/component/JaxRsFactoryImplicitBeanCDICustomizer.beforeServiceInvoke(JaxRsFactoryImplicitBeanCDICustomizer.java:219(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/jaxrs20/server/LibertyJaxRsInvoker.invoke(LibertyJaxRsInvoker.java:212(Compiled Code))
4XESTACKTRACE                at org/apache/cxf/jaxrs/JAXRSInvoker.invoke(JAXRSInvoker.java:213(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/jaxrs20/server/LibertyJaxRsInvoker.invoke(LibertyJaxRsInvoker.java:444(Compiled Code))
4XESTACKTRACE                at org/apache/cxf/jaxrs/JAXRSInvoker.invoke(JAXRSInvoker.java:112(Compiled Code))
4XESTACKTRACE                at org/apache/cxf/interceptor/ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59(Compiled Code))
4XESTACKTRACE                at org/apache/cxf/interceptor/ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96(Compiled Code))
4XESTACKTRACE                at org/apache/cxf/phase/PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308(Compiled Code))
4XESTACKTRACE                at org/apache/cxf/transport/ChainInitiationObserver.onMessage(ChainInitiationObserver.java:123(Compiled Code))
4XESTACKTRACE                at org/apache/cxf/transport/http/AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:277(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/jaxrs20/endpoint/AbstractJaxRsWebEndpoint.invoke(AbstractJaxRsWebEndpoint.java:137(Compiled Code))
4XESTACKTRACE                at com/ibm/websphere/jaxrs/server/IBMRestServlet.handleRequest(IBMRestServlet.java:146(Compiled Code))
4XESTACKTRACE                at com/ibm/websphere/jaxrs/server/IBMRestServlet.doPost(IBMRestServlet.java:104(Compiled Code))
4XESTACKTRACE                at javax/servlet/http/HttpServlet.service(HttpServlet.java:706(Compiled Code))
4XESTACKTRACE                at com/ibm/websphere/jaxrs/server/IBMRestServlet.service(IBMRestServlet.java:96(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/webcontainer/servlet/ServletWrapper.service(ServletWrapper.java:1258(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/webcontainer/servlet/ServletWrapper.handleRequest(ServletWrapper.java:746(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/webcontainer/servlet/ServletWrapper.handleRequest(ServletWrapper.java:443(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/webcontainer/filter/WebAppFilterChain.invokeTarget(WebAppFilterChain.java:183(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/webcontainer/filter/WebAppFilterChain.doFilter(WebAppFilterChain.java:94(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/security/jaspi/JaspiServletFilter.doFilter(JaspiServletFilter.java:56(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/webcontainer/filter/FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:201(Compiled Code))
...

This is happening for both OpenJ9 releases 0.29.0 and 0.24.0, so it's not a new problem introduced by 0.29.0
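
For illustration only: many threads spinning in java/util/HashMap$TreeNode.putTreeVal is a pattern often associated with several threads writing to one unsynchronized java.util.HashMap, although the trace above does not by itself prove that this is what happens in JaxRsFactoryImplicitBeanCDICustomizer. The sketch below uses a hypothetical cache class (ClassCacheSketch, which is not Liberty code) to show the usual thread-safe alternative, ConcurrentHashMap.computeIfAbsent, under concurrent writers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a shared lookup cache; names are assumptions,
// not the Liberty implementation.
class ClassCacheSketch {
    // ConcurrentHashMap tolerates concurrent put/get; a plain HashMap does not.
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    Object lookup(String key) {
        // Atomically creates the value once per key, even under contention.
        return cache.computeIfAbsent(key, k -> new Object());
    }

    int size() {
        return cache.size();
    }

    // Runs `threads` writers that each touch the same `keys` keys and
    // returns the number of distinct entries that ended up in the cache.
    static int runDemo(int threads, int keys) {
        ClassCacheSketch sketch = new ClassCacheSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < keys; i++) {
                    sketch.lookup("key-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct entries: " + runDemo(8, 10_000));
    }
}
```

With a plain HashMap in place of the ConcurrentHashMap, the same workload has no defined behavior; lost entries or threads that never return from put are both possible outcomes.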

mpirvu commented 2 years ago

Looking at methods that are compiled very late (since the problem happens after 20-25 minutes of load), I found this HashMap method:

 (scorching) Compiling java/util/HashMap.putVal(ILjava/lang/Object;Ljava/lang/Object;ZZ)Ljava/lang/Object;  OrdinaryMethod j9m=0000000000055AF8 t=1294715 compThreadID=1 memLimit=262144 KB freePhysicalMemory=302 MB
+ (scorching) java/util/HashMap.putVal(ILjava/lang/Object;Ljava/lang/Object;ZZ)Ljava/lang/Object; @ 00007F67A81C27C8-00007F67A81D4B74 OrdinaryMethod 70.50% T Q_SZ=0 Q_SZI=0 QW=100 j9m=0000000000055AF8 bcsz=300 OSR time=2407284us mem=[region=131520 system=147456]KB compThreadID=1 CpuLoad=799%(99%avg) JvmCpu=797%