Closed pshipton closed 3 years ago
@dmitripivkine fyi. Note this failed in the 0.24 release stream.
This hang occur because GC is waiting for JIT/GC class unload mutex lock
to be acquired on the way to System GC in thread "load-6"
:
3XMTHREADINFO "load-6" J9VMThread:0x0000000030457F00, omrthread_t:0x0000010023BCE0A0, java/lang/Thread:0x00000000800BB530, state:R, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x2C, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x519010F, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00001020)
3XMCPUTIME CPU usage total: 0.619328000 secs, user: 0.598040000 secs, system: 0.021288000 secs, current category="Application"
3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at java/lang/Runtime.gc(Native Method)
4XESTACKTRACE at java/lang/System.gc(System.java:354)
4XESTACKTRACE at net/openj9/test/arithmetics/TestArithmeticOperations.testGreaterThanPackedDecimal(TestArithmeticOperations.java:1838(Compiled Code))
3XMTHREADINFO3 Native callstack:
4XENATIVESTACK _event_wait+0x344 (0x09000000005843C8 [libpthreads.a+0x173c8])
4XENATIVESTACK _cond_wait_local+0x4dc (0x0900000000592920 [libpthreads.a+0x25920])
4XENATIVESTACK _cond_wait+0xc8 (0x0900000000592F0C [libpthreads.a+0x25f0c])
4XENATIVESTACK pthread_cond_wait+0x19c (0x0900000000593B60 [libpthreads.a+0x26b60])
4XENATIVESTACK IPRA.$monitor_wait_original+0x9b0 (0x0900000010746D34 [libj9thr29.so+0x9d34])
4XENATIVESTACK omrthread_monitor_wait+0x40 (0x0900000010746104 [libj9thr29.so+0x9104])
4XENATIVESTACK omrthread_rwmutex_enter_write+0x50 (0x0900000010752B34 [libj9thr29.so+0x15b34])
4XENATIVESTACK enterClassUnloadMutex__21MM_ClassLoaderManagerFP18MM_EnvironmentBase+0xd0 (0x0900000011D4F874 [libj9gc29.so+0x12874])
4XENATIVESTACK mainThreadGarbageCollectStarted__26MM_GlobalCollectorDelegateFP18MM_EnvironmentBase+0xe8 (0x0900000011DD34EC [libj9gc29.so+0x964ec])
2LKREGMON JIT/GC class unload mutex lock (0x0000010010052A88): <unowned>
3LKNOTIFYQ Waiting to be notified:
3LKWAITNOTIFY "load-6" (J9VMThread:0x0000000030457F00)
This JIT/GC class unload mutex lock
supposes to be released by JIT Compilation threads by GC request but seems it was not.
@0xdaryl @gita-omr FYI
Started a 60x (10x by 6 machines) grinder to see if it reproduces https://ci.eclipse.org/openj9/job/Grinder/1320/
The grinder passed.
FYI: One of JIT Compilation threads is not waiting but doing something for more than 100 minutes:
3XMTHREADINFO "JIT Compilation Thread-006" J9VMThread:0x000000003001E600, omrthread_t:0x0000010020BEF350, java/lang/Thread:0x0000000080052A90, state:R, prio=10
3XMJAVALTHREAD (java/lang/Thread getId:0x8, isDaemon:true)
3XMTHREADINFO1 (native thread ID:0x51E0291, native priority:0xB, native policy:UNKNOWN, vmstate:R, vm thread flags:0x00000081)
3XMCPUTIME CPU usage total: 6437.343774000 secs, user: 6437.232967000 secs, system: 0.110807000 secs, current category="JIT"
3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3 No Java callstack associated with this thread
3XMTHREADINFO3 Native callstack:
4XENATIVESTACK (0x090000001117B110 [libj9jit29.so+0x88a110])
4XENATIVESTACK (0x0900000010CAD228 [libj9jit29.so+0x3bc228])
4XENATIVESTACK (0x0900000010CAAF24 [libj9jit29.so+0x3b9f24])
4XENATIVESTACK (0x0900000010B0D728 [libj9jit29.so+0x21c728])
4XENATIVESTACK omrsig_protect+0x488 (0x09000000107BABCC [libj9prt29.so+0x59bcc])
4XENATIVESTACK (0x0900000010AF15B8 [libj9jit29.so+0x2005b8])
4XENATIVESTACK (0x090000001121E610 [libj9jit29.so+0x92d610])
4XENATIVESTACK (0x0900000011220B4C [libj9jit29.so+0x92fb4c])
4XENATIVESTACK (0x090000001121FE18 [libj9jit29.so+0x92ee18])
4XENATIVESTACK omrsig_protect+0x488 (0x09000000107BABCC [libj9prt29.so+0x59bcc])
4XENATIVESTACK (0x090000001121F5F4 [libj9jit29.so+0x92e5f4])
4XENATIVESTACK thread_wrapper+0x524 (0x0900000010741808 [libj9thr29.so+0x4808])
4XENATIVESTACK _pthread_body+0xf0 (0x0900000000570E14 [libpthreads.a+0x3e14])
I have test results downloaded (full set including system core, java core etc) in case somebody wants to look
Another one https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64_aix_Nightly_testList_4/22 DaaLoadTest_daa1_special_22
Diagnostics (just javacore's) https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Test/Test_openjdk8_j9_special.system_ppc64_aix_Nightly_testList_4/22/system_test_output.tar.gz
3XMTHREADINFO "load-13" J9VMThread:0x0000000030463800, omrthread_t:0x000001002415F688, java/lang/Thread:0x00000000802238C0, state:R, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x33, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x53C02BD, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000081)
3XMCPUTIME CPU usage total: 0.144760000 secs, user: 0.139528000 secs, system: 0.005232000 secs, current category="Application"
3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at java/math/MutableBigInteger.divideKnuth(MutableBigInteger.java:1215)
4XESTACKTRACE at java/math/MutableBigInteger.divide(MutableBigInteger.java:1153)
4XESTACKTRACE at java/math/MutableBigInteger.divide(MutableBigInteger.java:1147)
4XESTACKTRACE at java/math/BigInteger.smallToString(BigInteger.java:3832)
4XESTACKTRACE at java/math/BigInteger.toString(BigInteger.java:3877)
4XESTACKTRACE at java/math/BigInteger.toString(BigInteger.java:3905)
4XESTACKTRACE at java/math/BigInteger.toString(BigInteger.java:3804)
4XESTACKTRACE at java/math/BigInteger.toString(BigInteger.java:3959)
4XESTACKTRACE at net/openj9/test/Utils.toPrecision(Utils.java:489)
4XESTACKTRACE at net/openj9/test/arithmetics/TestPerformance.getReferenceValues(TestPerformance.java:433)
4XESTACKTRACE at net/openj9/test/arithmetics/TestPerformance.testAdd(TestPerformance.java:105)
4XESTACKTRACE at net/openj9/test/arithmetics/TestPerformance.testIntrinsicPerformance(TestPerformance.java:76)
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64_aix_Nightly_testList_2/24 MathLoadTest_bigdecimal_special_17 variation: Mode555 JVM_OPTIONS: -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:counts=- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000,gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xcheck:gc:vmthreads:all:quiet
Diagnostics (just javacore's): https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Test/Test_openjdk8_j9_special.system_ppc64_aix_Nightly_testList_2/24/system_test_output.tar.gz
3XMTHREADINFO "load-13" J9VMThread:0x0000000030463700, omrthread_t:0x000001002495E548, java/lang/Thread:0x0000000076C01B38, state:R, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x33, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x197032B, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000001)
3XMCPUTIME CPU usage total: 2.957708000 secs, user: 2.944704000 secs, system: 0.013004000 secs, current category="Application"
3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at sun/reflect/NativeMethodAccessorImpl.invoke0(Native Method)
4XESTACKTRACE at sun/reflect/NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62(Compiled Code))
4XESTACKTRACE at sun/reflect/DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43(Compiled Code))
4XESTACKTRACE at java/lang/reflect/Method.invoke(Method.java:498(Compiled Code))
4XESTACKTRACE at junit/framework/TestCase.runTest(TestCase.java:176(Compiled Code))
Another one https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64_aix_Nightly_testList_4/22 DaaLoadTest_daa1_special_22
There is the same problem - JIT Compilation thread is running for ~10740 seconds and keeps GC class unload mutex lock
and prevents STW GC to start:
3XMTHREADINFO "JIT Compilation Thread-001" J9VMThread:0x0000000030013000, omrthread_t:0x00000100208C1940, java/lang/Thread:0x00000000800384D0, state:R, prio=10
3XMJAVALTHREAD (java/lang/Thread getId:0x3, isDaemon:true)
3XMTHREADINFO1 (native thread ID:0x23D02CF, native priority:0xB, native policy:UNKNOWN, vmstate:R, vm thread flags:0x00000081)
3XMCPUTIME CPU usage total: 10739.630518000 secs, user: 10739.594203000 secs, system: 0.036315000 secs, current category="JIT"
3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3 No Java callstack associated with this thread
3XMTHREADINFO3 Native callstack:
4XENATIVESTACK (0x090000000CFD0A10 [libj9jit29.so+0x889a10])
4XENATIVESTACK (0x090000000CB02228 [libj9jit29.so+0x3bb228])
4XENATIVESTACK (0x090000000CAFFF24 [libj9jit29.so+0x3b8f24])
4XENATIVESTACK (0x090000000C963868 [libj9jit29.so+0x21c868])
4XENATIVESTACK omrsig_protect+0x488 (0x090000000ABF2BCC [libj9prt29.so+0x59bcc])
4XENATIVESTACK (0x090000000C9476B8 [libj9jit29.so+0x2006b8])
4XENATIVESTACK (0x090000000D074110 [libj9jit29.so+0x92d110])
4XENATIVESTACK (0x090000000D07664C [libj9jit29.so+0x92f64c])
4XENATIVESTACK (0x090000000D075918 [libj9jit29.so+0x92e918])
4XENATIVESTACK omrsig_protect+0x488 (0x090000000ABF2BCC [libj9prt29.so+0x59bcc])
4XENATIVESTACK (0x090000000D0750F4 [libj9jit29.so+0x92e0f4])
4XENATIVESTACK thread_wrapper+0x524 (0x090000000A71B808 [libj9thr29.so+0x4808])
4XENATIVESTACK _pthread_body+0xf0 (0x0900000000570E14 [libpthreads.a+0x3e14])
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64_aix_Nightly_testList_2/24 MathLoadTest_bigdecimal_special_17 variation: Mode555 JVM_OPTIONS: -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:counts=- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000,gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xcheck:gc:vmthreads:all:quiet
Diagnostics (just javacore's): https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Test/Test_openjdk8_j9_special.system_ppc64_aix_Nightly_testList_2/24/system_test_output.tar.gz
This is the same problem as well, just different stack on GC side due Balanced:
3XMTHREADINFO "Dedicated GC Main" J9VMThread:0x00000000300ED800, omrthread_t:0x00000100233FA3C8, java/lang/Thread:0x00000000764185C8, state:R, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0xC, isDaemon:true)
3XMTHREADINFO1 (native thread ID:0x1F7011F, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x000000a1)
3XMCPUTIME CPU usage total: 0.321794000 secs, user: 0.312383000 secs, system: 0.009411000 secs, current category="GC"
3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3 No Java callstack associated with this thread
3XMTHREADINFO3 Native callstack:
4XENATIVESTACK _event_wait+0x344 (0x09000000005843C8 [libpthreads.a+0x173c8])
4XENATIVESTACK _cond_wait_local+0x4dc (0x0900000000592920 [libpthreads.a+0x25920])
4XENATIVESTACK _cond_wait+0xc8 (0x0900000000592F0C [libpthreads.a+0x25f0c])
4XENATIVESTACK pthread_cond_wait+0x19c (0x0900000000593B60 [libpthreads.a+0x26b60])
4XENATIVESTACK IPRA.$monitor_wait_original+0x9b0 (0x09000000128ABD34 [libj9thr29.so+0x9d34])
4XENATIVESTACK omrthread_monitor_wait+0x40 (0x09000000128AB104 [libj9thr29.so+0x9104])
4XENATIVESTACK omrthread_rwmutex_enter_write+0x50 (0x09000000128B7B34 [libj9thr29.so+0x15b34])
4XENATIVESTACK enterClassUnloadMutex__21MM_ClassLoaderManagerFP18MM_EnvironmentBase+0xd0 (0x0900000013EB7874 [libj9gc29.so+0x12874])
4XENATIVESTACK unloadDeadClassLoaders__28MM_IncrementalGenerationalGCFP19MM_EnvironmentVLHGC+0x4cc (0x0900000014124DF0 [libj9gc29.so+0x27fdf0])
4XENATIVESTACK globalMarkPhase__28MM_IncrementalGenerationalGCFP19MM_EnvironmentVLHGCb+0xd54 (0x0900000014123718 [libj9gc29.so+0x27e718])
4XENATIVESTACK runGlobalMarkPhaseIncrement__28MM_IncrementalGenerationalGCFP19MM_EnvironmentVLHGC+0x950 (0x0900000014121694 [libj9gc29.so+0x27c694])
4XENATIVESTACK mainThreadGarbageCollect__28MM_IncrementalGenerationalGCFP18MM_EnvironmentBaseP22MM_AllocateDescriptionbT3+0x284 (0x0900000014120768 [libj9gc29.so+0x27b768])
4XENATIVESTACK mainThreadEntryPoint__15MM_MainGCThreadFv+0x30c (0x0900000014065730 [libj9gc29.so+0x1c0730])
4XENATIVESTACK main_thread_proc2__15MM_MainGCThreadFP14OMRPortLibraryPv+0x18 (0x09000000140652DC [libj9gc29.so+0x1c02dc])
4XENATIVESTACK omrsig_protect+0x488 (0x090000001291FBCC [libj9prt29.so+0x59bcc])
4XENATIVESTACK main_thread_proc__15MM_MainGCThreadFPv+0x54 (0x0900000014065A18 [libj9gc29.so+0x1c0a18])
4XENATIVESTACK thread_wrapper+0x524 (0x09000000128A6808 [libj9thr29.so+0x4808])
4XENATIVESTACK _pthread_body+0xf0 (0x0900000000570E14 [libpthreads.a+0x3e14])
This JIT Compilation thread is a problem:
3XMTHREADINFO "JIT Compilation Thread-000" J9VMThread:0x0000000030010B00, omrthread_t:0x00000100100536D0, java/lang/Thread:0x00000000764180F0, state:R, prio=10
3XMJAVALTHREAD (java/lang/Thread getId:0x2, isDaemon:true)
3XMTHREADINFO1 (native thread ID:0x5F60211, native priority:0xB, native policy:UNKNOWN, vmstate:R, vm thread flags:0x00000081)
3XMCPUTIME CPU usage total: 2119.650525000 secs, user: 2119.643982000 secs, system: 0.006543000 secs, current category="JIT"
3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3 No Java callstack associated with this thread
3XMTHREADINFO3 Native callstack:
4XENATIVESTACK (0x09000000132DFA10 [libj9jit29.so+0x889a10])
4XENATIVESTACK (0x0900000012E11228 [libj9jit29.so+0x3bb228])
4XENATIVESTACK (0x0900000012E0EF24 [libj9jit29.so+0x3b8f24])
4XENATIVESTACK (0x0900000012C72868 [libj9jit29.so+0x21c868])
4XENATIVESTACK omrsig_protect+0x488 (0x090000001291FBCC [libj9prt29.so+0x59bcc])
4XENATIVESTACK (0x0900000012C566B8 [libj9jit29.so+0x2006b8])
4XENATIVESTACK (0x0900000013383110 [libj9jit29.so+0x92d110])
4XENATIVESTACK (0x090000001338564C [libj9jit29.so+0x92f64c])
4XENATIVESTACK (0x0900000013384918 [libj9jit29.so+0x92e918])
4XENATIVESTACK omrsig_protect+0x488 (0x090000001291FBCC [libj9prt29.so+0x59bcc])
4XENATIVESTACK (0x09000000133840F4 [libj9jit29.so+0x92e0f4])
4XENATIVESTACK thread_wrapper+0x524 (0x09000000128A6808 [libj9thr29.so+0x4808])
4XENATIVESTACK _pthread_body+0xf0 (0x0900000000570E14 [libpthreads.a+0x3e14])
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64_aix_Nightly_testList_4/26 DaaLoadTest_daa1_special_17 variation: Mode555 JVM_OPTIONS: -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:counts=- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000,gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xcheck:gc:vmthreads:all:quiet
3XMTHREADINFO "JIT Compilation Thread-005" J9VMThread:0x000000003001C200, omrthread_t:0x0000010021F9C8E8, java/lang/Thread:0x00000000765AFFB8, state:R, prio=10
3XMJAVALTHREAD (java/lang/Thread getId:0x7, isDaemon:true)
3XMTHREADINFO1 (native thread ID:0x1F3038D, native priority:0xB, native policy:UNKNOWN, vmstate:R, vm thread flags:0x00000081)
3XMCPUTIME CPU usage total: 10775.432726000 secs, user: 10775.425218000 secs, system: 0.007508000 secs, current category="JIT"
3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3 No Java callstack associated with this thread
3XMTHREADINFO3 Native callstack:
4XENATIVESTACK (0x09000000133F0A10 [libj9jit29.so+0x889a10])
4XENATIVESTACK (0x0900000012F22228 [libj9jit29.so+0x3bb228])
4XENATIVESTACK (0x0900000012F1FF24 [libj9jit29.so+0x3b8f24])
4XENATIVESTACK (0x0900000012D83868 [libj9jit29.so+0x21c868])
4XENATIVESTACK omrsig_protect+0x488 (0x0900000012A30BCC [libj9prt29.so+0x59bcc])
4XENATIVESTACK (0x0900000012D676B8 [libj9jit29.so+0x2006b8])
4XENATIVESTACK (0x0900000013494110 [libj9jit29.so+0x92d110])
4XENATIVESTACK (0x090000001349664C [libj9jit29.so+0x92f64c])
4XENATIVESTACK (0x0900000013495918 [libj9jit29.so+0x92e918])
4XENATIVESTACK omrsig_protect+0x488 (0x0900000012A30BCC [libj9prt29.so+0x59bcc])
4XENATIVESTACK (0x09000000134950F4 [libj9jit29.so+0x92e0f4])
4XENATIVESTACK thread_wrapper+0x524 (0x0900000012548808 [libj9thr29.so+0x4808])
4XENATIVESTACK _pthread_body+0xf0 (0x0900000000570E14 [libpthreads.a+0x3e14])
3XMTHREADINFO "load-13" J9VMThread:0x0000000030467B00, omrthread_t:0x0000010024680F28, java/lang/Thread:0x0000000070F01820, state:R, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x33, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x5802E7, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000081)
3XMCPUTIME CPU usage total: 0.777730000 secs, user: 0.774231000 secs, system: 0.003499000 secs, current category="Application"
3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at java/util/Formatter$FormatSpecifier.<init>(Formatter.java:2708(Compiled Code))
4XESTACKTRACE at java/util/Formatter.parse(Formatter.java:2560(Compiled Code))
4XESTACKTRACE at java/util/Formatter.format(Formatter.java:2501(Compiled Code))
4XESTACKTRACE at java/util/Formatter.format(Formatter.java:2455(Compiled Code))
4XESTACKTRACE at java/lang/String.format(String.java:4242(Compiled Code))
4XESTACKTRACE at net/openj9/test/PDMoveShifts/TestShiftsAndConvert.testShiftRight(TestShiftsAndConvert.java:3317(Compiled Code))
4XESTACKTRACE at net/openj9/test/PDMoveShifts/TestShiftsAndConvert.testShiftRightSimpleRound(TestShiftsAndConvert.java:3386(Compiled Code))
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64_aix_Nightly_testList_4/28 DaaLoadTest_daa1_special_17
I think we saw a few failures with classUnloadMutex being corrupted. One of them was happening with -Xint: https://github.com/eclipse/openj9/issues/11040 so I think it needs to be fixed first.
@gita-omr note that #11040 only occurred when https://github.com/eclipse/openj9/pull/10930 was merged, however this change was reverted.
My impression was that something similar was occurring even without #10930 but less often...
I'm not aware that we've seen the problem without #10930. Note this is a new problem, not seen before Dec 19th.
Note this has only been seen in these modes.
VM_OPTIONS: -Xcompressedrefs -Xgcpolicy:gencon -Xjit:counts=- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000,gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xmn512k -Xcheck:gc:vmthreads:all:quiet
JVM_OPTIONS: -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:counts=- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000,gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xcheck:gc:vmthreads:all:quiet
@gita-omr I am pretty sure this is different problem. We have seen similar problems in the past. It was related with (something like) infinite loop Compilation thread going in due wrong data (sorry again, I never knew low level details on JIT side regarding this). I do have one system core preserved if somebody wants to look at
Sure, we’ll take a look. Thanks.
Perhaps related, MathLoadTest_all_special_17 hung, but there are no diagnostics. Opened https://github.com/AdoptOpenJDK/openjdk-systemtest/issues/389
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64_aix_Nightly_testList_4/34 23:40:11 variation: Mode555 23:40:11 JVM_OPTIONS: -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:counts=- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000,gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xcheck:gc:vmthreads:all:quiet
23:44:39 MLT 23:44:36.580 - Completed 10.0%. Number of tests started=523 (+16)
23:44:58 MLT 23:44:56.567 - Completed 12.5%. Number of tests started=658 (+135)
23:45:15 STF 23:45:14.023 - Heartbeat: Process MLT is still running
23:45:17 MLT 23:45:16.587 - Completed 15.0%. Number of tests started=787 (+129)
23:50:16 STF 23:50:14.149 - Heartbeat: Process MLT is still running
23:55:26 STF 23:55:14.208 - Heartbeat: Process MLT is still running
00:00:23 STF 00:00:14.296 - Heartbeat: Process MLT is still running
00:05:18 STF 00:05:14.400 - Heartbeat: Process MLT is still running
00:10:29 STF 00:10:14.453 - Heartbeat: Process MLT is still running
00:15:24 STF 00:15:14.498 - Heartbeat: Process MLT is still running
00:20:20 STF 00:20:14.034 - Heartbeat: Process MLT is still running
...
Runs like this until terminated by jenkins.
Similar but different. https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64_aix_Nightly_testList_2/35/ LambdaLoadTest_OpenJ9_NonLinux_special_17
3XMTHREADINFO "JIT Compilation Thread-006" J9VMThread:0x000000003001E600, omrthread_t:0x0000010021F9D210, java/lang/Thread:0x000000007026AF78, state:R, prio=10
3XMJAVALTHREAD (java/lang/Thread getId:0x8, isDaemon:true)
3XMTHREADINFO1 (native thread ID:0x68701AD, native priority:0xB, native policy:UNKNOWN, vmstate:R, vm thread flags:0x00000081)
3XMCPUTIME CPU usage total: 2184.193430000 secs, user: 2184.191159000 secs, system: 0.002271000 secs, current category="JIT"
3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3 No Java callstack associated with this thread
3XMTHREADINFO3 Native callstack:
4XENATIVESTACK (0x0900000016FCBA10 [libj9jit29.so+0x889a10])
4XENATIVESTACK (0x0900000016AFD228 [libj9jit29.so+0x3bb228])
4XENATIVESTACK (0x0900000016AFAF24 [libj9jit29.so+0x3b8f24])
4XENATIVESTACK (0x090000001695E868 [libj9jit29.so+0x21c868])
4XENATIVESTACK omrsig_protect+0x488 (0x090000001494EBCC [libj9prt29.so+0x59bcc])
4XENATIVESTACK (0x09000000169426B8 [libj9jit29.so+0x2006b8])
4XENATIVESTACK (0x090000001706F110 [libj9jit29.so+0x92d110])
4XENATIVESTACK (0x090000001707164C [libj9jit29.so+0x92f64c])
4XENATIVESTACK (0x0900000017070918 [libj9jit29.so+0x92e918])
4XENATIVESTACK omrsig_protect+0x488 (0x090000001494EBCC [libj9prt29.so+0x59bcc])
4XENATIVESTACK (0x09000000170700F4 [libj9jit29.so+0x92e0f4])
4XENATIVESTACK thread_wrapper+0x524 (0x09000000148D5968 [libj9thr29.so+0x4968])
4XENATIVESTACK _pthread_body+0xf0 (0x0900000000570E14 [libpthreads.a+0x3e14])
3XMTHREADINFO "load-13" J9VMThread:0x0000000030463B00, omrthread_t:0x00000100249510C8, java/lang/Thread:0x0000000071415990, state:CW, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x33, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x2E90195, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000181)
3XMCPUTIME CPU usage total: 0.007482000 secs, user: 0.006310000 secs, system: 0.001172000 secs, current category="Application"
3XMTHREADBLOCK Waiting on: java/lang/J9VMInternals$ClassInitializationLock@0x00000000716EFB88 Owned by: <unowned>
3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at org/junit/runners/ParentRunner.validateClassRules(ParentRunner.java:160)
4XESTACKTRACE at org/junit/runners/ParentRunner.collectInitializationErrors(ParentRunner.java:127)
4XESTACKTRACE at org/junit/runners/ParentRunner.validate(ParentRunner.java:416)
4XESTACKTRACE at org/junit/runners/ParentRunner.<init>(ParentRunner.java:84)
4XESTACKTRACE at org/junit/runners/Suite.<init>(Suite.java:112)
4XESTACKTRACE at org/junit/runners/Suite.<init>(Suite.java:81)
4XESTACKTRACE at org/junit/runner/Computer.getSuite(Computer.java:28)
4XESTACKTRACE at org/junit/runner/Request.classes(Request.java:75)
4XESTACKTRACE at org/junit/runner/Request.classes(Request.java:91)
4XESTACKTRACE at net/adoptopenjdk/loadTest/adaptors/JUnitAdaptor.executeTest(JUnitAdaptor.java:129)
4XESTACKTRACE at net/adoptopenjdk/loadTest/LoadTestRunner$2.run(LoadTestRunner.java:182)
4XESTACKTRACE at java/util/concurrent/ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
4XESTACKTRACE at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
4XESTACKTRACE at java/lang/Thread.run(Thread.java:823(Compiled Code))
In a 0.24.0 release build. https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64_aix_Release_testList_0/8 MauveMultiThrdLoad_special_19 variation: Mode614 JVM_OPTIONS: -Xcompressedrefs -Xgcpolicy:gencon -Xjit:counts=- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000,gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xmn512k -Xcheck:gc:vmthreads:all:quiet
3XMTHREADINFO "JIT Compilation Thread-003" J9VMThread:0x0000000030017900, omrthread_t:0x00000100208C2DF0, java/lang/Thread:0x0000000080037B58, state:R, prio=10
3XMJAVALTHREAD (java/lang/Thread getId:0x5, isDaemon:true)
3XMTHREADINFO1 (native thread ID:0x69E00CD, native priority:0xB, native policy:UNKNOWN, vmstate:R, vm thread flags:0x00000000)
3XMCPUTIME CPU usage total: 4259.764781000 secs, user: 4259.757142000 secs, system: 0.007639000 secs, current category="JIT"
3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3 No Java callstack associated with this thread
3XMTHREADINFO3 Native callstack:
4XENATIVESTACK (0x09000000132E6110 [libj9jit29.so+0x88a110])
4XENATIVESTACK (0x0900000012E18228 [libj9jit29.so+0x3bc228])
4XENATIVESTACK (0x0900000012E15F24 [libj9jit29.so+0x3b9f24])
4XENATIVESTACK (0x0900000012C78728 [libj9jit29.so+0x21c728])
4XENATIVESTACK omrsig_protect+0x488 (0x0900000012925BCC [libj9prt29.so+0x59bcc])
4XENATIVESTACK (0x0900000012C5C5B8 [libj9jit29.so+0x2005b8])
4XENATIVESTACK (0x0900000013389610 [libj9jit29.so+0x92d610])
4XENATIVESTACK (0x090000001338BB4C [libj9jit29.so+0x92fb4c])
4XENATIVESTACK (0x090000001338AE18 [libj9jit29.so+0x92ee18])
4XENATIVESTACK omrsig_protect+0x488 (0x0900000012925BCC [libj9prt29.so+0x59bcc])
4XENATIVESTACK (0x090000001338A5F4 [libj9jit29.so+0x92e5f4])
4XENATIVESTACK thread_wrapper+0x524 (0x090000001243D808 [libj9thr29.so+0x4808])
4XENATIVESTACK _pthread_body+0xf0 (0x0900000000570E14 [libpthreads.a+0x3e14])
2LKREGMON JIT-CompilationQueueMonitor lock (0x00000100105EFD50): <unowned>
3LKNOTIFYQ Waiting to be notified:
3LKWAITNOTIFY "main" (J9VMThread:0x000000003000E700)
3XMTHREADINFO "main" J9VMThread:0x000000003000E700, omrthread_t:0x0000010010052D80, java/lang/Thread:0x0000000080013220, state:R, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x1, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0xA0351, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000000)
3XMCPUTIME CPU usage total: 1.167528000 secs, user: 1.040110000 secs, system: 0.127418000 secs, current category="Application"
3XMHEAPALLOC Heap bytes allocated since last GC cycle=81600 (0x13EC0)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at java/lang/Shutdown.halt0(Native Method)
4XESTACKTRACE at java/lang/Shutdown.halt(Shutdown.java:142)
5XESTACKTRACE (entered lock: java/lang/Shutdown$Lock@0x0000000080037D48, entry count: 1)
4XESTACKTRACE at java/lang/Shutdown.exit(Shutdown.java:217)
5XESTACKTRACE (entered lock: java/lang/Shutdown@0x0000000080050A20, entry count: 1)
4XESTACKTRACE at java/lang/Runtime.exit(Runtime.java:109)
4XESTACKTRACE at java/lang/System.exit(System.java:345)
4XESTACKTRACE at net/adoptopenjdk/loadTest/LoadTest.main(LoadTest.java:130)
3XMTHREADINFO3 Native callstack:
4XENATIVESTACK _event_wait+0x344 (0x09000000005843C8 [libpthreads.a+0x173c8])
4XENATIVESTACK _cond_wait_local+0x4dc (0x0900000000592920 [libpthreads.a+0x25920])
4XENATIVESTACK _cond_wait+0xc8 (0x0900000000592F0C [libpthreads.a+0x25f0c])
4XENATIVESTACK pthread_cond_wait+0x19c (0x0900000000593B60 [libpthreads.a+0x26b60])
4XENATIVESTACK IPRA.$monitor_wait_original+0x9b0 (0x0900000012442D34 [libj9thr29.so+0x9d34])
4XENATIVESTACK omrthread_monitor_wait+0x40 (0x0900000012442104 [libj9thr29.so+0x9104])
4XENATIVESTACK (0x0900000012C55794 [libj9jit29.so+0x1f9794])
4XENATIVESTACK (0x090000001338E43C [libj9jit29.so+0x93243c])
4XENATIVESTACK (0x090000001338D5A0 [libj9jit29.so+0x9315a0])
4XENATIVESTACK exitJavaVM+0x330 (0x09000000126424F4 [libj9vm29.so+0x84f4])
4XENATIVESTACK JVM_Halt_Impl+0xbc (0x09000000145E2CA0 [libjclse29.so+0x92ca0])
4XENATIVESTACK JVM_Halt+0x44 (0x09000000125E8BE8 [libjvm.so+0xfbe8])
4XENATIVESTACK JVM_Halt+0xc4 (0x09000000125A73C8 [libjvm.so+0x63c8])
4XENATIVESTACK Java_java_lang_Shutdown_halt0+0x10 (0x09000000146273F4 [libjava.so+0x143f4])
4XENATIVESTACK (0x090000001280D63C [libj9vm29.so+0x1d363c])
4XENATIVESTACK ffi_call+0x98 (0x090000001280C91C [libj9vm29.so+0x1d291c])
4XENATIVESTACK (0x0900000012811FE4 [libj9vm29.so+0x1d7fe4])
4XENATIVESTACK (0x09000000126AA27C [libj9vm29.so+0x7027c])
4XENATIVESTACK runCallInMethod+0x264 (0x09000000126A6D48 [libj9vm29.so+0x6cd48])
4XENATIVESTACK gpProtectedRunCallInMethod__FPv+0x48 (0x090000001269744C [libj9vm29.so+0x5d44c])
4XENATIVESTACK signalProtectAndRunGlue+0x28 (0x09000000126910AC [libj9vm29.so+0x570ac])
4XENATIVESTACK omrsig_protect+0x488 (0x0900000012925BCC [libj9prt29.so+0x59bcc])
4XENATIVESTACK gpProtectAndRun+0xf8 (0x09000000126911FC [libj9vm29.so+0x571fc])
4XENATIVESTACK gpCheckCallin+0x114 (0x09000000126973B8 [libj9vm29.so+0x5d3b8])
4XENATIVESTACK callStaticVoidMethod+0x48 (0x09000000126F130C [libj9vm29.so+0xb730c])
4XENATIVESTACK (0x0000010000004838 [java+0x4838])
4XENATIVESTACK _pthread_body+0xf0 (0x0900000000570E14 [libpthreads.a+0x3e14])
I was looking at the problem in: https://github.com/eclipse/openj9/issues/11526#issuecomment-763320863
I downloaded the build from: https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK8_ppc64_aix_Release/60/OpenJ9-JDK8-ppc64_aix-20210119-160857.tar.gz
Version output:
java -Xshareclasses:none -Xnoaot -version
openjdk version "1.8.0_282-internal"
OpenJDK Runtime Environment (build 1.8.0_282-internal-jenkins_2021_01_19_16_11-b00)
Eclipse OpenJ9 VM (build v0.24.0-release-345e1b09e2a, JRE 1.8.0 AIX ppc64-64-Bit Compressed References 20210119_60 (JIT enabled, AOT disabled)
OpenJ9 - 345e1b09e2a
OMR - 741e94ea867
JCL - 4989cf9d146 based on jdk8u282-b08)
However, I ran into an issue where it looks like the symbols have largely been stripped from libj9jit29.so
. This is based both off trying to view backtrace information in a debugger and also using the dump -X any -c -T libj9jit29.so
command to verify that symbols can not be found. As a result, it would be challenging to figure out what is going on from the information on hand.
Is it normal for the symbol information to be stripped? If possible, it would be helpful to get another reproduction of the failure with a build that does have symbol information. I am currently trying to see if I can do this myself.
Downloading the debug image and overlaying it may help. https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Build_JDK8_ppc64_aix_Release/60/debug-image.tar.gz
On AIX the debug image contains the non-stripped libraries.
Overlaying the debug image looks like it helped. Thanks!
So the backtrace in the corefile for JIT Compilation Thread-003
looks like this:
(kca) where
0x0900000012f72594 {libj9jit29.so}{addProfileInfo__18TR_JProfilerThreadFP24TR_PersistentProfileInfo} [0x10020be18d0]
0x0900000012f72308 {libj9jit29.so}{findOrCreateProfileInfo__Q2_2J913RecompilationFv} [0x10020be18d0]
0x09000000132e6110 {libj9jit29.so}{beforeOptimization__Q2_2J913RecompilationFv} [0x10020be19a0]
0x0900000012e18228 {libj9jit29.so}{compile__Q2_3OMR11CompilationFv} [0x10020be1a60]
0x0900000012e15f24 {libj9jit29.so}{compile__Q2_2TR28CompilationInfoPerThreadBaseFP10J9VMThreadPQ2_2TR11CompilationP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRCQ2_2TR16SegmentAllocator} [0x10020be3370]
0x0900000012c78728 {libj9jit29.so}{wrappedCompile__Q2_2TR28CompilationInfoPerThreadBaseFP13J9PortLibraryPv} [0x10020be3550]
0x0900000012925bcc {libj9prt29.so}{omrsig_protect} [0x10020be3760]
0x0900000012c5c5b8 {libj9jit29.so}{compile__Q2_2TR28CompilationInfoPerThreadBaseFP10J9VMThreadP21TR_MethodToBeCompiledRQ2_2J917J9SegmentProvider} [0x10020be3b30]
0x0900000013389610 {libj9jit29.so}{processEntry__Q2_2TR24CompilationInfoPerThreadFR21TR_MethodToBeCompiledRQ2_2J917J9SegmentProvider} [0x10020be4f00]
0x090000001338bb4c {libj9jit29.so}{processEntries__Q2_2TR24CompilationInfoPerThreadFv} [0x10020be4fd0]
0x090000001338ae18 {libj9jit29.so}{protectedCompilationThreadProc__FP13J9PortLibraryPQ2_2TR24CompilationInfoPerThread} [0x10020be5130]
0x0900000012925bcc {libj9prt29.so}{omrsig_protect} [0x10020be51f0]
0x090000001338a5f4 {libj9jit29.so}{compilationThreadProc__FPv} [0x10020be55c0]
0x090000001243d808 {libj9thr29.so}{thread_wrapper} [0x10020be5670]
0x0900000000570e14 {libpthreads.a}{} [0x10020be5790]
I think the following two lines in side the do
block are getting pulled out of the loop when they shouldn't:
oldHead = _listHead;
oldPtr = (uintptr_t)oldHead;
The method starts with:
0x900000012f72580 {libj9jit29.so}{addProfileInfo...} +0 e8030000 ld r0, 0(r3) // oldPtr (r0) = _listHead (0(r3))
0x900000012f72584 {libj9jit29.so}{addProfileInfo...} +1 60000000 nop
0x900000012f72588 {libj9jit29.so}{addProfileInfo...} +2 60000000 nop
0x900000012f7258c {libj9jit29.so}{addProfileInfo...} +3 60210000 ori r1, r1, 0
The hang is inside this sequence of assembly:
//Initial values. They don't change inside the small loop.
// $r0 = 0x10023ee8be0 (uintptr_t oldPtr)
// $r3 = 0x1001070fa30 (TR_JProfilerThread receiver object)
// $r4 = 0x10023eed740 (param1 : TR_PersistentProfileInfo *newHead)
0x900000012f72590 {libj9jit29.so}{addProfileInfo...} +4 > f8040000 std r0, 0(r4) <<< ^+11 //In the core file 0(r4) has 0x10023ee8be0 as expected
0x900000012f72594 {libj9jit29.so}{addProfileInfo...} +5 |> 7ca018a8 ldarx r5, 0, r3 <<< ^+9 //$r5 = 0x10027233220, read from _listHead
0x900000012f72598 {libj9jit29.so}{addProfileInfo...} +6 || 7c202840 cmpld r0, r5 //compare $r5 to oldPtr ($r0), they are not the same. ($r0 = 0x10023ee8be0)
0x900000012f7259c {libj9jit29.so}{addProfileInfo...} +7 ||* 4082000c bne 0x900000012f725a8 C>> +10 //r5 is not equal to r0 so branch
0x900000012f725a8 {libj9jit29.so}{addProfileInfo...} +10 | > 7c250040 cmpld r5, r0 <<< +7 //compare $r5 to $r0 again. They still are not the same.
0x900000012f725ac {libj9jit29.so}{addProfileInfo...} +11 * 4082ffe4 bne 0x900000012f72590 C>> ^+4 //r5 is not equal to r0 so branch back up
Each loop iteration is supposed to update oldPtr
but it doesn't. $r0 never changes so the loop goes on forever.
It looks like the problem is here: https://github.com/eclipse/openj9/blob/345e1b09e2a1f2cf6323b25edc901cce197f4365/runtime/compiler/runtime/J9Profiler.hpp#L865
_listHead
can be updated by multiple threads so I think it needs to be marked as volatile
.
I was able to compile a build of my own with the problem where reading from _listHead
was pulled out of the loop. I then tried chaning _listHead
to be TR_PersistentProfileInfo * volatile
and fixed up a few different code locations to deal with casting issues this caused. A build with this change no longer pulls the load out of the loop which fixes the issue.
I am currently going over nearby code to make sure they use volatile
correctly. If no other problems arise, I should be able to make a PR soon.
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64_aix_Nightly_testList_2/57 DaaLoadTest_daa2_special_5m_16
https://github.com/eclipse/openj9/issues/11526#issuecomment-770213629
That failing test didn't create a corefile but it did come with a javacore. It looks to be the same issue. There's a hang in a compilation thread. Based off the offset into libj9jit29.so
it is also in a similar location.
I've opened a PR that should fix this Issue. _listHead
now gets marked as being volatile
and will no longer be pulled out of the loop.
@IBMJimmyk Any new updates on this one?
The PR to fix the issue is open. I'm currently waiting on the changes to be approved and merged in.
This was set as a blocker due to the frequency of failure, however the failure hasn't been seen in some time so I'll remove the blocker status.
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64_aix_Release_testList_4/2 DaaLoadTest_daa2_special_22 aix71-p8-3
variation: Mode614 JVM_OPTIONS: -Xcompressedrefs -Xgcpolicy:gencon -Xjit:counts=- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000,gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xmn512k -Xcheck:gc:vmthreads:all:quiet
Diagnostics: https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Test/Test_openjdk8_j9_special.system_ppc64_aix_Release_testList_4/2/system_test_output.tar.gz
javacore shows: