eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

LambdaLoadTest hang? in Runtime.gc() #20559

Open pshipton opened 1 week ago

pshipton commented 1 week ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_special.system_ppc64_aix_Personal_testList_2/134 - p8-java1-ibm07 LambdaLoadTest_special_J9_5m_8 -Xgcpolicy:gencon -Xshareclasses -Xjit -Xnocompressedrefs

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk8_j9_special.system_ppc64_aix_Personal_testList_2/134/system_test_output.tar.gz **FAILED** Process LT has timed out

There are multiple load-* threads waiting in Runtime.gc().

    Java stack frames:
     bp: 0x00000100266d4ed0  method: void java/lang/Runtime.gc()  (Native Method)
      objects: 0xa00000000035850
     bp: 0x00000100266d4f98  method: void java/lang/System.gc()  source: System.java:394
      objects: <no objects in this frame>
     bp: 0x00000100266d5000  method: void net/adoptopenjdk/test/lambda/TestLambdaCapture.testCaptureReturnedLambda()  source: TestLambdaCapture.java:135
      objects: 0xa000001d81d38d0 0xa000000004268d8 0xa000001d81d3858 0xa000001d81d37a8
pshipton commented 1 week ago

@dmitripivkine

github-actions[bot] commented 1 week ago

Issue Number: 20559 Status: Open Recommended Components: comp:gc, comp:vm, comp:test Recommended Assignees: pshipton, jasonfengj9, llxia

dmitripivkine commented 1 week ago

This thread got request to execute SystemGC and trying to acquire Exclusive Access:

!stack 0x1002505b100    !j9vmthread 0x1002505b100   !j9thread 0x1002504b888 tid 0x980291 (9962129) // (load-4)
!j9vmthread 0x1002505b100 publicFlags=1020 privateFlags=0 inNative=0 // load-4

16:10:21.075646309 *0x1002505B100 j9vm.368             Event       Posted Halt request to 1 threads.
16:10:21.075647719  0x1002505B100 j9vm.369             Event       Waiting for 1 thread halt responses.
17:15:05.042548194 *0x100224E2A00 j9vm.243             Exit       <J9SignalAsyncEvent returning 0
17:15:05.042548489  0x100224E2A00 j9vm.242             Entry      >J9SignalAsyncEvent targetThread=0x10025099a00 handlerKey=2
17:15:05.042548731  0x100224E2A00 j9vm.243             Exit       <J9SignalAsyncEvent returning 0

However there is another thread has set Halt flag (0x01) but still keeping VM Access (0x20) and prevents taking Exclusive:

    !stack 0x10025099a00    !j9vmthread 0x10025099a00   !j9thread 0x100250898c8 tid 0x32402cf (52691663) // (load-10)
    !j9vmthread 0x10025099a00 publicFlags=21 privateFlags=0 inNative=0 // load-10

Looking to this threads is dbx:

$t47    run                          running  52691663   u   no   sys
(dbx) thread current 47
(dbx) where
---------
  $r0:0x090000000481148c  $stkp:0x00000100259009b0   $toc:0x08001000a01ca050
  $r3:0x0000010025099a00    $r4:0x000001001218ffcc    $r5:0x00000100121e2ea8
  $r6:0x0000000000000000    $r7:0x0000000000000000    $r8:0x0a000001d90a2778
  $r9:0x00000000004d2a10   $r10:0x0900000004461ca4   $r11:0x0000010022061aa8
 $r12:0x000001001218ffbc   $r13:0x000001002590a800   $r14:0x0000010025aa87b8
 $r15:0x0000010025099a00   $r16:0x0000010021fcb658   $r17:0xffffffffffffffff
 $r18:0x0000010025900bf0   $r19:0x420000842509b678   $r20:0x09000000056b62bc
 $r21:0x00000100224113e0   $r22:0x0000010025900c00   $r23:0x0000000000000001
 $r24:0x09000000056c0168   $r25:0x09001000a0760b90   $r26:0x0a000000004d2a10
 $r27:0x0a000001d90a2748   $r28:0x0a000001d90a2748   $r29:0x0a000000004d2a10
 $r30:0x0a0000000026e860   $r31:0x0a000001d90a2778
 $iar:0x090000000448d100   $msr:0x800000000000d032    $cr:0x0000000044000280
$link:0x0900000004461d60   $ctr:0x0900000004461ca4   $xer:0x0000000020000010
          Condition status = 0:g 1:g 5:e 6:l
        [unset $noflregs to view floating point registers]
        [unset $novregs to view vector registers]
        [unset $novsregs to view vector scalar registers]

0 old_slow_jitRetranslateMethod(0x10025099a00) at 0x90000000448d100

So, looks like thread in old_slow_jitRetranslateMethod() did not drop VM Access and caused the hang. @hzongaro FYI

gacholio commented 1 week ago

I have not looked at the code, but the usual cause of issues like this is a monitor - specifically that VM access should be treated as a monitor enter for ordering purposes.