JDK21 serviceability_jvmti_j9_0_FAILED serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java timed out

JasonFengJ9 commented 9 months ago

Failure link

Created from https://github.com/eclipse-openj9/openj9/issues/18675#issuecomment-1880983707 as per https://github.com/eclipse-openj9/openj9/issues/18675#issuecomment-1881681970

From an internal build (win19x86-svl-rt4-1):

13:30:36  openjdk version "21.0.1" 2023-10-17 LTS
13:30:36  IBM Semeru Runtime Open Edition 21.0.1.0-m3 (build 21.0.1+12-LTS)
13:30:36  Eclipse OpenJ9 VM 21.0.1.0-m3 (build v0.42.0-release-69b6ceb69, JRE 21 Windows Server 2019 amd64-64-Bit Compressed References 20231017_74 (JIT enabled, AOT enabled)
13:30:36  OpenJ9   - 69b6ceb69
13:30:36  OMR      - 494d6eb66
13:30:36  JCL      - 5846adac994 based on jdk-21.0.1+12)

Rerun in Grinder - Change TARGET to run only the failed test targets.

Optional info

Failure output (captured from console output)


13:40:34  ===============================================
13:40:35  Running test serviceability_jvmti_j9_0 ...
13:40:35  ===============================================
13:40:36  serviceability_jvmti_j9_0 Start Time: Sun Jan  7 10:40:35 2024 Epoch Time (ms): 1704652835766
13:40:36  variation: Mode150
13:40:36  JVM_OPTIONS:  -XX:+UseCompressedOops -Xverbosegclog 

14:32:36  TEST: serviceability/jvmti/vthread/VThreadEventTest/VThreadEventTest.java

14:32:36  TEST RESULT: Error. Program `C:\Users\jenkins\workspace\Test_openjdk21_j9_extended.openjdk_x86-64_windows_testList_5\jdkbinary\j2sdk-image\bin\java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 1199454ms).
14:32:36  --------------------------------------------------
14:32:36  Test results: passed: 155; error: 1

14:33:05  -----------------------------------
14:33:05  serviceability_jvmti_j9_0_FAILED

50x grinder - 37/50 failed

babsingh commented 9 months ago

@fengxue-IS Can you take a look at this failure? This is targeted for the 0.43 Jan release. I am looking at another 0.43 failure: #18675. I won't be able to look at both of them. I have included my current findings below.

If you run the grinder with the changes from https://github.com/babsingh/aqa-tests/commits/debug_18712, then only VThreadEventTest will be run for serviceability_jvmti_j9_0 (TARGET). The timeout framework only outputs the Java stack trace. @llxia Is there a test option to get a system core file for the timeout? In this case, I don't see a way to use -Xdump to get the system core file.

The timeout failure is seen only on Windows. On Linux x64, the test passes consistently.

The hang happens in VThreadEventTest.java#L175-L179:

        for (int sleepNo = 1; threadEndCount() < THREAD_CNT; sleepNo++) {
            Thread.sleep(100);
            if (sleepNo % 100 == 0) { // 10 sec period of waiting
                log("main: waited seconds: " + sleepNo/10);
            }
        }

The test enables the following JVMTI events: VirtualThreadEnd, VirtualThreadMount, and VirtualThreadUnmount, which are triggered from the virtualThread* methods in javanextvmi.cpp. Then, it launches a number of virtual threads, and expects the following number of events:

VirtualThreadEnd cnt: 18 (expected: 18)
VirtualThreadMount cnt: 14 (expected: 14)
VirtualThreadUnmount cnt: 22 (expected: 22)

The hang happens in the above loop because fewer than the expected 18 VirtualThreadEnd events are delivered, so the condition threadEndCount() < THREAD_CNT never becomes false.
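
To make the failure mode concrete, here is a minimal sketch of the same polling pattern with a deadline, so that a missing event produces a quick, reportable failure instead of running into the 960000ms harness timeout. This is not the test's actual code: BoundedWaitSketch, awaitCount, and the AtomicInteger standing in for threadEndCount() are hypothetical names used only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Hypothetical sketch; not part of VThreadEventTest.java.
class BoundedWaitSketch {
    // Polls `counter` until it reaches `expected` or `timeoutMillis` elapses.
    // Returns true only if the expected count was observed before the deadline.
    static boolean awaitCount(IntSupplier counter, int expected,
                              long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (counter.getAsInt() < expected) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // deadline passed and the count is still short
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AtomicInteger ended = new AtomicInteger(); // stand-in for threadEndCount()
        int expected = 18;
        // Simulate one lost VirtualThreadEnd event: only 17 increments happen.
        for (int i = 0; i < expected - 1; i++) {
            ended.incrementAndGet();
        }
        boolean reached = awaitCount(ended::get, expected, 500, 100);
        System.out.println("reached=" + reached + ", count=" + ended.get()
                + " (expected: " + expected + ")");
    }
}
```

With a bound like this, the log would show the short count directly, which helps distinguish a lost event from a slow one.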

The RI also experiences a similar issue: https://bugs.openjdk.org/browse/JDK-8322206. In their case, the counts for VirtualThreadMount and VirtualThreadUnmount are off by 1.

In JVM_VirtualThreadEnd, we unconditionally trigger the VirtualThreadEnd event hook. So, the cause is either how JVM_VirtualThreadEnd is invoked from the VirtualThread class or a test issue; in the latter case, we can either fix or disable the test. In the former case, we use the RI's VirtualThread class, so the problem might also exist in the RI.

fyi @tajila @pshipton