eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 722 forks source link

JDK21 serviceability/jvmti/vthread/GetThreadState/GetThreadStateTest.java FAILED GetThreadStateTest::testRunnable & timedout #18763

Open JasonFengJ9 opened 10 months ago

JasonFengJ9 commented 10 months ago

Failure link

From an internal build(rhel8s390x-svl-rt7-1):

java version "21.0.1-beta" 2023-10-17
IBM Semeru Runtime Certified Edition 21.0.1+12-202401131605 (build 21.0.1-beta+12-202401131605)
Eclipse OpenJ9 VM 21.0.1+12-202401131605 (build master-ed3e66243, JRE 21 Linux s390x-64-Bit Compressed References 20240113_79 (JIT enabled, AOT enabled)
OpenJ9   - ed3e66243
OMR      - 3e1bb6994
JCL      - 6b83790e0 based on jdk-21.0.1+12)

Rerun in Grinder - Change TARGET to run only the failed test targets.

Optional info

Failure output (captured from console output)

[2024-01-13T17:12:26.921Z] variation: Mode150
[2024-01-13T17:12:26.921Z] JVM_OPTIONS:  -XX:+UseCompressedOops -Xverbosegclog 

[2024-01-13T17:27:29.699Z] TEST: serviceability/jvmti/vthread/GetThreadState/GetThreadStateTest.java#no-vmcontinuations

[2024-01-13T17:27:29.700Z] STARTED    GetThreadStateTest::testRunnable 'testRunnable()'
[2024-01-13T17:27:29.700Z]   expect state=0x5 (JVMTI_THREAD_STATE_ALIVE | JVMTI_THREAD_STATE_RUNNABLE) ...
[2024-01-13T17:27:29.700Z]   thread state=0x400005 (JVMTI_THREAD_STATE_ALIVE | JVMTI_THREAD_STATE_RUNNABLE | JVMTI_THREAD_STATE_IN_NATIVE)
[2024-01-13T17:27:29.700Z] org.opentest4j.AssertionFailedError: expected: <5> but was: <4194309>
[2024-01-13T17:27:29.700Z]  at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
[2024-01-13T17:27:29.700Z]  at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
[2024-01-13T17:27:29.700Z]  at org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
[2024-01-13T17:27:29.700Z]  at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
[2024-01-13T17:27:29.700Z]  at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)
[2024-01-13T17:27:29.700Z]  at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:528)
[2024-01-13T17:27:29.700Z]  at GetThreadStateTest.check(GetThreadStateTest.java:352)
[2024-01-13T17:27:29.700Z]  at GetThreadStateTest.testRunnable(GetThreadStateTest.java:94)

[2024-01-13T17:27:29.700Z] FAILED     GetThreadStateTest::testRunnable 'testRunnable()'

[2024-01-13T17:27:29.700Z] java.lang.Exception: JUnit test failure
[2024-01-13T17:27:29.700Z]  at com.sun.javatest.regtest.agent.JUnitRunner.runWithJUnitPlatform(JUnitRunner.java:149)
[2024-01-13T17:27:29.700Z]  at com.sun.javatest.regtest.agent.JUnitRunner.main(JUnitRunner.java:95)
[2024-01-13T17:27:29.700Z]  at com.sun.javatest.regtest.agent.JUnitRunner.main(JUnitRunner.java:61)
[2024-01-13T17:27:29.700Z]  at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
[2024-01-13T17:27:29.700Z]  at java.base/java.lang.reflect.Method.invoke(Method.java:586)
[2024-01-13T17:27:29.700Z]  at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
[2024-01-13T17:27:29.700Z]  at java.base/java.lang.Thread.run(Thread.java:1595)
[2024-01-13T17:27:29.700Z] 
[2024-01-13T17:27:29.701Z] JavaTest Message: Test threw exception: java.lang.Exception: JUnit test failure

[2024-01-13T17:43:31.006Z] TEST: serviceability/jvmti/vthread/GetThreadState/GetThreadStateTest.java#default

[2024-01-13T17:43:31.015Z] TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk21_j9_extended.openjdk_s390x_linux/jdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 972972ms).
[2024-01-13T17:43:31.015Z] --------------------------------------------------
[2024-01-13T17:43:31.015Z] Test results: passed: 154; failed: 1; error: 1
[2024-01-13T17:43:40.301Z] Report written to /home/jenkins/workspace/Test_openjdk21_j9_extended.openjdk_s390x_linux/jvmtest/openjdk/report/html/report.html
[2024-01-13T17:43:40.301Z] Results written to /home/jenkins/workspace/Test_openjdk21_j9_extended.openjdk_s390x_linux/aqa-tests/TKG/output_17051659438546/serviceability_jvmti_j9_0/work
[2024-01-13T17:43:40.301Z] Error: Some tests failed or other problems occurred.
[2024-01-13T17:43:40.301Z] -----------------------------------
[2024-01-13T17:43:40.301Z] serviceability_jvmti_j9_0_FAILED

50x internal grinder - FAILED GetThreadStateTest::testRunnable wasn't reproduced, 5 timedout for no-vmcontinuations, and 3 for default.

pshipton commented 10 months ago

5 failures for no-vmcontinuations, and 3 for default. Plus some failures in serviceability/jvmti/vthread/SuspendResume2/SuspendResume2.java for which we have https://github.com/eclipse-openj9/openj9/issues/18735 https://github.com/eclipse-openj9/openj9/issues/18675

JasonFengJ9 commented 9 months ago

JDK22 x86-64_linux

19:35:11  variation: Mode150
19:35:11  JVM_OPTIONS:  -XX:+UseCompressedOops -Xverbosegclog 

19:56:33  TEST: serviceability/jvmti/vthread/SuspendResume2/SuspendResume2.java#default

19:56:33  TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk22_j9_extended.openjdk_x86-64_linux_Personal/jdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 965304ms).
19:56:33  --------------------------------------------------
19:56:33  Test results: passed: 157; error: 1
19:56:39  Report written to /home/jenkins/workspace/Test_openjdk22_j9_extended.openjdk_x86-64_linux_Personal/jvmtest/openjdk/report/html/report.html
19:56:39  Results written to /home/jenkins/workspace/Test_openjdk22_j9_extended.openjdk_x86-64_linux_Personal/aqa-tests/TKG/output_17086485097654/serviceability_jvmti_j9_0/work
19:56:39  Error: Some tests failed or other problems occurred.
19:56:39  -----------------------------------
19:56:39  serviceability_jvmti_j9_0_FAILED
JasonFengJ9 commented 8 months ago

JDK21 s390x_linux(sles15s390x-svl-rt9-1)

[2024-03-20T02:50:10.223Z] variation: Mode650
[2024-03-20T02:50:10.223Z] JVM_OPTIONS:  -XX:-UseCompressedOops -Xverbosegclog 

[2024-03-20T03:10:03.646Z] TEST: serviceability/jvmti/vthread/GetThreadState/GetThreadStateTest.java#default

[2024-03-20T03:10:03.653Z] TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk21_j9_extended.openjdk_s390x_linux_testList_1/jdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 963260ms).
[2024-03-20T03:10:03.653Z] --------------------------------------------------
[2024-03-20T03:10:03.653Z] Test results: passed: 155; error: 1
[2024-03-20T03:10:07.671Z] Report written to /home/jenkins/workspace/Test_openjdk21_j9_extended.openjdk_s390x_linux_testList_1/jvmtest/openjdk/report/html/report.html
[2024-03-20T03:10:07.671Z] Results written to /home/jenkins/workspace/Test_openjdk21_j9_extended.openjdk_s390x_linux_testList_1/aqa-tests/TKG/output_17109030099715/serviceability_jvmti_j9_1/work
[2024-03-20T03:10:07.671Z] Error: Some tests failed or other problems occurred.
[2024-03-20T03:10:07.671Z] -----------------------------------
[2024-03-20T03:10:07.671Z] serviceability_jvmti_j9_1_FAILED
babsingh commented 5 months ago

The original failure was fixed when the new Thread state changes were supported in March 2024.

The timeout failure is intermittent and it happens because there is the IN_NATIVE bit set in the thread state. This is similar to the IN_NATIVE issue reported in https://github.com/eclipse-openj9/openj9/issues/19203#issuecomment-2137663450. A pinned virtual thread is being parked; all expected bits are set in the thread state, except the IN_NATIVE is also set. Having the IN_NATIVE bit in the thread state is considered valid as per the JVMTI spec. The test waits indefinitely until the expected thread state without the IN_NATIVE bit is achieved; this leads to a timeout. The test needs to account for the IN_NATIVE bit to avoid the timeout.

Such issues are also seen with the RI and are being tracked by them; see https://bugs.openjdk.org/browse/JDK-8318090.

14:25:43  STARTED    GetThreadStateTest::testParkWhenPinned 'testParkWhenPinned()'
14:25:43    await state=0x291 (JVMTI_THREAD_STATE_ALIVE | JVMTI_THREAD_STATE_WAITING | JVMTI_THREAD_STATE_WAITING_INDEFINITELY | JVMTI_THREAD_STATE_PARKED) ...
14:25:43    thread state=0x400291 (JVMTI_THREAD_STATE_ALIVE | JVMTI_THREAD_STATE_WAITING | JVMTI_THREAD_STATE_WAITING_INDEFINITELY | JVMTI_THREAD_STATE_PARKED | JVMTI_THREAD_STATE_IN_NATIVE)

Since this is a test issue, it doesn't block the current release. Moving it to the 0.48 release. The RI plans to resolve this test issue by JDK24.

pshipton commented 4 months ago

https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk21_j9_extended.openjdk_ppc64le_linux_testList_3/8/ - ubu20le-rtp-rt1-1 https://trssrtp1.fyre.ibm.com/output/test?id=668ad2e66b34225c2bd84a21 serviceability/jvmti/vthread/SuspendResume2/SuspendResume2.java#no-vmcontinuations timeout

https://trssrtp1.fyre.ibm.com/output/test?id=668ae0d66b34225c2bd9e5ec https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk22_j9_extended.openjdk_s390x_linux_testList_2/67/ - rtj-ubu24s390x-svl-test-10ukx-1 serviceability/jvmti/vthread/GetThreadState/GetThreadStateTest.java#default timeout

babsingh commented 4 months ago

re https://github.com/eclipse-openj9/openj9/issues/18763#issuecomment-2218759739:

GetThreadStateTest failure is identical to the IN_NATIVE failure mentioned in https://github.com/eclipse-openj9/openj9/issues/18763#issuecomment-2142491403; it is a test issue and the failure is also seen with the RI.

SuspendResume2 failure is a different timeout, and it might be related to https://github.com/eclipse-openj9/openj9/issues/19838 and https://github.com/eclipse-openj9/openj9/issues/18738#issuecomment-2218824062. @fengxue-IS Is this related to the timing gap mentioned in https://github.com/eclipse-openj9/openj9/issues/18738#issuecomment-2218824062?

fengxue-IS commented 4 months ago

SuspendResume2 failure is a different timeout, and it might be related to #19838 and #18738 (comment). @fengxue-IS Is this related to the timing gap mentioned in #18738 (comment)?

The timing gap should only allow terminated vthread to be returned even when error on dead thread flag is set which may lead to undefined behaviour, but I don't think this should cause a timeout. I will have the PR to resolve the timeout ready today, can re-test this to confirm.

JasonFengJ9 commented 3 months ago

JDK23 Milestone 1 x86-64_windows(win11x86-rtp-rt5-1)

[2024-08-14T13:45:07.108Z] variation: Mode150
[2024-08-14T13:45:07.108Z] JVM_OPTIONS:  -XX:+UseCompressedOops -Xverbosegclog 

[2024-08-14T14:37:21.013Z] TEST: serviceability/jvmti/vthread/SuspendResume2/SuspendResume2.java#no-vmcontinuations

[2024-08-14T14:37:21.018Z] Timeout signalled after 960 seconds
[2024-08-14T14:37:21.018Z] STDERR:

[2024-08-14T14:37:21.019Z] TEST: serviceability/jvmti/vthread/SuspendResumeAll/SuspendResumeAll.java#default

[2024-08-14T14:37:21.030Z] Timeout signalled after 960 seconds
[2024-08-14T14:37:21.030Z] STDERR:

[2024-08-14T14:37:21.031Z] TEST RESULT: Error. Program `C:\Users\jenkins\workspace\Test_openjdk23_j9_extended.openjdk_x86-64_windows_testList_1\jdkbinary\j2sdk-image\bin\java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 1010015ms).
[2024-08-14T14:37:21.031Z] --------------------------------------------------
[2024-08-14T14:37:21.031Z] Test results: passed: 159; error: 2
[2024-08-14T14:38:10.892Z] Report written to C:\Users\jenkins\workspace\Test_openjdk23_j9_extended.openjdk_x86-64_windows_testList_1\jvmtest\openjdk\report\html\report.html
[2024-08-14T14:38:10.892Z] Results written to C:\Users\jenkins\workspace\Test_openjdk23_j9_extended.openjdk_x86-64_windows_testList_1\aqa-tests\TKG\output_17236430686175\serviceability_jvmti_j9_0\work
[2024-08-14T14:38:10.892Z] Error: Some tests failed or other problems occurred.
[2024-08-14T14:38:10.893Z] -----------------------------------
[2024-08-14T14:38:10.893Z] serviceability_jvmti_j9_0_FAILED

50x grinder - passed

tajila commented 2 months ago

@fengxue-IS Whats the latest on this?

fengxue-IS commented 2 months ago

@fengxue-IS Whats the latest on this?

Currently its in the same state as @babsingh mentioned in https://github.com/eclipse-openj9/openj9/issues/18763#issuecomment-2220779223

GetThreadStateTest still not fix in RI yet, target is still JDK24

For SuspendResume2 I wasn't able to reproduce this failure on 100x grinder. Will launch another grinder to verify this

babsingh commented 2 months ago

Moving this issue to the Java 24 milestone in order to match the RI's release target.

JasonFengJ9 commented 2 months ago

JDK21 s390x_linux 0.46.1(sles12s390x-svl-rt1-1)

java version "21.0.4" 2024-07-16 LTS
IBM Semeru Runtime Certified Edition 21.0.4.1-rc1 (build 21.0.4+7-LTS)
Eclipse OpenJ9 VM 21.0.4.1-rc1 (build v0.46.1-release-4760d5d320, JRE 21 Linux s390x-64-Bit Compressed References 20240716_266 (JIT enabled, AOT enabled)
OpenJ9   - 4760d5d320
OMR      - 840a9adba
JCL      - 6232a5d28a1 based on jdk-21.0.4+7)

[2024-09-05T15:41:12.930Z] variation: Mode150
[2024-09-05T15:41:12.930Z] JVM_OPTIONS:  -XX:+UseCompressedOops -Xverbosegclog 

[2024-09-05T16:04:26.005Z] TEST: serviceability/jvmti/vthread/SuspendResume2/SuspendResume2.java#default

[2024-09-05T16:04:26.007Z] ## Agent: test_vthread_suspend_all started
[2024-09-05T16:04:26.007Z] Timeout signalled after 960 seconds

[2024-09-05T16:04:26.007Z] TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk21_j9_extended.openjdk_s390x_linux_testList_1/jdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 968440ms).
[2024-09-05T16:04:26.007Z] --------------------------------------------------
[2024-09-05T16:04:26.007Z] Test results: passed: 155; error: 1
[2024-09-05T16:04:36.073Z] Report written to /home/jenkins/workspace/Test_openjdk21_j9_extended.openjdk_s390x_linux_testList_1/jvmtest/openjdk/report/html/report.html
[2024-09-05T16:04:36.073Z] Results written to /home/jenkins/workspace/Test_openjdk21_j9_extended.openjdk_s390x_linux_testList_1/aqa-tests/TKG/output_17255508714747/serviceability_jvmti_j9_0/work
[2024-09-05T16:04:36.073Z] Error: Some tests failed or other problems occurred.
[2024-09-05T16:04:36.073Z] -----------------------------------
[2024-09-05T16:04:36.073Z] serviceability_jvmti_j9_0_FAILED
babsingh commented 2 months ago

fyi @fengxue-IS, Moving it back to the 0.48 release due to the recent SuspendResume2 failure on JDK21 reported by @JasonFengJ9.

JasonFengJ9 commented 2 months ago

JDK22 aarch64_linux 0.46.1 (sles15-aarch64-6)

openjdk version "22.0.2" 2024-07-16
IBM Semeru Runtime Open Edition 22.0.2.1-rc1 (build 22.0.2+9)
Eclipse OpenJ9 VM 22.0.2.1-rc1 (build v0.46.1-release-4760d5d320, JRE 22 Linux aarch64-64-Bit Compressed References 20240716_33 (JIT enabled, AOT enabled)
OpenJ9   - 4760d5d320
OMR      - 840a9adba
JCL      - e0ddd8cb412 based on jdk-22.0.2+9)

[2024-09-06T22:29:27.385Z] variation: Mode650
[2024-09-06T22:29:27.385Z] JVM_OPTIONS:  -XX:-UseCompressedOops -Xverbosegclog 

[2024-09-06T22:49:14.038Z] TEST: serviceability/jvmti/vthread/SuspendResume2/SuspendResume2.java#no-vmcontinuations

[2024-09-06T22:49:14.041Z] TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk22_j9_extended.openjdk_aarch64_linux_testList_1/jdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 967452ms).
[2024-09-06T22:49:14.041Z] --------------------------------------------------
[2024-09-06T22:49:14.041Z] Test results: passed: 158; error: 1
[2024-09-06T22:49:14.041Z] Report written to /home/jenkins/workspace/Test_openjdk22_j9_extended.openjdk_aarch64_linux_testList_1/jvmtest/openjdk/report/html/report.html
[2024-09-06T22:49:14.041Z] Results written to /home/jenkins/workspace/Test_openjdk22_j9_extended.openjdk_aarch64_linux_testList_1/aqa-tests/TKG/output_17256617662955/serviceability_jvmti_j9_1/work
[2024-09-06T22:49:14.041Z] Error: Some tests failed or other problems occurred.
[2024-09-06T22:49:14.041Z] -----------------------------------
[2024-09-06T22:49:14.041Z] serviceability_jvmti_j9_1_FAILED

50x grinder - 2/50 failed

babsingh commented 2 months ago

@fengxue-IS For 0.48, this issue will need to be resolved by the end of this week. What's the current state of this issue? Based on this issue's impact, do we need it to be fixed in 0.48 or can it be pushed to 0.49?

fengxue-IS commented 2 months ago

@fengxue-IS For 0.48, this issue will need to be resolved by the end of this week. What's the current state of this issue? Based on this issue's impact, do we need it to be fixed in 0.48 or can it be pushed to 0.49?

I think this should be deferred to 0.49, GetThreadState still waiting for RI to fix, SuspendResume is difficult to reproduce (unable to reproduce the timeout with unit testing)