eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.29k stars 723 forks source link

cmdLineTester_criu_nonPortableRestore_4_FAILED FAILED: testMXBeanUpTime() - uptimeAfterCheckpoint 9437 can't be greater than uptimeBeforeCheckpoint 7751 + 1500 #18384

Closed JasonFengJ9 closed 1 year ago

JasonFengJ9 commented 1 year ago

Failure link

From an internal build(ubu22s390x-svl-rt6-1):

java version "21.0.1-beta" 2023-10-17
IBM Semeru Runtime Certified Edition 21.0.1+12-202310310039 (build 21.0.1-beta+12-202310310039)
Eclipse OpenJ9 VM 21.0.1+12-202310310039 (build master-a80c01fea, JRE 21 Linux s390x-64-Bit Compressed References 20231031_25 (JIT enabled, AOT enabled)
OpenJ9   - a80c01fea
OMR      - 94e44382f
JCL      - c06eaf638 based on jdk-21.0.1+12)

Rerun in Grinder - Change TARGET to run only the failed test targets.

Optional info

Failure output (captured from console output)

[2023-10-31T02:08:19.555Z] variation: -Xjit:count=0 -XX:+CRIURestoreNonPortableMode
[2023-10-31T02:08:19.555Z] JVM_OPTIONS:  -Xjit:count=0 -XX:+CRIURestoreNonPortableMode 

[2023-10-31T02:11:12.649Z] Testing: Create CRIU checkpoint image and restore once - testMXBeanUpTime
[2023-10-31T02:11:12.649Z] Test start time: 2023/10/31 02:11:12 Coordinated Universal Time
[2023-10-31T02:11:12.649Z] Running command: bash /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_testList_2/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_testList_2/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_testList_2/openjdkbinary/j2sdk-image/bin/java " -Xjit:count=0 -XX:+CRIURestoreNonPortableMode  -Xtrace:print={j9criu,j9jcl.533} --add-exports java.base/openj9.internal.criu=ALL-UNNAMED" org.openj9.criu.TimeChangeTest testMXBeanUpTime 1 false false
[2023-10-31T02:11:12.649Z] Time spent starting: 6 milliseconds
[2023-10-31T02:11:24.307Z] Time spent executing: 12248 milliseconds
[2023-10-31T02:11:24.307Z] Test result: FAILED
[2023-10-31T02:11:24.307Z] Output from test:
[2023-10-31T02:11:24.307Z]  [OUT] start running script
[2023-10-31T02:11:24.307Z]  [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
[2023-10-31T02:11:24.307Z]  [OUT] export LD_BIND_NOT=on
[2023-10-31T02:11:24.307Z]  [OUT] /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_testList_2/openjdkbinary/j2sdk-image/bin/java -XX:+EnableCRIUSupport  -Xjit:count=0 -XX:+CRIURestoreNonPortableMode  -Xtrace:print={j9criu,j9jcl.533} --add-exports java.base/openj9.internal.criu=ALL-UNNAMED -cp /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_testList_2/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criu.jar org.openj9.criu.TimeChangeTest testMXBeanUpTime 1
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:19.881*0xa55800           j9jcl.533      - RuntimeMXBeanImpl_getUptimeImpl timeNow (1698718279871) vmStartTime (1698718272120) criuTimeDeltaMillis (0)
[2023-10-31T02:11:24.307Z]  [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Tue Oct 31 02:11:20 UTC 2023, System.currentTimeMillis(): 1698718280557, System.nanoTime(): 1698718166740678368
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:21.411 0xa55800          j9criu.9        > Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:21.412 0xa55800          j9criu.18       - Taking a checkpoint with active clinit
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:21.510 0xa55800          j9criu.13       - Before checkpoint, checkpointNanoTimeMonotonic = 1698718167693075406, checkpointNanoUTCTime = 1698718281510665061
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:21.674 0xa55800          j9criu.12       - Current syslogOptions: error,vital
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:21.676 0xa55800          j9criu.7        - Before checkpoint criu_dump(), j9time_nano_time() returns 1698718167859504964, j9time_current_time_nanos() returns 1698718281677095769
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:24.024 0xa55800          j9criu.15       - After checkpoint criu_dump(), j9time_nano_time() returns 1698718170207451760, j9time_current_time_nanos() returns 1698718284025049925
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:24.025 0xa55800          j9criu.14       - After restore, restoreNanoUTCTime = 1698718284025049925, checkpointNanoUTCTime = 1698718281510665061, checkpointRestoreTimeDelta = 2514384864, restoreNanoTimeMonotonic = 1698718170207451760, checkpointNanoTimeMonotonic = 1698718167693075406, nanoTimeMonotonicClockDelta = 2514376354
[2023-10-31T02:11:24.307Z]  [OUT] JVMJITM044W Some or all compiled code in the code cache invalidated post restore.
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:24.063 0xa55800          j9criu.10       < Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:24.071 0xa55800           j9jcl.533      - RuntimeMXBeanImpl_getUptimeImpl timeNow (1698718284071) vmStartTime (1698718272120) criuTimeDeltaMillis (2514)
[2023-10-31T02:11:24.307Z]  [OUT] FAILED: testMXBeanUpTime() - uptimeAfterCheckpoint 9437 can't be greater than uptimeBeforeCheckpoint 7751 + 1500
[2023-10-31T02:11:24.307Z]  [OUT] Removed test output files
[2023-10-31T02:11:24.307Z]  [OUT] finished script
[2023-10-31T02:11:24.307Z]  [ERR] /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_testList_2/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh: line 41: 3076089 Killed                  $2 -XX:+EnableCRIUSupport $3 -cp "$1/criu.jar" $4 $5 $6 > testOutput 2>&1
[2023-10-31T02:11:24.307Z] >> Success condition was found: [Output match: Killed]
[2023-10-31T02:11:24.307Z] >> Success condition was not found: [Output match: PASSED: testMXBeanUpTime()]
[2023-10-31T02:11:24.307Z] >> Failure condition was not found: [Output match: CRIU is not enabled]
[2023-10-31T02:11:24.307Z] >> Failure condition was not found: [Output match: Operation not permitted]
[2023-10-31T02:11:24.307Z] >> Failure condition was found: [Output match: FAILED: testMXBeanUpTime()]
[2023-10-31T02:11:24.307Z] >> Success condition was not found: [Output match: Thread pid mismatch]
[2023-10-31T02:11:24.307Z] >> Success condition was not found: [Output match: do not match expected]
[2023-10-31T02:11:24.307Z] >> Success condition was not found: [Output match: Unable to create a thread:]
[2023-10-31T02:11:24.307Z] >> Failure condition was not found: [Output match: Could not dump the JVM processes, err=-70]
[2023-10-31T02:11:24.307Z] >> Failure condition was not found: [Output match: User requested Java dump using]

[2023-10-31T02:15:48.054Z] ---TEST RESULTS---
[2023-10-31T02:15:48.054Z] Number of PASSED tests: 55 out of 56
[2023-10-31T02:15:48.054Z] Number of FAILED tests: 1 out of 56
[2023-10-31T02:15:48.054Z] 
[2023-10-31T02:15:48.054Z] ---SUMMARY OF FAILED TESTS---
[2023-10-31T02:15:48.054Z] Create CRIU checkpoint image and restore once - testMXBeanUpTime
[2023-10-31T02:15:48.054Z] -----------------------------
[2023-10-31T02:15:48.054Z] 
[2023-10-31T02:15:48.054Z] -----------------------------------
[2023-10-31T02:15:48.054Z] cmdLineTester_criu_nonPortableRestore_4_FAILED

50x internal grinder - 2 failures reproduced at ubu20s390x-svl-rt2-1.

This is likely a test issue.

r30shah commented 1 year ago

FYI @VermaSh

VermaSh commented 1 year ago

~Looking into the failure~ @JasonFengJ9 is looking into the failure

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/92 https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_1/93

JasonFengJ9 commented 1 year ago
j9jcl.533      - RuntimeMXBeanImpl_getUptimeImpl timeNow (1698718279871) vmStartTime (1698718272120) criuTimeDeltaMillis (0)
j9criu.13       - Before checkpoint, checkpointNanoTimeMonotonic = 1698718167693075406, checkpointNanoUTCTime = 1698718281510665061
j9criu.7        - Before checkpoint criu_dump(), j9time_nano_time() returns 1698718167859504964, j9time_current_time_nanos() returns 1698718281677095769
j9criu.15       - After checkpoint criu_dump(), j9time_nano_time() returns 1698718170207451760, j9time_current_time_nanos() returns 1698718284025049925
j9criu.14       - After restore, restoreNanoUTCTime = 1698718284025049925, checkpointNanoUTCTime = 1698718281510665061, checkpointRestoreTimeDelta = 2514384864, restoreNanoTimeMonotonic = 1698718170207451760, checkpointNanoTimeMonotonic = 1698718167693075406, nanoTimeMonotonicClockDelta = 2514376354
j9jcl.533      - RuntimeMXBeanImpl_getUptimeImpl timeNow (1698718284071) vmStartTime (1698718272120) criuTimeDeltaMillis (2514)

The time between first RuntimeMXBean.getUptime() and CRIUSupport.checkpointJVM() was ~1639ms which is more than adjustMillis = 1500ms https://github.com/eclipse-openj9/openj9/blob/04f76db60f678b3031e7973e5f42072852221688/test/functional/cmdLineTests/criu/src/org/openj9/criu/TimeChangeTest.java#L147 This probably was due to the slow machine creating local directories.

Will modify the test to minimize the time gap between first RuntimeMXBean.getUptime() and CRIUSupport.checkpointJVM().