eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.29k stars 722 forks source link

cmdLineTester_criu_nonPortableRestoreJDK11Up testObjectWaitTimedV2 actual elapsed time does not match expected #16907

Closed pshipton closed 1 year ago

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.functional_x86-64_linux_Nightly_testList_1/129 cmdLineTester_criu_nonPortableRestoreJDK11Up_0

Testing: Create CRIU checkpoint image and restore once - testObjectWaitTimedV2
Test start time: 2023/03/14 04:42:42 Atlantic Standard Time
Running command: bash /home/jenkins/workspace/Test_openjdk19_j9_sanity.functional_x86-64_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh /home/jenkins/workspace/Test_openjdk19_j9_sanity.functional_x86-64_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk19_j9_sanity.functional_x86-64_linux_Nightly_testList_1/openjdkbinary/j2sdk-image/bin/java " -Xjit -XX:+CRIURestoreNonPortableMode  --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED --add-exports=java.base/openj9.internal.criu=ALL-UNNAMED" "org.openj9.criu.JDK11UpTimeoutAdjustmentTest testObjectWaitTimedV2" 1 1 false
Time spent starting: 7 milliseconds
Time spent executing: 25256 milliseconds
Test result: FAILED
Output from test:
 [OUT] start running script
 [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
 [OUT] export LD_BIND_NOT=on
 [OUT] Start test name: testObjectWaitTimedV2
 [OUT] Before starting testObjectWaitTimedV2, current thread name: main, Tue Mar 14 04:42:43 ADT 2023, System.currentTimeMillis(): 1678779763312, System.nanoTime(): 13009006661565022
 [OUT] testObjectWaitTimedV2() before wait(ms), current thread name: Thread-0, Tue Mar 14 04:42:43 ADT 2023, System.currentTimeMillis(): 1678779763419, System.nanoTime(): 13009006768139721
 [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Tue Mar 14 04:42:53 ADT 2023, System.currentTimeMillis(): 1678779773263, System.nanoTime(): 13009016611877784
 [OUT] testObjectWaitTimedV2() after wait(ms), current thread name: Thread-0, Tue Mar 14 04:42:55 ADT 2023, System.currentTimeMillis(): 1678779775877, System.nanoTime(): 13009016657576039
 [OUT] FAILED: expected wait time 10000 ms, but the actual elapsed time was: 9888677222ms with startNanoTime = 13009006768565670, and endNanoTime = 13009016657242892, CheckpointRestoreNanoTimeDelta: 2569157133
 [OUT] End testObjectWaitTimedV2, current thread name: main, Tue Mar 14 04:43:07 ADT 2023, System.currentTimeMillis(): 1678779787870, System.nanoTime(): 13009028649761469
 [OUT] Removed testOutput file
 [OUT] finished script
 [ERR] /home/jenkins/workspace/Test_openjdk19_j9_sanity.functional_x86-64_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh: line 40: 13541 Killed                  $2 -XX:+EnableCRIUSupport $3 -cp "$1/criu.jar" $4 $5 $6 > testOutput 2>&1
>> Success condition was not found: [Output match: PASSED: expected wait time]
>> Required condition was found: [Output match: Killed]
>> Required condition was found: [Output match: Start test name: testObjectWaitTimedV2]
>> Failure condition was not found: [Output match: CRIU is not enabled]
>> Failure condition was not found: [Output match: Operation not permitted]
>> Failure condition was found: [Output match: FAILED: expected wait time]
>> Failure condition was not found: [Output match: InterruptedException]
>> Success condition was not found: [Output match: Unable to create a thread:]
>> Success condition was not found: [Output match: Thread pid mismatch]
>> Success condition was not found: [Output match: do not match expected]
>> Failure condition was not found: [Output match: Could not dump the JVM processes, err=-70]
pshipton commented 1 year ago

@tajila fyi

tajila commented 1 year ago

@JasonFengJ9 Please take a look

JasonFengJ9 commented 1 year ago

[OUT] FAILED: expected wait time 10000 ms, but the actual elapsed time was: 9888677222ms with startNanoTime = 13009006768565670, and endNanoTime = 13009016657242892, CheckpointRestoreNanoTimeDelta: 2569157133

There was a mismatching ms and ns which is a test problem. https://github.com/eclipse-openj9/openj9/blob/5938c2f8973384e22ef04e540f60dee80b72af15/test/functional/cmdLineTests/criu/src/org/openj9/criu/JDK11UpTimeoutAdjustmentTest.java#L232-L233 The actual elapsed time was 9888ms compared with the target 10000ms, will check System.nanoTime() and Object.wait() time compensation across C/R.

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.functional_s390x_linux_OMR_testList_0/424 cmdLineTester_criu_nonPortableRestoreJDK11Up_2 -Xjit:count=0 -XX:+CRIURestoreNonPortableMode

Testing: Create CRIU checkpoint image and restore once - testObjectWaitTimedV2
Test start time: 2023/04/10 15:38:00 Eastern Standard Time
Running command: bash /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OMR_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OMR_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OMR_testList_0/openjdkbinary/j2sdk-image/bin/java " -Xjit:count=0 -XX:+CRIURestoreNonPortableMode  -Xtrace:print={j9criu} --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED --add-exports=java.base/openj9.internal.criu=ALL-UNNAMED" "org.openj9.criu.JDK11UpTimeoutAdjustmentTest testObjectWaitTimedV2" 1 1 false
Time spent starting: 6 milliseconds
Time spent executing: 30712 milliseconds
Test result: FAILED
Output from test:
 [OUT] start running script
 [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
 [OUT] export LD_BIND_NOT=on
 [OUT] Start test name: testObjectWaitTimedV2
 [OUT] Before starting testObjectWaitTimedV2, current thread name: main, Mon Apr 10 15:38:02 EDT 2023, System.currentTimeMillis(): 1681155482818, System.nanoTime(): 1681155482809075038
 [OUT] testObjectWaitTimedV2 NO C/R before wait(10000, 500000), current thread name: main, Mon Apr 10 15:38:03 EDT 2023, System.currentTimeMillis(): 1681155483278, System.nanoTime(): 1681155483269276042
 [OUT] testObjectWaitTimedV2 NO C/R after wait(10000, 500000), current thread name: main, Mon Apr 10 15:38:13 EDT 2023, System.currentTimeMillis(): 1681155493283, System.nanoTime(): 1681155493274886670
 [OUT] FAILED: expected wait time 10000500000 ns, but the actual elapsed time was: 10000499781ns (~10000ms) with startNanoTime = 1681155483271517053ns, and endNanoTime = 1681155493272016834ns, CheckpointRestoreNanoTimeDelta: 0ns (~0ms)
 [OUT] testObjectWaitTimedV2 before wait(10000, 500000), current thread name: Thread-0, Mon Apr 10 15:38:13 EDT 2023, System.currentTimeMillis(): 1681155493344, System.nanoTime(): 1681155493335445898
 [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Mon Apr 10 15:38:15 EDT 2023, System.currentTimeMillis(): 1681155495988, System.nanoTime(): 1681155495979505548
 [OUT] 19:38:15.994*0x12d9800          j9criu.9        > Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
 [OUT] 19:38:15.996 0x12d9800          j9criu.13       - Before checkpoint, checkpointNanoTimeMonotonic = 1681155495987701688, checkpointNanoUTCTime = 1681155495997125820
 [OUT] 19:38:16.013 0x12d9800          j9criu.12       - Current syslogOptions: error,vital
 [OUT] 19:38:16.013 0x12d9800          j9criu.7        - Before checkpoint criu_dump(), j9time_nano_time() returns 1681155496004096889, j9time_current_time_nanos() returns 1681155496013521069
 [OUT] 19:38:18.090 0x12d9800          j9criu.15       - After checkpoint criu_dump(), j9time_nano_time() returns 1681155498081639212, j9time_current_time_nanos() returns 1681155498091065442
 [OUT] 19:38:18.090 0x12d9800          j9criu.14       - After restore, restoreNanoUTCTime = 1681155498091065442, checkpointNanoUTCTime = 1681155495997125820, checkpointRestoreTimeDelta = 2093939622, restoreNanoTimeMonotonic = 1681155498081639212, checkpointNanoTimeMonotonic = 1681155495987701688, nanoTimeMonotonicClockDelta = 2093937524
 [OUT] 19:38:18.093 0x12d9800          j9criu.10       < Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
 [OUT] testObjectWaitTimedV2 before wait(10000, 500000), current thread name: main, Mon Apr 10 15:38:18 EDT 2023, System.currentTimeMillis(): 1681155498093, System.nanoTime(): 1681155495991015257
 [OUT] testObjectWaitTimedV2 after wait(10000, 500000), current thread name: Thread-0, Mon Apr 10 15:38:25 EDT 2023, System.currentTimeMillis(): 1681155505455, System.nanoTime(): 1681155503352646184
 [OUT] PASSED: expected wait time 10000500000 ns, but the actual elapsed time was: 10000620129ns (~10000ms) with startNanoTime = 1681155493335825704ns, and endNanoTime = 1681155503336445833ns, CheckpointRestoreNanoTimeDelta: 2093939622ns (~2093ms)
 [OUT] testObjectWaitTimedV2 after wait(10000, 500000), current thread name: main, Mon Apr 10 15:38:28 EDT 2023, System.currentTimeMillis(): 1681155508096, System.nanoTime(): 1681155505993630294
 [OUT] PASSED: expected wait time 10000500000 ns, but the actual elapsed time was: 10000653991ns (~10000ms) with startNanoTime = 1681155495992900937ns, and endNanoTime = 1681155505993554928ns, CheckpointRestoreNanoTimeDelta: 2093939622ns (~2093ms)
 [OUT] End testObjectWaitTimedV2, current thread name: main, Mon Apr 10 15:38:30 EDT 2023, System.currentTimeMillis(): 1681155510628, System.nanoTime(): 1681155508525818197
 [OUT] Removed testOutput file
 [OUT] finished script
 [ERR] /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OMR_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh: line 40: 14534 Killed                  $2 -XX:+EnableCRIUSupport $3 -cp "$1/criu.jar" $4 $5 $6 > testOutput 2>&1
>> Success condition was found: [Output match: PASSED: expected wait time]
>> Required condition was found: [Output match: Killed]
>> Required condition was found: [Output match: Start test name: testObjectWaitTimedV2]
>> Failure condition was not found: [Output match: CRIU is not enabled]
>> Failure condition was not found: [Output match: Operation not permitted]
>> Failure condition was found: [Output match: FAILED: expected wait time]
>> Failure condition was not found: [Output match: InterruptedException]
>> Success condition was not found: [Output match: Unable to create a thread:]
>> Success condition was not found: [Output match: Thread pid mismatch]
>> Success condition was not found: [Output match: do not match expected]
>> Failure condition was not found: [Output match: Could not dump the JVM processes, err=-70]
JasonFengJ9 commented 1 year ago
 [OUT] Before starting testObjectWaitTimedV2, current thread name: main, Mon Apr 10 15:38:02 EDT 2023, System.currentTimeMillis(): 1681155482818, System.nanoTime(): 1681155482809075038
 [OUT] testObjectWaitTimedV2 NO C/R before wait(10000, 500000), current thread name: main, Mon Apr 10 15:38:03 EDT 2023, System.currentTimeMillis(): 1681155483278, System.nanoTime(): 1681155483269276042
 [OUT] testObjectWaitTimedV2 NO C/R after wait(10000, 500000), current thread name: main, Mon Apr 10 15:38:13 EDT 2023, System.currentTimeMillis(): 1681155493283, System.nanoTime(): 1681155493274886670
 [OUT] FAILED: expected wait time 10000500000 ns, but the actual elapsed time was: 10000499781ns (~10000ms) with startNanoTime = 1681155483271517053ns, and endNanoTime = 1681155493272016834ns, CheckpointRestoreNanoTimeDelta: 0ns (~0ms)

This failure occurred before checkpoint, ~219ns less than the expected elapsed time.

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.functional_s390x_linux_Nightly_testList_0/519/ cmdLineTester_criu_nonPortableRestoreJDK11Up_2 -Xjit:count=0 -XX:+CRIURestoreNonPortableMode

Testing: Create CRIU checkpoint image and restore once - testObjectWaitTimedV1
Test start time: 2023/04/20 07:37:19 Coordinated Universal Time
Running command: bash /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_Nightly_testList_0/openjdkbinary/j2sdk-image/bin/java " -Xjit:count=0 -XX:+CRIURestoreNonPortableMode  -Xtrace:print={j9criu} --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED --add-exports=java.base/openj9.internal.criu=ALL-UNNAMED" "org.openj9.criu.JDK11UpTimeoutAdjustmentTest testObjectWaitTimedV1" 1 1 false
Time spent starting: 7 milliseconds
Time spent executing: 43554 milliseconds
Test result: FAILED
Output from test:
 [OUT] start running script
 [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
 [OUT] export LD_BIND_NOT=on
 [OUT] Start test name: testObjectWaitTimedV1
 [OUT] Before starting testObjectWaitTimedV1, current thread name: main, Thu Apr 20 07:37:23 UTC 2023, System.currentTimeMillis(): 1681976243312, System.nanoTime(): 1681976243307194105
 [OUT] testObjectWaitTimedV1 NO C/R before wait(10000, 0), current thread name: main, Thu Apr 20 07:37:24 UTC 2023, System.currentTimeMillis(): 1681976244849, System.nanoTime(): 1681976244844687865
 [OUT] testObjectWaitTimedV1 NO C/R after wait(10000, 0), current thread name: main, Thu Apr 20 07:37:34 UTC 2023, System.currentTimeMillis(): 1681976254879, System.nanoTime(): 1681976254874093625
 [OUT] PASSED: expected wait time 10000000000 ns, but the actual elapsed time was: 10000290980ns (~10000ms) with startNanoTime = 1681976244867283593ns, and endNanoTime = 1681976254867574573ns, CheckpointRestoreNanoTimeDelta: 0ns (~0ms)
 [OUT] testObjectWaitTimedV1 before wait(10000, 0), current thread name: Thread-0, Thu Apr 20 07:37:35 UTC 2023, System.currentTimeMillis(): 1681976255196, System.nanoTime(): 1681976255191475431
 [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Thu Apr 20 07:37:43 UTC 2023, System.currentTimeMillis(): 1681976263034, System.nanoTime(): 1681976263029673537
 [OUT] 07:37:43.096*0x138800          j9criu.9        > Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
 [OUT] 07:37:43.106 0x138800          j9criu.13       - Before checkpoint, checkpointNanoTimeMonotonic = 1681976263101990092, checkpointNanoUTCTime = 1681976263107223156
 [OUT] 07:37:43.158 0x138800          j9criu.12       - Current syslogOptions: error,vital
 [OUT] 07:37:43.158 0x138800          j9criu.7        - Before checkpoint criu_dump(), j9time_nano_time() returns 1681976263153541509, j9time_current_time_nanos() returns 1681976263158774448
 [OUT] 07:37:46.956 0x138800          j9criu.15       - After checkpoint criu_dump(), j9time_nano_time() returns 1681976266951114284, j9time_current_time_nanos() returns 1681976266956351514
 [OUT] 07:37:46.956 0x138800          j9criu.14       - After restore, restoreNanoUTCTime = 1681976266956351514, checkpointNanoUTCTime = 1681976263107223156, checkpointRestoreTimeDelta = 3849128358, restoreNanoTimeMonotonic = 1681976266951114284, checkpointNanoTimeMonotonic = 1681976263101990092, nanoTimeMonotonicClockDelta = 3849124192
 [OUT] 07:37:46.961 0x138800          j9criu.10       < Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
 [OUT] testObjectWaitTimedV1 before wait(10000, 0), current thread name: main, Thu Apr 20 07:37:46 UTC 2023, System.currentTimeMillis(): 1681976266961, System.nanoTime(): 1681976263107596407
 [OUT] testObjectWaitTimedV1 after wait(10000, 0), current thread name: Thread-0, Thu Apr 20 07:37:46 UTC 2023, System.currentTimeMillis(): 1681976266967, System.nanoTime(): 1681976263112892820
 [OUT] FAILED: expected wait time 10000000000 ns, but the actual elapsed time was: 7920362661ns (~7920ms) with startNanoTime = 1681976255192508834ns, and endNanoTime = 1681976263112871495ns, CheckpointRestoreNanoTimeDelta: 3849128358ns (~3849ms)
 [OUT] testObjectWaitTimedV1 after wait(10000, 0), current thread name: main, Thu Apr 20 07:37:56 UTC 2023, System.currentTimeMillis(): 1681976276975, System.nanoTime(): 1681976273121451196
 [OUT] PASSED: expected wait time 10000000000 ns, but the actual elapsed time was: 10002690947ns (~10002ms) with startNanoTime = 1681976263118695139ns, and endNanoTime = 1681976273121386086ns, CheckpointRestoreNanoTimeDelta: 3849128358ns (~3849ms)
 [OUT] End testObjectWaitTimedV1, current thread name: main, Thu Apr 20 07:38:02 UTC 2023, System.currentTimeMillis(): 1681976282047, System.nanoTime(): 1681976278193676134
 [OUT] Removed testOutput file
 [OUT] finished script
 [ERR] /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh: line 40: 2627005 Killed                  $2 -XX:+EnableCRIUSupport $3 -cp "$1/criu.jar" $4 $5 $6 > testOutput 2>&1
>> Success condition was found: [Output match: PASSED: expected wait time]
>> Required condition was found: [Output match: Killed]
>> Required condition was found: [Output match: Start test name: testObjectWaitTimedV1]
>> Failure condition was not found: [Output match: CRIU is not enabled]
>> Failure condition was not found: [Output match: Operation not permitted]
>> Failure condition was found: [Output match: FAILED: expected wait time]
>> Failure condition was not found: [Output match: InterruptedException]
>> Success condition was not found: [Output match: Unable to create a thread:]
>> Success condition was not found: [Output match: Thread pid mismatch]
>> Success condition was not found: [Output match: do not match expected]
>> Failure condition was not found: [Output match: Could not dump the JVM processes, err=-70]
keithc-ca commented 1 year ago

Another failure: https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.functional_x86-64_linux_OpenJDK11_testList_0/62/

tajila commented 1 year ago

@JasonFengJ9 does this still fail?

JasonFengJ9 commented 1 year ago

@tajila It didn't appear in recent internal builds.

Most of the occurrences were reported from open runs. @pshipton any recent observation?

pshipton commented 1 year ago

any recent observation?

Any recent failures would have been added to the appropriate issues (except during my vacation times), so no.