eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

cmdLineTester_criu_jitserverPostRestore_1 "Test -Xjit:exclude={*}" "Post-checkpoint" not found #18013

Closed pshipton closed 11 months ago

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.functional_s390x_linux_OpenJDK11_testList_1/78

Testing: Test -Xjit:exclude={*}
Test start time: 2023/08/24 12:17:20 Coordinated Universal Time
Running command: bash /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OpenJDK11_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuJitServerScript.sh /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OpenJDK11_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OpenJDK11_testList_1/openjdkbinary/j2sdk-image/bin " -Xjit:count=0 " org.openj9.criu.OptionsFileTest "JitOptionsTest -XX:+UseJITServer -Xjit:exclude={*}" 1 false false
Time spent starting: 2 milliseconds
Time spent executing: 9798 milliseconds
Test result: FAILED
Output from test:
 [OUT] start running script
 [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
 [OUT] export LD_BIND_NOT=on
 [OUT] Starting /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OpenJDK11_testList_1/openjdkbinary/j2sdk-image/bin/jitserver -XX:JITServerPort=55370
 [OUT] 2134186 ?        00:00:00 jitserver
 [OUT] JITSERVER EXISTS
 [OUT] Pre-checkpoint
 [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Thu Aug 24 12:17:24 UTC 2023, System.currentTimeMillis(): 1692879444122, System.nanoTime(): 1692879444118106627
 [OUT] JVMDUMP034I User requested Java dump using '/home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OpenJDK11_testList_1/aqa-tests/TKG/output_16928771598140/cmdLineTester_criu_jitserverPostRestore_1/javacore.20230824.121724.2134265.0001.txt' through org.eclipse.openj9.criu.CRIUSupport.checkpointJVMImpl
 [OUT] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OpenJDK11_testList_1/aqa-tests/TKG/output_16928771598140/cmdLineTester_criu_jitserverPostRestore_1/javacore.20230824.121724.2134265.0001.txt
 [OUT] Exception in thread "main" org.eclipse.openj9.criu.JVMCheckpointException: The JVM attempted to checkpoint but was unable to due to code being executed in @NotCheckpointSafe frames. Retry attempts: 100
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVMImpl(Native Method)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:679)
 [OUT]  at org.openj9.criu.CRIUTestUtils.checkPointJVM(CRIUTestUtils.java:77)
 [OUT]  at org.openj9.criu.OptionsFileTest.jitOptionsTest(OptionsFileTest.java:324)
 [OUT]  at org.openj9.criu.OptionsFileTest.main(OptionsFileTest.java:72)
 [OUT] Error (criu/protobuf.c:72): Unexpected EOF on (empty-image)
 [OUT] Removed test output files
 [OUT] 2134186 ?        00:00:00 jitserver
 [OUT] JITSERVER STILL EXISTS
 [OUT] Terminating /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OpenJDK11_testList_1/openjdkbinary/j2sdk-image/bin/jitserver -XX:JITServerPort=55370
 [OUT] finished script
 [ERR] JITServer is currently a technology preview. Its use is not yet supported.
 [ERR] 
 [ERR] JITServer is ready to accept incoming requests
 [ERR] /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OpenJDK11_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuJitServerScript.sh: line 91: 2134186 Killed                  $2/jitserver $JITSERVER_OPTIONS
>> Success condition was found: [Output match: Killed]
>> Required condition was found: [Output match: Pre-checkpoint]
>> Success condition was not found: [Output match: Post-checkpoint]
>> Failure condition was not found: [Output match: CRIU is not enabled]
>> Failure condition was not found: [Output match: Operation not permitted]
>> Success condition was not found: [Output match: Some or all compiled code in the code cache invalidated post restore.]
>> Failure condition was not found: [Output match: JIT compilation disabled post restore.]
>> Failure condition was not found: [Output match: AOT load and compilation disabled post restore.]
>> Success condition was not found: [Output match: Thread pid mismatch]
>> Success condition was not found: [Output match: do not match expected]
>> Success condition was not found: [Output match: Unable to create a thread:]
>> Success condition was found: [Output match: JITSERVER EXISTS]
>> Success condition was found: [Output match: JITSERVER STILL EXISTS]
>> Failure condition was not found: [Output match: JITSERVER DOES NOT EXIST]
>> Failure condition was not found: [Output match: JITSERVER NO LONGER EXISTS]
>> Failure condition was found: [Output match: User requested Java dump using]
pshipton commented 1 year ago

@TobiAjila

tajila commented 1 year ago

@dsouzai ^^

dsouzai commented 1 year ago

The test is failing because of

 [OUT] Exception in thread "main" org.eclipse.openj9.criu.JVMCheckpointException: The JVM attempted to checkpoint but was unable to due to code being executed in @NotCheckpointSafe frames. Retry attempts: 100
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVMImpl(Native Method)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:679)
 [OUT]  at org.openj9.criu.CRIUTestUtils.checkPointJVM(CRIUTestUtils.java:77)
 [OUT]  at org.openj9.criu.OptionsFileTest.jitOptionsTest(OptionsFileTest.java:324)
 [OUT]  at org.openj9.criu.OptionsFileTest.main(OptionsFileTest.java:72)

which means that we won't do a restore, which is why Post-Checkpoint is not found. I don't think this is a JIT issue.

dsouzai commented 1 year ago

Why was a java dump even requested during this test?

tajila commented 1 year ago

Ya doesnt look like a JIT issue

Why was a java dump even requested during this test?

one of the threads was stuck in a or notCheckpointSafe frame

tajila commented 1 year ago

@JasonFengJ9 ^^

JasonFengJ9 commented 1 year ago

From aqa-tests/TKG/output_16928771598140/cmdLineTester_criu_jitserverPostRestore_1/javacore.20230824.121724.2134265.0001.txt

3XMTHREADINFO      "Finalizer thread" J9VMThread:0x0000000000E8B600, omrthread_t:0x000003FFAC2214D8, java/lang/Thread:0x000000000D5942A0, state:R, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0x1B, isDaemon:true)
3XMJAVALTHRCCL            jdk/internal/loader/ClassLoaders$AppClassLoader(0x000000000D585A58)
3XMTHREADINFO1            (native thread ID:0x209115, native priority:0x5, native policy:UNKNOWN, vmstate:R, vm thread flags:0x00000001)
3XMTHREADINFO2            (native stack address range from:0x000003FFB1440000, to:0x000003FFB1480000, size:0x40000)
3XMCPUTIME               CPU usage total: 0.000382457 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=11952 (0x2EB0)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/lang/ref/Reference.enqueueImpl(Reference.java:160(Compiled Code))
5XESTACKTRACE                   (entered lock: com/ibm/oti/util/WeakReferenceNode@0x000000000D59A1D8, entry count: 1)

[OUT] Exception in thread "main" org.eclipse.openj9.criu.JVMCheckpointException: The JVM attempted to checkpoint but was unable to due to code being executed in @NotCheckpointSafe frames. Retry attempts: 100

java/lang/ref/Reference.enqueueImpl() has @NotCheckpointSafe annotation and was jitted. It seems this thread caused JVMCheckpointException.

tajila commented 11 months ago

@singh264 Please take a look at this as well. It may be a dup of the one you are currently working on

singh264 commented 11 months ago

@tajila I was unable to reproduce the test failure after running the test for 1000 iterations on a Linux x86_64 machine.

singh264 commented 11 months ago

@tajila I was unable to reproduce the test failure after running the test for 1000 iterations on a Linux s390x machine.

It seems like the test failure is not reproducible.