eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

cmdLineTester_criu_nonPortableRestore MethodTypeDeadlockTest Blocking operation is not allowed in CRIU single thread mode #19653

Closed pshipton closed 4 months ago

pshipton commented 4 months ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/240/ - rh8-390-2 cmdLineTester_criu_nonPortableRestore_9

20:57:49  Testing: Create and Restore Criu Checkpoint Image once - MethodTypeDeadlockTest
20:57:49  Test start time: 2024/06/05 20:57:49 Eastern Standard Time
20:57:49  Running command: bash /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/jdkbinary/j2sdk-image/bin/java " -XX:+DebugOnRestore -Xjit  -XX:+ThrowOnDelayedCheckpointOperation -XX:sleepMillisecondsForNotCheckpointSafe=20 -Xtrace:print=j9vm.731 --add-opens java.base/jdk.internal.misc=ALL-UNNAMED  --add-exports java.base/openj9.internal.criu=ALL-UNNAMED" org.openj9.criu.DeadlockTest MethodTypeDeadlockTest 1 false false
20:57:49  Time spent starting: 4 milliseconds
20:57:52  Time spent executing: 3055 milliseconds
20:57:52  Test result: FAILED
20:57:52  Output from test:
20:57:52   [OUT] start running script
20:57:52   [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
20:57:52   [OUT] export LD_BIND_NOT=on
20:57:52   [OUT] /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/jdkbinary/j2sdk-image/bin/java -XX:+EnableCRIUSupport  -XX:+DebugOnRestore -Xjit  -XX:+ThrowOnDelayedCheckpointOperation -XX:sleepMillisecondsForNotCheckpointSafe=20 -Xtrace:print=j9vm.731 --add-opens java.base/jdk.internal.misc=ALL-UNNAMED  --add-exports java.base/openj9.internal.criu=ALL-UNNAMED -cp /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criu.jar org.openj9.criu.DeadlockTest MethodTypeDeadlockTest 1
20:57:52   [OUT] Pre-checkpoint
20:57:52   [OUT] main: Wed Jun 05 20:57:49 EDT 2024, Performing CRIUSupport.checkpointJVM(), System.currentTimeMillis(): 1717635469693, System.nanoTime(): 1717635469689705200
20:57:52   [OUT] JVMDUMP034I User requested Java dump using '/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/output_17176331522668/cmdLineTester_criu_nonPortableRestore_9/javacore.20240605.205749.1683304.0001.txt' through CRIUSingleThreadModeJVMCRIUException
20:57:52   [OUT] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/output_17176331522668/cmdLineTester_criu_nonPortableRestore_9/javacore.20240605.205749.1683304.0001.txt
20:57:52   [OUT] org.eclipse.openj9.criu.JVMCheckpointException: Exception thrown when running user pre-checkpoint 
20:57:52   [OUT]    at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:532)
20:57:52   [OUT]    at org.openj9.criu.CRIUTestUtils.checkPointJVM(CRIUTestUtils.java:92)
20:57:52   [OUT]    at org.openj9.criu.DeadlockTest.methodTypeDeadlockTest(DeadlockTest.java:222)
20:57:52   [OUT]    at org.openj9.criu.DeadlockTest.main(DeadlockTest.java:58)
20:57:52   [OUT] Caused by: openj9.internal.criu.JVMCheckpointException: Exception thrown when running user pre-checkpoint 
20:57:52   [OUT]    at java.base/openj9.internal.criu.InternalCRIUSupport.lambda$registerCheckpointHookHelper$1(InternalCRIUSupport.java:763)
20:57:52   [OUT]    at java.base/openj9.internal.criu.J9InternalCheckpointHookAPI$J9InternalCheckpointHook.runHook(J9InternalCheckpointHookAPI.java:143)
20:57:52   [OUT]    at java.base/openj9.internal.criu.J9InternalCheckpointHookAPI.runHooks(J9InternalCheckpointHookAPI.java:98)
20:57:52   [OUT]    at java.base/openj9.internal.criu.J9InternalCheckpointHookAPI.runPreCheckpointHooksSingleThread(J9InternalCheckpointHookAPI.java:107)
20:57:52   [OUT]    at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVMImpl(Native Method)
20:57:52   [OUT]    at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVM(InternalCRIUSupport.java:970)
20:57:52   [OUT]    at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:530)
20:57:52   [OUT]    ... 3 more
20:57:52   [OUT] Caused by: openj9.internal.criu.JVMCheckpointException: Blocking operation is not allowed in CRIU single thread mode.
20:57:52   [OUT]    at java.base/java.lang.ref.ReferenceQueue.poll(ReferenceQueue.java:78)
20:57:52   [OUT]    at java.base/jdk.internal.util.ReferencedKeyMap.removeStaleReferences(ReferencedKeyMap.java:332)
20:57:52   [OUT]    at java.base/jdk.internal.util.ReferencedKeyMap.get(ReferencedKeyMap.java:208)
20:57:52   [OUT]    at java.base/jdk.internal.util.ReferencedKeySet.get(ReferencedKeySet.java:173)
20:57:52   [OUT]    at java.base/java.lang.invoke.MethodType.makeImpl(MethodType.java:401)
20:57:52   [OUT]    at java.base/java.lang.invoke.MethodType.methodType(MethodType.java:319)
20:57:52   [OUT]    at org.openj9.criu.DeadlockTest.lambda$methodTypeDeadlockTest$5(DeadlockTest.java:217)
20:57:52   [OUT]    at java.base/openj9.internal.criu.InternalCRIUSupport.lambda$registerCheckpointHookHelper$1(InternalCRIUSupport.java:761)
20:57:52   [OUT]    ... 9 more
20:57:52   [OUT] TEST FAILED
20:57:52   [OUT] initiate restore
20:57:52   [OUT] Error (criu/protobuf.c:72): Unexpected EOF on (empty-image)
20:57:52   [OUT] Removed test output files
20:57:52   [OUT] finished script
20:57:52  >> Success condition was found: [Output match: User requested Java dump using]
20:57:52  >> Success condition was not found: [Output match: TEST PASSED]
20:57:52  >> Failure condition was found: [Output match: TEST FAILED]
20:57:52  >> Required condition was found: [Output match: Pre-checkpoint]
20:57:52  >> Success condition was not found: [Output match: Checkpoint blocked because thread]
20:57:52  >> Required condition was not found: [Output match: Killed]
20:57:52  >> Failure condition was not found: [Output match: CRIU is not enabled]
20:57:52  >> Failure condition was not found: [Output match: Operation not permitted]
20:57:52  >> Success condition was not found: [Output match: Thread pid mismatch]
20:57:52  >> Success condition was not found: [Output match: do not match expected]
20:57:52  >> Success condition was not found: [Output match: Unable to create a thread:]
20:57:52  >> Failure condition was found: [Output match: TEST FAILED]
20:57:52  >> Failure condition was not found: [Output match: Could not dump the JVM processes, err=-70]

There are some closed issues for "Blocking operation is not allowed in CRIU single thread mode"

pshipton commented 4 months ago

@tajila fyi

tajila commented 4 months ago

@JasonFengJ9 does this issue look familiar?

JasonFengJ9 commented 4 months ago

@tajila This is similar to

The caller in question was fixed via

A related PR was

It seems this is a different call sequence, probably needs a @NotCheckpointSafe at java.lang.ref.ReferenceQueue.poll().

tajila commented 4 months ago

@JasonFengJ9 I think we need to add the annotation at a higher level, like MethodType::methodType()

JasonFengJ9 commented 4 months ago

I think we need to add the annotation at a higher level, like MethodType::methodType()

There are multiple variants of MethodType::methodType() (7 for JDK21, 6 for JDK17) Maybe java/lang/invoke/MethodType.makeImpl() which is shared by methodType(). OpenJ9 JDK11 has its own MethodType implementation, methodType() could be annotated.

~On a side note, java.base/java.lang.ref.ReferenceQueue.poll(ReferenceQueue.java:78) doesn't match the method and line number for both JDK17 and JDK21.~ OpenJ9 ReferenceQueue.java method and line number are correct.

JasonFengJ9 commented 4 months ago

javacore.20240618.104029.3321586.0001.txt

1TISIGINFO     Dump Event "systhrow" (00040000) Detail "org/eclipse/openj9/criu/JVMCheckpointException" "Blocking operation is not allowed in CRIU single thread mode." received

1CIJAVAVERSION JRE 17 Linux amd64-64 (build 17.0.10+7)
1CIVMVERSION   20240116_670
1CIJ9VMTAG     openj9-0.43.0
1CIJ9VMVERSION 2c3d78b48
1CIJITVERSION  j9jit_20240131_2043_
1CIOMRVERSION  ea8124dbc_CMPRSS
1CIJCLVERSION  2aad089841f based on jdk-17.0.10+7

1XMCURTHDINFO  Current thread
3XMTHREADINFO      "Default Executor-thread-1" J9VMThread:0x0000000000594300, omrthread_t:0x00007F7D6002A618, java/lang/Thread:0x00000000845E7A88, state:R, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0x35, isDaemon:true)
3XMJAVALTHRCCL            org/eclipse/osgi/internal/framework/ContextFinder(0x000000008432AA68)
3XMTHREADINFO1            (native thread ID:0x32AF23, native priority:0x5, native policy:UNKNOWN, vmstate:R, vm thread flags:0x00041020)
3XMTHREADINFO2            (native stack address range from:0x00007F7D7AF80000, to:0x00007F7D7B000000, size:0x80000)
3XMCPUTIME               CPU usage total: 1.273657773 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=69632 (0x11000)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/lang/ref/ReferenceQueue.poll(ReferenceQueue.java:78(Compiled Code))
4XESTACKTRACE                at java/lang/invoke/MethodType$ConcurrentWeakInternSet.expungeStaleElements(MethodType.java:1406(Compiled Code))
4XESTACKTRACE                at java/lang/invoke/MethodType$ConcurrentWeakInternSet.get(MethodType.java:1366(Compiled Code))
4XESTACKTRACE                at java/lang/invoke/MethodType.makeImpl(MethodType.java:341(Compiled Code))
4XESTACKTRACE                at java/lang/invoke/MethodType.methodType(MethodType.java:257(Compiled Code))
4XESTACKTRACE                at java/lang/invoke/MethodTypeHelper.fromMethodDescriptorStringInternal(MethodTypeHelper.java:341(Compiled Code))
4XESTACKTRACE                at java/lang/invoke/MethodTypeHelper.vmResolveFromMethodDescriptorString(MethodTypeHelper.java:367)
4XESTACKTRACE                at java/lang/invoke/MethodHandleResolver.resolveInvokeDynamic(MethodHandleResolver.java:172)
4XESTACKTRACE                at com/ibm/ws/config/xml/internal/variables/ConfigVariableRegistry$1.prepare(ConfigVariableRegistry.java:160)
4XESTACKTRACE                at io/openliberty/checkpoint/spi/CheckpointPhase$StaticCheckpointHook.prepare(CheckpointPhase.java:485)
4XESTACKTRACE                at io/openliberty/checkpoint/internal/CheckpointImpl$$Lambda$353/0x0000000000000000.accept(Bytecode PC:6)
4XESTACKTRACE                at io/openliberty/checkpoint/internal/CheckpointImpl.callHooks(CheckpointImpl.java:577)
4XESTACKTRACE                at io/openliberty/checkpoint/internal/CheckpointImpl.prepare(CheckpointImpl.java:590)
4XESTACKTRACE                at io/openliberty/checkpoint/internal/CheckpointImpl.lambda$checkpoint$14(CheckpointImpl.java:396)
4XESTACKTRACE                at io/openliberty/checkpoint/internal/CheckpointImpl$$Lambda$364/0x0000000000000000.run(Bytecode PC:8)
4XESTACKTRACE                at org/eclipse/openj9/criu/CRIUSupport.lambda$registerCheckpointHookHelper$2(CRIUSupport.java:653)
4XESTACKTRACE                at org/eclipse/openj9/criu/CRIUSupport$$Lambda$366/0x0000000000000000.run(Bytecode PC:8)
4XESTACKTRACE                at org/eclipse/openj9/criu/J9InternalCheckpointHookAPI$J9InternalCheckpointHook.runHook(J9InternalCheckpointHookAPI.java:143)
4XESTACKTRACE                at org/eclipse/openj9/criu/J9InternalCheckpointHookAPI.runHooks(J9InternalCheckpointHookAPI.java:98)
5XESTACKTRACE                   (entered lock: org/eclipse/openj9/criu/CRIUSupport@0x00000000FF2C00B8, entry count: 1)
4XESTACKTRACE                at org/eclipse/openj9/criu/J9InternalCheckpointHookAPI.runPreCheckpointHooksSingleThread(J9InternalCheckpointHookAPI.java:107)
4XESTACKTRACE                at org/eclipse/openj9/criu/CRIUSupport.checkpointJVMImpl(Native Method)
4XESTACKTRACE                at org/eclipse/openj9/criu/CRIUSupport.checkpointJVM(CRIUSupport.java:823)
4XESTACKTRACE                at io/openliberty/checkpoint/internal/openj9/ExecuteCRIU_OpenJ9.dump(ExecuteCRIU_OpenJ9.java:55)
4XESTACKTRACE                at io/openliberty/checkpoint/internal/CheckpointImpl.checkpoint(CheckpointImpl.java:396)
4XESTACKTRACE                at io/openliberty/checkpoint/internal/CheckpointImpl.checkpointOrExitOnFailure(CheckpointImpl.java:303)
4XESTACKTRACE                at io/openliberty/checkpoint/internal/CheckpointImpl.check(CheckpointImpl.java:297)
4XESTACKTRACE                at com/ibm/ws/kernel/feature/internal/FeatureManager$$Lambda$344/0x0000000000000000.accept(Bytecode PC:6)
4XESTACKTRACE                at java/util/ArrayList.forEach(ArrayList.java:1511)
4XESTACKTRACE                at com/ibm/ws/kernel/feature/internal/FeatureManager.checkServerReady(FeatureManager.java:869)
4XESTACKTRACE                at com/ibm/ws/kernel/feature/internal/FeatureManager.update(FeatureManager.java:830)
4XESTACKTRACE                at com/ibm/ws/kernel/feature/internal/FeatureManager.processFeatureChanges(FeatureManager.java:932)
4XESTACKTRACE                at com/ibm/ws/kernel/feature/internal/FeatureManager$1.run(FeatureManager.java:715)
4XESTACKTRACE                at com/ibm/ws/threading/internal/ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:280)
4XESTACKTRACE                at java/util/concurrent/ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136(Compiled Code))
4XESTACKTRACE                at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
4XESTACKTRACE                at java/lang/Thread.run(Thread.java:857)
JasonFengJ9 commented 4 months ago

I think we need to add the annotation at a higher level, like MethodType::methodType()

There are multiple variants of MethodType::methodType() (7 for JDK21, 6 for JDK17) Maybe java/lang/invoke/MethodType.makeImpl() which is shared by methodType().

Adding @NotCheckpointSafe for java.lang.ref.ReferenceQueue.poll() needs one change in OpenJ9 while annotating MethodType::methodType() or makeImpl() requires multiple extension changes.

@tajila What's the concern for ReferenceQueue.poll()? Is there a deadlock scenario in MethodType::methodType()?

tajila commented 4 months ago

@tajila What's the concern for ReferenceQueue.poll()? Is there a deadlock scenario in MethodType::methodType()?

NVM, I thought we tried that already. But we only addressed Reference::enqueueImpl