eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

cmdLineTester_criu_nonPortableRestore_Xtrace_tracepoint - Blocking operation is not allowed in CRIU single thread mode #18399

Closed pshipton closed 10 months ago

pshipton commented 11 months ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.functional_aarch64_linux_OMR/41/ cmdLineTester_criu_nonPortableRestore_Xtrace_tracepoint_4 -Xgcpolicy:optavgpause

Testing: Restore dump options test with no dump options specified before checkpoint and -Xdump:dynamic required
Test start time: 2023/11/03 22:54:25 Coordinated Universal Time
Running command: bash /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_OMR_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_OMR_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_OMR_testList_0/jdkbinary/j2sdk-image/bin/java " -Xgcpolicy:optavgpause  -Xtrace:print={j9jcl.219} -Xdump:dynamic" org.openj9.criu.OptionsFileTest dumpOptionsTestRequireDynamic 1
Time spent starting: 7 milliseconds
Time spent executing: 3356 milliseconds
Test result: FAILED
Output from test:
 [OUT] start running script
 [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
 [OUT] export LD_BIND_NOT=on
 [OUT] /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_OMR_testList_0/jdkbinary/j2sdk-image/bin/java -XX:+EnableCRIUSupport  -Xgcpolicy:optavgpause  -Xtrace:print={j9jcl.219} -Xdump:dynamic -cp /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_OMR_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criu.jar org.openj9.criu.OptionsFileTest dumpOptionsTestRequireDynamic 1
 [OUT] 22:54:25.726*0x11200           j9jcl.219      > initializeKnownClasses
 [OUT] Pre-checkpoint
 [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Fri Nov 03 22:54:25 UTC 2023, System.currentTimeMillis(): 1699052066073, System.nanoTime(): 1058619859790150
 [OUT] JVMDUMP034I User requested Java dump using '/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_OMR_testList_0/aqa-tests/TKG/output_1699047107449/cmdLineTester_criu_nonPortableRestore_Xtrace_tracepoint_4/javacore.20231103.225426.2781425.0001.txt' through CRIUSingleThreadModeJVMCRIUException
 [OUT] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_OMR_testList_0/aqa-tests/TKG/output_1699047107449/cmdLineTester_criu_nonPortableRestore_Xtrace_tracepoint_4/javacore.20231103.225426.2781425.0001.txt
 [OUT] java.lang.reflect.InvocationTargetException
 [OUT] JVMJITM044W Some or all compiled code in the code cache invalidated post restore.
 [OUT] JVMDUMP034I User requested Java dump using '/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_OMR_testList_0/aqa-tests/TKG/output_1699047107449/cmdLineTester_criu_nonPortableRestore_Xtrace_tracepoint_4/javacore.20231103.225428.2781425.0002.txt' through CRIUSingleThreadModeJVMCRIUException
 [OUT] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_OMR_testList_0/aqa-tests/TKG/output_1699047107449/cmdLineTester_criu_nonPortableRestore_Xtrace_tracepoint_4/javacore.20231103.225428.2781425.0002.txt
 [OUT] Exception in thread "main" java.lang.BootstrapMethodError: bootstrap method initialization exception
 [OUT]  at java.base/java.lang.invoke.BootstrapMethodInvoker.invoke(BootstrapMethodInvoker.java:188)
 [OUT]  at java.base/java.lang.invoke.CallSite.makeSite(CallSite.java:316)
 [OUT]  at java.base/java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(MethodHandleNatives.java:274)
 [OUT]  at java.base/java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:264)
 [OUT]  at java.base/java.lang.invoke.MethodHandleResolver.resolveInvokeDynamic(MethodHandleResolver.java:200)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.clearInetAddressCache(CRIUSupport.java:757)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.J9InternalCheckpointHookAPI$J9InternalCheckpointHook.runHook(J9InternalCheckpointHookAPI.java:143)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.J9InternalCheckpointHookAPI.runHooks(J9InternalCheckpointHookAPI.java:98)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.J9InternalCheckpointHookAPI.runPostRestoreHooksSingleThread(J9InternalCheckpointHookAPI.java:115)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVMImpl(Native Method)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:812)
 [OUT]  at org.openj9.criu.CRIUTestUtils.checkPointJVM(CRIUTestUtils.java:77)
 [OUT]  at org.openj9.criu.OptionsFileTest.dumpOptionsTestRequireDynamic(OptionsFileTest.java:295)
 [OUT]  at org.openj9.criu.OptionsFileTest.main(OptionsFileTest.java:69)
 [OUT] Caused by: java.lang.invoke.LambdaConversionException: Exception instantiating lambda object
 [OUT]  at java.base/java.lang.invoke.InnerClassLambdaMetafactory.buildCallSite(InnerClassLambdaMetafactory.java:243)
 [OUT]  at java.base/java.lang.invoke.LambdaMetafactory.metafactory(LambdaMetafactory.java:341)
 [OUT]  at java.base/java.lang.invoke.BootstrapMethodInvoker.invoke(BootstrapMethodInvoker.java:134)
 [OUT]  ... 13 more
 [OUT] Caused by: org.eclipse.openj9.criu.JVMRestoreException: Blocking operation is not allowed in CRIU single thread mode.
 [OUT]  at java.base/jdk.internal.ref.PhantomCleanable.insert(PhantomCleanable.java:87)
 [OUT]  at java.base/jdk.internal.ref.PhantomCleanable.<init>(PhantomCleanable.java:68)
 [OUT]  at java.base/jdk.internal.ref.CleanerImpl$PhantomCleanableRef.<init>(CleanerImpl.java:164)
 [OUT]  at java.base/java.lang.ref.Cleaner.register(Cleaner.java:224)
 [OUT]  at java.base/java.lang.invoke.MethodHandleNatives$CallSiteContext.make(MethodHandleNatives.java:86)
 [OUT]  at java.base/java.lang.invoke.CallSite.<init>(CallSite.java:145)
 [OUT]  at java.base/java.lang.invoke.ConstantCallSite.<init>(ConstantCallSite.java:50)
 [OUT]  at java.base/java.lang.invoke.InnerClassLambdaMetafactory.buildCallSite(InnerClassLambdaMetafactory.java:236)
 [OUT]  ... 15 more
 [OUT] JVMDUMP039I Processing dump event "vmstop", detail "#0000000000000000" at 2023/11/03 22:54:28 - please wait.
 [OUT] JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_OMR_testList_0/aqa-tests/TKG/output_1699047107449/cmdLineTester_criu_nonPortableRestore_Xtrace_tracepoint_4/javacore.20231103.225428.2781425.0003.txt' in response to an event
 [OUT] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_OMR_testList_0/aqa-tests/TKG/output_1699047107449/cmdLineTester_criu_nonPortableRestore_Xtrace_tracepoint_4/javacore.20231103.225428.2781425.0003.txt
 [OUT] JVMDUMP013I Processed dump event "vmstop", detail "#0000000000000000".
 [OUT] Removed test output files
 [OUT] finished script
 [ERR] /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_OMR_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh: line 41: 2781425 Killed                  $2 -XX:+EnableCRIUSupport $3 -cp "$1/criu.jar" $4 $5 $6 > testOutput 2>&1
>> Success condition was found: [Output match: Killed]
>> Required condition was found: [Output match: Pre-checkpoint]
>> Success condition was found: [Output match: Processing dump event "vmstop"]
>> Success condition was not found: [Output match: Processing dump event "throw", detail "java/lang/OutOfMemoryError"]
>> Success condition was found: [Output match: JVM requested Java dump]
>> Success condition was not found: [Output match: Post-checkpoint]
>> Failure condition was found: [Output match: org.eclipse.openj9.criu.JVMRestoreException]
>> Failure condition was not found: [Output match: Dump option unrecognized: -Xdump:nofailover]
>> Failure condition was not found: [Output match: CRIU is not enabled]
>> Failure condition was not found: [Output match: Operation not permitted]
>> Success condition was not found: [Output match: Thread pid mismatch]
>> Success condition was not found: [Output match: do not match expected]
>> Success condition was not found: [Output match: Unable to create a thread:]
>> Failure condition was not found: [Output match: Could not dump the JVM processes, err=-70]
>> Failure condition was found: [Output match: User requested Java dump using]
pshipton commented 11 months ago

@TobiAjila fyi

pshipton commented 11 months ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.functional_aarch64_linux_Nightly_testList_1/93

@TobiAjila I see failures in cmdLineTester_criu_nonPortableRestore tests all the time for the past few weeks, are we working to fix that?

Some of them are https://github.com/eclipse-openj9/openj9/issues/18384

tajila commented 11 months ago

Some of them are https://github.com/eclipse-openj9/openj9/issues/18384

Yes @JasonFengJ9 Is looking into it

tajila commented 11 months ago

@singh264 Can you please take a look at this. You can generate a system core when the JVMRestoreException is triggered to determine what is causing the blocking operation.

singh264 commented 11 months ago

@tajila I generated a system core file when JVMRestoreException is triggered during the test failure. In addition to the options in dumpOptionsTestRequireDynamic, I observe the error for the options in TraceOptionsTest2 and TraceOptionsTest3.

singh264 commented 11 months ago

@singh264 Can you please take a look at this. You can generate a system core when the JVMRestoreException is triggered to determine what is causing the blocking operation.

The blocking operation seems to occur during the CRIUSupport.clearInetAddressCache() post restore hook when MethodHandleResolver.resolveInvokeDynamic(..) is called before CRIU single thread mode is disabled in Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl(..).

During the test failure, the code path taken from MethodHandleResolver.resolveInvokeDynamic leads to PhantomCleanable.insert() that includes a synchronized block and the monitorenter bytecode. It seems like the code path taken when resolveInvokeDynamic is called is not always the same since the test fails intermittently.

> !threads
    !stack 0x0001aa00   !j9vmthread 0x0001aa00  !j9thread 0x7fc964007440    tid 0x1362b0 (1270448) // (main)
    !stack 0x0001e700   !j9vmthread 0x0001e700  !j9thread 0x7fc9640c2eb0    tid 0x1362b2 (1270450) // (JIT Compilation Thread-000 Suspended)
    !stack 0x00022400   !j9vmthread 0x00022400  !j9thread 0x7fc9640c3428    tid 0x1362b3 (1270451) // (JIT Compilation Thread-001 Suspended)
    !stack 0x00026000   !j9vmthread 0x00026000  !j9thread 0x7fc9640c43c0    tid 0x1362b4 (1270452) // (JIT Compilation Thread-002 Suspended)
    !stack 0x00029d00   !j9vmthread 0x00029d00  !j9thread 0x7fc9640c4938    tid 0x1362b5 (1270453) // (JIT Compilation Thread-003 Suspended)
    !stack 0x0002da00   !j9vmthread 0x0002da00  !j9thread 0x7fc9640c58d0    tid 0x1362b6 (1270454) // (JIT Compilation Thread-004 Suspended)
    !stack 0x00031700   !j9vmthread 0x00031700  !j9thread 0x7fc9640c5e48    tid 0x1362b7 (1270455) // (JIT Compilation Thread-005 Suspended)
    !stack 0x00035400   !j9vmthread 0x00035400  !j9thread 0x7fc9640c6de0    tid 0x1362b8 (1270456) // (JIT Compilation Thread-006)
    !stack 0x00039100   !j9vmthread 0x00039100  !j9thread 0x7fc9640c7358    tid 0x1362b9 (1270457) // (JIT Compilation Thread-007 Suspended)
    !stack 0x0003cd00   !j9vmthread 0x0003cd00  !j9thread 0x7fc9640c82f0    tid 0x1362ba (1270458) // (JIT Compilation Thread-008 Suspended)
    !stack 0x00040a00   !j9vmthread 0x00040a00  !j9thread 0x7fc9640c8868    tid 0x1362bb (1270459) // (JIT Compilation Thread-009 Suspended)
    !stack 0x00044700   !j9vmthread 0x00044700  !j9thread 0x7fc9640c9800    tid 0x1362bc (1270460) // (JIT Compilation Thread-010 Suspended)
    !stack 0x00048400   !j9vmthread 0x00048400  !j9thread 0x7fc9640c9d78    tid 0x1362bd (1270461) // (JIT Compilation Thread-011 Suspended)
    !stack 0x0004c100   !j9vmthread 0x0004c100  !j9thread 0x7fc9640cad10    tid 0x1362be (1270462) // (JIT Compilation Thread-012 Suspended)
    !stack 0x0004fd00   !j9vmthread 0x0004fd00  !j9thread 0x7fc9640cb288    tid 0x1362bf (1270463) // (JIT Compilation Thread-013 Suspended)
    !stack 0x00053a00   !j9vmthread 0x00053a00  !j9thread 0x7fc9640cc220    tid 0x1362c0 (1270464) // (JIT Compilation Thread-014 Suspended)
    !stack 0x00057700   !j9vmthread 0x00057700  !j9thread 0x7fc9640cc798    tid 0x1362c1 (1270465) // (JIT Diagnostic Compilation Thread-015 Suspended)
    !stack 0x0005b400   !j9vmthread 0x0005b400  !j9thread 0x7fc96411c880    tid 0x1362c2 (1270466) // (JIT-SamplerThread)
    !stack 0x0005f100   !j9vmthread 0x0005f100  !j9thread 0x7fc96411cdf8    tid 0x1362c3 (1270467) // (IProfiler)
    !stack 0x00162800   !j9vmthread 0x00162800  !j9thread 0x7fc964144040    tid 0x1362c4 (1270468) // (Common-Cleaner)
    !stack 0x00231900   !j9vmthread 0x00231900  !j9thread 0x7fc9642dcf10    tid 0x1362c6 (1270470) // (Concurrent Mark Helper)
    !stack 0x00235600   !j9vmthread 0x00235600  !j9thread 0x7fc9642dd488    tid 0x1362c7 (1270471) // (GC Worker)
    !stack 0x00239300   !j9vmthread 0x00239300  !j9thread 0x7fc9642de1a0    tid 0x1362c8 (1270472) // (GC Worker)
    !stack 0x0023d000   !j9vmthread 0x0023d000  !j9thread 0x7fc9642de718    tid 0x1362c9 (1270473) // (GC Worker)
    !stack 0x00278e00   !j9vmthread 0x00278e00  !j9thread 0x7fc9642e3d78    tid 0x1362cf (1270479) // (Attach API wait loop)
    !stack 0x00250000   !j9vmthread 0x00250000  !j9thread 0x7fc9642e3800    tid 0x1362d0 (1270480) // (Finalizer thread)
    !stack 0x0024c300   !j9vmthread 0x0024c300  !j9thread 0x7fc9642e06c0    tid 0x1362d2 (1270482) // (GC Worker)
    !stack 0x00240c00   !j9vmthread 0x00240c00  !j9thread 0x7fc9642e0c38    tid 0x1362d3 (1270483) // (GC Worker)
    !stack 0x00244900   !j9vmthread 0x00244900  !j9thread 0x7fc9642df430    tid 0x1362d5 (1270485) // (GC Worker)
    !stack 0x00248600   !j9vmthread 0x00248600  !j9thread 0x7fc9642df9a8    tid 0x1362d6 (1270486) // (GC Worker)
> !stack 0x0001aa00
<1aa00>                             Generic special frame
<1aa00>     !j9method 0x0000000000154500   jdk/internal/ref/PhantomCleanable.insert()V
<1aa00>     !j9method 0x00000000001544C0   jdk/internal/ref/PhantomCleanable.<init>(Ljava/lang/Object;Ljava/lang/ref/Cleaner;)V
<1aa00>     !j9method 0x0000000000153FC8   jdk/internal/ref/CleanerImpl$PhantomCleanableRef.<init>(Ljava/lang/Object;Ljava/lang/ref/Cleaner;Ljava/lang/Runnable;)V
<1aa00>     !j9method 0x0000000000153458   java/lang/ref/Cleaner.register(Ljava/lang/Object;Ljava/lang/Runnable;)Ljava/lang/ref/Cleaner$Cleanable;
<1aa00>     !j9method 0x0000000000199358   java/lang/invoke/MethodHandleNatives$CallSiteContext.make(Ljava/lang/invoke/CallSite;)Ljava/lang/invoke/MethodHandleNatives$CallSiteContext;
<1aa00>     !j9method 0x0000000000173C98   java/lang/invoke/CallSite.<init>(Ljava/lang/invoke/MethodHandle;)V
<1aa00>     !j9method 0x000000000019F418   java/lang/invoke/ConstantCallSite.<init>(Ljava/lang/invoke/MethodHandle;)V
<1aa00>     !j9method 0x0000000000199620   java/lang/invoke/InnerClassLambdaMetafactory.buildCallSite()Ljava/lang/invoke/CallSite;
<1aa00>     !j9method 0x000000000015F568   java/lang/invoke/LambdaMetafactory.metafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite;
<1aa00>     !j9method 0x000000000019B9A8   java/lang/invoke/LambdaForm$DMH/0x0000000064206f10.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
<1aa00>     !j9method 0x000000000019C7B8   java/lang/invoke/LambdaForm$MH/0x0000000064207eb0.invokeExact_MT(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
<1aa00>     !j9method 0x00000000001952E8   java/lang/invoke/BootstrapMethodInvoker.invoke(Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/Object;
<1aa00>     !j9method 0x0000000000173EB8   java/lang/invoke/CallSite.makeSite(Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/invoke/CallSite;
<1aa00>     !j9method 0x00000000000B0FE8   java/lang/invoke/MethodHandleNatives.linkCallSiteImpl(Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/invoke/MemberName;
<1aa00>     !j9method 0x00000000000B0FC8   java/lang/invoke/MethodHandleNatives.linkCallSite(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/invoke/MemberName;
<1aa00>     !j9method 0x00000000000D3EF8   java/lang/invoke/MethodHandleResolver.resolveInvokeDynamic(JLjava/lang/String;Ljava/lang/String;J)Ljava/lang/Object;
<1aa00>                             JNI call-in frame
<1aa00>                             Generic special frame
<1aa00>     !j9method 0x000000000028D898   org/eclipse/openj9/criu/CRIUSupport.clearInetAddressCache()V
<1aa00>     !j9method 0x00000000002F5D48   org/eclipse/openj9/criu/CRIUSupport$$Lambda/0x0000000000000000.run()V
<1aa00>     !j9method 0x00000000002F6CB8   org/eclipse/openj9/criu/J9InternalCheckpointHookAPI$J9InternalCheckpointHook.runHook()V
<1aa00>     !j9method 0x00000000002F6728   org/eclipse/openj9/criu/J9InternalCheckpointHookAPI.runHooks(Ljava/util/List;Z)V
<1aa00>     !j9method 0x00000000002F6768   org/eclipse/openj9/criu/J9InternalCheckpointHookAPI.runPostRestoreHooksSingleThread()V
<1aa00>                             JNI call-in frame
<1aa00>     !j9method 0x000000000028D4B8   org/eclipse/openj9/criu/CRIUSupport.checkpointJVMImpl(Ljava/lang/String;ZZZILjava/lang/String;ZLjava/lang/String;ZZZZLjava/lang/String;Ljava/lang/String;)V
<1aa00>     !j9method 0x000000000028D8B8   org/eclipse/openj9/criu/CRIUSupport.checkpointJVM()V
<1aa00>     !j9method 0x0000000000282D68   org/openj9/criu/CRIUTestUtils.checkPointJVM(Lorg/eclipse/openj9/criu/CRIUSupport;Ljava/nio/file/Path;Z)V
<1aa00>     !j9method 0x0000000000282448   org/openj9/criu/OptionsFileTest.dumpOptionsTestRequireDynamic()V
<1aa00>     !j9method 0x00000000002822E8   org/openj9/criu/OptionsFileTest.main([Ljava/lang/String;)V
<1aa00>                             JNI call-in frame
<1aa00>                             Native method frame
> !stackslots 0x0001aa00   
...
<1aa00> Bytecode frame: bp = 0x0000000000146D68, sp = 0x0000000000146D58, pc = 0x00007FC8F55DF67A, cp = 0x0000000000154320, arg0EA = 0x0000000000146D80, flags = 0x0000000000000000
<1aa00>     Method: jdk/internal/ref/PhantomCleanable.insert()V !j9method 0x0000000000154500
<1aa00>     Bytecode index = 6
<1aa00>     Using local mapper
<1aa00>     Locals starting at 0x0000000000146D80 for 0x0000000000000003 slots
<1aa00>         O-Slot: a0[0x0000000000146D80] = 0x00000007062228E0
<1aa00>         O-Slot: t1[0x0000000000146D78] = 0x0000000706110298
<1aa00>         I-Slot: t2[0x0000000000146D70] = 0x0000000000146DB0
...
> !bytecodes 0x0000000000154500
...
    0 aload0getfield 
    1 getfield 14 jdk/internal/ref/PhantomCleanable.list Ljdk/internal/ref/PhantomCleanable;
    4 dup 
    5 astore1 
    6 monitorenter <-----
    7 aload0 
    8 aload0getfield 
    9 getfield 14 jdk/internal/ref/PhantomCleanable.list Ljdk/internal/ref/PhantomCleanable;
   12 putfield 10 jdk/internal/ref/PhantomCleanable.prev Ljdk/internal/ref/PhantomCleanable;
   15 aload0 
   16 aload0getfield 
   17 getfield 14 jdk/internal/ref/PhantomCleanable.list Ljdk/internal/ref/PhantomCleanable;
   20 getfield 12 jdk/internal/ref/PhantomCleanable.next Ljdk/internal/ref/PhantomCleanable;
   23 putfield 12 jdk/internal/ref/PhantomCleanable.next Ljdk/internal/ref/PhantomCleanable;
   26 aload0getfield 
   27 getfield 12 jdk/internal/ref/PhantomCleanable.next Ljdk/internal/ref/PhantomCleanable;
   30 aload0 
   31 putfield 10 jdk/internal/ref/PhantomCleanable.prev Ljdk/internal/ref/PhantomCleanable;
   34 aload0getfield 
   35 getfield 14 jdk/internal/ref/PhantomCleanable.list Ljdk/internal/ref/PhantomCleanable;
   38 aload0 
   39 putfield 12 jdk/internal/ref/PhantomCleanable.next Ljdk/internal/ref/PhantomCleanable;
   42 aload1 
   43 monitorexit 
   44 goto 52
   47 astore2 
   48 aload1 
   49 monitorexit 
   50 aload2 
   51 athrow 
   52 return0 
...
(gdb) where
...
#13 0x00007fc96a99114d in setCRIUSingleThreadModeJVMCRIUException (vmThread=0x1aa00, moduleName=<optimized out>, messageNumber=<optimized out>) at /root/openj9_issues_18399/openj9-openjdk-jdk21/openj9/runtime/vm/exceptionsupport.c:1312
#14 0x00007fc96aab8c83 in VM_CRIUBytecodeInterpreterCompressed::run (this=0x7fc96aca0b00, vmThread=0x0) at /root/openj9_issues_18399/openj9-openjdk-jdk21/openj9/runtime/vm/BytecodeInterpreter.hpp:11056
#15 0x00007fc96aab88bd in criuBytecodeLoopCompressed (currentThread=<optimized out>) at /root/openj9_issues_18399/openj9-openjdk-jdk21/openj9/runtime/vm/BytecodeInterpreter.inc:112
#16 0x00007fc96ab02c22 in c_cInterpreter () at /root/openj9_issues_18399/openj9-openjdk-jdk21/build/linux-x86_64-server-release/vm/runtime/vm/xcinterp.s:158
#17 0x00007fc96a97aab1 in sendResolveInvokeDynamic (currentThread=currentThread@entry=0x1aa00, ramCP=ramCP@entry=0x28c6d0, callSiteIndex=callSiteIndex@entry=11, nameAndSig=0x706112378, nameAndSig@entry=0x7fc8f57459e8, bsmData=0x7fc8f5746c84)
    at /root/openj9_issues_18399/openj9-openjdk-jdk21/openj9/runtime/vm/callin.cpp:1120
#18 0x00007fc96a9d6c85 in resolveInvokeDynamic (vmThread=0x1aa00, ramCP=0x28c6d0, callSiteIndex=11, resolveFlags=0) at /root/openj9_issues_18399/openj9-openjdk-jdk21/openj9/runtime/vm/resolvesupport.cpp:2233
#19 0x00007fc96aad4d4d in VM_CRIUBytecodeInterpreterCompressed::invokedynamic (_pc=<optimized out>, _sp=<optimized out>, this=<optimized out>) at /root/openj9_issues_18399/openj9-openjdk-jdk21/openj9/runtime/vm/BytecodeInterpreter.hpp:8899
#20 VM_CRIUBytecodeInterpreterCompressed::run (this=0x7fc96aca13c0, vmThread=0x0) at /root/openj9_issues_18399/openj9-openjdk-jdk21/openj9/runtime/vm/BytecodeInterpreter.hpp:11708
#21 0x00007fc96aab88bd in criuBytecodeLoopCompressed (currentThread=<optimized out>) at /root/openj9_issues_18399/openj9-openjdk-jdk21/openj9/runtime/vm/BytecodeInterpreter.inc:112
#22 0x00007fc96ab02c22 in c_cInterpreter () at /root/openj9_issues_18399/openj9-openjdk-jdk21/build/linux-x86_64-server-release/vm/runtime/vm/xcinterp.s:158
#23 0x00007fc96a9786bf in runStaticMethod (currentThread=currentThread@entry=0x1aa00, className=className@entry=0x7fc96ab8e982 <j9InternalCheckpointHookAPI_name+2> "org/eclipse/openj9/criu/J9InternalCheckpointHookAPI", selector=selector@entry=0x7fc96aca1720, 
    argCount=argCount@entry=0, arguments=arguments@entry=0x0) at /root/openj9_issues_18399/openj9-openjdk-jdk21/openj9/runtime/vm/callin.cpp:721
#24 0x00007fc96a9f535e in jvmRestoreHooks (currentThread=0x1aa00) at /root/openj9_issues_18399/openj9-openjdk-jdk21/openj9/runtime/vm/CRIUHelpers.cpp:98
#25 0x00007fc94e204a8d in Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl (env=<optimized out>, unused=<optimized out>, imagesDir=<optimized out>, leaveRunning=<optimized out>, shellJob=<optimized out>, extUnixSupport=<optimized out>, logLevel=0, logFile=0x0, 
    fileLocks=1 '\001', workDir=0x0, tcpEstablished=0 '\000', autoDedup=0 '\000', trackMemory=0 '\000', unprivileged=0 '\000', optionsFile=0x1473b0, environmentFile=0x0) at /root/openj9_issues_18399/openj9-openjdk-jdk21/openj9/runtime/criusupport/criusupport.cpp:1026
...
singh264 commented 11 months ago

@tajila How could a blocking operation be supported for CRIU single thread mode during the CRIUSupport.clearInetAddressCache() post restore hook when MethodHandleResolver.resolveInvokeDynamic(..) is called before CRIU single thread mode is disabled in Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl(..)?

tajila commented 11 months ago

@tajila How could a blocking operation be supported for CRIU single thread mode during the CRIUSupport.clearInetAddressCache()

We dont support blockign operations in single thread mode. The way to address this is to determine which thread owns the contended resource, then make sure that thread is not halted while holding the resource with @NotCheckpointSafe annotation which is applied to methods.

pshipton commented 11 months ago

Although I put this into the 0.43 milestone, we also get "Blocking operation is not allowed in CRIU single thread mode." failures in jdk21 builds.

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.functional_aarch64_linux_Nightly_testList_1/98/ cmdLineTester_criu_nonPortableRestore_6

Testing: Properties test2
Test start time: 2023/11/15 00:18:52 Coordinated Universal Time
Running command: bash /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_Nightly_testList_1/jdkbinary/j2sdk-image/bin/java " -Xgcpolicy:optavgpause " org.openj9.criu.OptionsFileTest PropertiesTest2 1
Time spent starting: 6 milliseconds
Time spent executing: 3121 milliseconds
Test result: FAILED
Output from test:
 [OUT] start running script
 [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
 [OUT] export LD_BIND_NOT=on
 [OUT] /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_Nightly_testList_1/jdkbinary/j2sdk-image/bin/java -XX:+EnableCRIUSupport  -Xgcpolicy:optavgpause  -cp /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criu.jar org.openj9.criu.OptionsFileTest PropertiesTest2 1
 [OUT] Pre-checkpoint
 [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Wed Nov 15 00:18:52 UTC 2023, System.currentTimeMillis(): 1700007532532, System.nanoTime(): 3392362584312458
 [OUT] JVMDUMP034I User requested Java dump using '/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/output_17000049215782/cmdLineTester_criu_nonPortableRestore_6/javacore.20231115.001852.901499.0001.txt' through CRIUSingleThreadModeJVMCRIUException
 [OUT] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/output_17000049215782/cmdLineTester_criu_nonPortableRestore_6/javacore.20231115.001852.901499.0001.txt
 [OUT] java.lang.reflect.InvocationTargetException
 [OUT] JVMJITM044W Some or all compiled code in the code cache invalidated post restore.
 [OUT] JVMDUMP034I User requested Java dump using '/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/output_17000049215782/cmdLineTester_criu_nonPortableRestore_6/javacore.20231115.001855.901499.0002.txt' through CRIUSingleThreadModeJVMCRIUException
 [OUT] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/output_17000049215782/cmdLineTester_criu_nonPortableRestore_6/javacore.20231115.001855.901499.0002.txt
 [OUT] Exception in thread "main" java.lang.BootstrapMethodError: bootstrap method initialization exception
 [OUT]  at java.base/java.lang.invoke.BootstrapMethodInvoker.invoke(BootstrapMethodInvoker.java:188)
 [OUT]  at java.base/java.lang.invoke.CallSite.makeSite(CallSite.java:316)
 [OUT]  at java.base/java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(MethodHandleNatives.java:274)
 [OUT]  at java.base/java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:264)
 [OUT]  at java.base/java.lang.invoke.MethodHandleResolver.resolveInvokeDynamic(MethodHandleResolver.java:200)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.clearInetAddressCache(CRIUSupport.java:757)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.J9InternalCheckpointHookAPI$J9InternalCheckpointHook.runHook(J9InternalCheckpointHookAPI.java:143)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.J9InternalCheckpointHookAPI.runHooks(J9InternalCheckpointHookAPI.java:98)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.J9InternalCheckpointHookAPI.runPostRestoreHooksSingleThread(J9InternalCheckpointHookAPI.java:115)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVMImpl(Native Method)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:812)
 [OUT]  at org.openj9.criu.CRIUTestUtils.checkPointJVM(CRIUTestUtils.java:77)
 [OUT]  at org.openj9.criu.OptionsFileTest.propertiesTest2(OptionsFileTest.java:120)
 [OUT]  at org.openj9.criu.OptionsFileTest.main(OptionsFileTest.java:42)
 [OUT] Caused by: java.lang.invoke.LambdaConversionException: Exception instantiating lambda object
 [OUT]  at java.base/java.lang.invoke.InnerClassLambdaMetafactory.buildCallSite(InnerClassLambdaMetafactory.java:243)
 [OUT]  at java.base/java.lang.invoke.LambdaMetafactory.metafactory(LambdaMetafactory.java:341)
 [OUT]  at java.base/java.lang.invoke.BootstrapMethodInvoker.invoke(BootstrapMethodInvoker.java:134)
 [OUT]  ... 13 more
 [OUT] Caused by: org.eclipse.openj9.criu.JVMRestoreException: Blocking operation is not allowed in CRIU single thread mode.
 [OUT]  at java.base/jdk.internal.ref.PhantomCleanable.insert(PhantomCleanable.java:87)
 [OUT]  at java.base/jdk.internal.ref.PhantomCleanable.<init>(PhantomCleanable.java:68)
 [OUT]  at java.base/jdk.internal.ref.CleanerImpl$PhantomCleanableRef.<init>(CleanerImpl.java:164)
 [OUT]  at java.base/java.lang.ref.Cleaner.register(Cleaner.java:224)
 [OUT]  at java.base/java.lang.invoke.MethodHandleNatives$CallSiteContext.make(MethodHandleNatives.java:86)
 [OUT]  at java.base/java.lang.invoke.CallSite.<init>(CallSite.java:145)
 [OUT]  at java.base/java.lang.invoke.ConstantCallSite.<init>(ConstantCallSite.java:50)
 [OUT]  at java.base/java.lang.invoke.InnerClassLambdaMetafactory.buildCallSite(InnerClassLambdaMetafactory.java:236)
 [OUT]  ... 15 more
 [OUT] Removed test output files
 [OUT] finished script
 [ERR] /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh: line 41: 901499 Killed                  $2 -XX:+EnableCRIUSupport $3 -cp "$1/criu.jar" $4 $5 $6 > testOutput 2>&1
>> Success condition was found: [Output match: Killed]
>> Failure condition was not found: [Output match: failed properties test]
>> Required condition was found: [Output match: Pre-checkpoint]
>> Success condition was not found: [Output match: Post-checkpoint]
>> Failure condition was not found: [Output match: java.lang.NullPointerException]
>> Failure condition was found: [Output match: org.eclipse.openj9.criu.JVMRestoreException]
>> Failure condition was not found: [Output match: CRIU is not enabled]
>> Failure condition was not found: [Output match: Operation not permitted]
>> Success condition was not found: [Output match: Thread pid mismatch]
>> Success condition was not found: [Output match: do not match expected]
>> Success condition was not found: [Output match: Unable to create a thread:]
>> Failure condition was not found: [Output match: Could not dump the JVM processes, err=-70]
>> Failure condition was found: [Output match: User requested Java dump using]

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_1/98/ cmdLineTester_criu_nonPortableRestore_0

Testing: Envvar test1
Test start time: 2023/11/15 01:35:24 Coordinated Universal Time
Running command: bash /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_1/jdkbinary/j2sdk-image/bin/java " -Xjit -XX:+CRIURestoreNonPortableMode " org.openj9.criu.EnvVarFileTest EnvVarFileTest1 1
Time spent starting: 1 milliseconds
Time spent executing: 3959 milliseconds
Test result: FAILED
Output from test:
 [OUT] start running script
 [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
 [OUT] export LD_BIND_NOT=on
 [OUT] /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_1/jdkbinary/j2sdk-image/bin/java -XX:+EnableCRIUSupport  -Xjit -XX:+CRIURestoreNonPortableMode  -cp /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criu.jar org.openj9.criu.EnvVarFileTest EnvVarFileTest1 1
 [OUT] Pre-checkpoint
 [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Wed Nov 15 01:35:25 UTC 2023, System.currentTimeMillis(): 1700012125449, System.nanoTime(): 1700012125446636740
 [OUT] JVMDUMP034I User requested Java dump using '/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_1/aqa-tests/TKG/output_17000100448258/cmdLineTester_criu_nonPortableRestore_0/javacore.20231115.013525.728142.0001.txt' through CRIUSingleThreadModeJVMCRIUException
 [OUT] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_1/aqa-tests/TKG/output_17000100448258/cmdLineTester_criu_nonPortableRestore_0/javacore.20231115.013525.728142.0001.txt
 [OUT] java.lang.reflect.InvocationTargetException
 [OUT] JVMJITM044W Some or all compiled code in the code cache invalidated post restore.
 [OUT] JVMDUMP034I User requested Java dump using '/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_1/aqa-tests/TKG/output_17000100448258/cmdLineTester_criu_nonPortableRestore_0/javacore.20231115.013528.728142.0002.txt' through CRIUSingleThreadModeJVMCRIUException
 [OUT] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_1/aqa-tests/TKG/output_17000100448258/cmdLineTester_criu_nonPortableRestore_0/javacore.20231115.013528.728142.0002.txt
 [OUT] Exception in thread "main" org.eclipse.openj9.criu.JVMRestoreException: Failed to setup new environment variables
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.throwSetEnvException(CRIUSupport.java:743)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.lambda$registerRestoreEnvVariables$4(CRIUSupport.java:737)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.J9InternalCheckpointHookAPI$J9InternalCheckpointHook.runHook(J9InternalCheckpointHookAPI.java:143)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.J9InternalCheckpointHookAPI.runHooks(J9InternalCheckpointHookAPI.java:98)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.J9InternalCheckpointHookAPI.runPostRestoreHooksSingleThread(J9InternalCheckpointHookAPI.java:115)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVMImpl(Native Method)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:812)
 [OUT]  at org.openj9.criu.CRIUTestUtils.checkPointJVM(CRIUTestUtils.java:77)
 [OUT]  at org.openj9.criu.EnvVarFileTest.envVarFileTest1(EnvVarFileTest.java:102)
 [OUT]  at org.openj9.criu.EnvVarFileTest.main(EnvVarFileTest.java:39)
 [OUT] Caused by: org.eclipse.openj9.criu.JVMRestoreException: Blocking operation is not allowed in CRIU single thread mode.
 [OUT]  at java.base/jdk.internal.ref.PhantomCleanable.insert(PhantomCleanable.java:87)
 [OUT]  at java.base/jdk.internal.ref.PhantomCleanable.<init>(PhantomCleanable.java:68)
 [OUT]  at java.base/java.io.FileCleanable.<init>(FileCleanable.java:100)
 [OUT]  at java.base/java.io.FileCleanable.register(FileCleanable.java:77)
 [OUT]  at java.base/java.io.FileInputStream.<init>(FileInputStream.java:153)
 [OUT]  at java.base/java.io.FileInputStream.<init>(FileInputStream.java:106)
 [OUT]  at java.base/java.io.FileReader.<init>(FileReader.java:60)
 [OUT]  at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.lambda$registerRestoreEnvVariables$4(CRIUSupport.java:675)
 [OUT]  ... 8 more
 [OUT] Removed test output files
 [OUT] finished script
 [ERR] /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh: line 41: 728142 Killed                  $2 -XX:+EnableCRIUSupport $3 -cp "$1/criu.jar" $4 $5 $6 > testOutput 2>&1
>> Success condition was found: [Output match: Killed]
>> Failure condition was not found: [Output match: failed properties test]
>> Required condition was found: [Output match: Pre-checkpoint]
>> Success condition was not found: [Output match: Post-checkpoint]
>> Failure condition was not found: [Output match: CRIU is not enabled]
>> Failure condition was not found: [Output match: Operation not permitted]
>> Success condition was not found: [Output match: Thread pid mismatch]
>> Success condition was not found: [Output match: do not match expected]
>> Success condition was not found: [Output match: Unable to create a thread:]
>> Failure condition was not found: [Output match: Could not dump the JVM processes, err=-70]
>> Failure condition was found: [Output match: User requested Java dump using]
JasonFengJ9 commented 11 months ago

Resolved via https://github.com/ibmruntimes/openj9-openjdk-jdk21/pull/69

hzongaro commented 11 months ago

Also encountered this "Blocking operation is not allowed in CRIU single thread mode" in a pull request test of JDK 17 on aarch64: https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_aarch64_linux_Personal/369/

tajila commented 11 months ago

@singh264 Please take a look

singh264 commented 10 months ago

This issue can be closed and it was fixed by:

The changes for JDK21 and JDK17 are in v0.43.0-release as well.

babsingh commented 10 months ago

Closing as per https://github.com/eclipse-openj9/openj9/issues/18399#issuecomment-1843134336.