Closed by JasonFengJ9 1 year ago
@RSalman Please take a look.
A GC is triggered in the middle of shutting down GC threads during checkpoint. The dispatcher is attempting to use threads that are being shut down. The code was written with the assumption that a GC wouldn't be triggered during checkpoint. There was a suspicion this could occur while discussing https://github.com/eclipse-openj9/openj9/pull/16653 (had a discussion offline with @amicic @dmitripivkine). A potential workaround is to force single-threaded task dispatch during thread pool contraction, similar to what happens when there's a GC during VM shutdown.
02:24:30.223895439 *0x0000000000000000 j9mm.51 Event SystemGC end: newspace=124632/2359296 oldspace=5897000/6291456 loa=315392/315392
02:24:30.223898584 *0x0000000002092500 j9vm.372 Entry >Releasing exclusive VM Access
02:24:30.223899749 0x0000000002092500 j9vm.375 Event Exclusive VM Access queue is empty, resetting exclusive access state and notifying all halted threads. Changing exclusiveAccessState to J9_XACCESS_NONE.
02:24:30.223935814 0x0000000002092500 j9vm.376 Exit <Released exclusive VM Access
02:24:30.224487049 0x0000000002092500 j9mm.771 Entry >contractThreadPool Entry: gcThreadCount: 64, requested newThreadCount: 1
02:24:30.224487397 0x0000000002092500 j9mm.772 Event Attempt to shutdown GC threads
02:24:30.228115489 *0x0000000002220400 j9vm.544 Event JNIinv DetachCurrentThread
02:24:30.228123206 *0x000000000222c100 j9vm.544 Event JNIinv DetachCurrentThread
02:24:30.228126662 *0x0000000002230c00 j9vm.544 Event JNIinv DetachCurrentThread
02:24:30.228129556 *0x0000000002233200 j9vm.544 Event JNIinv DetachCurrentThread
02:24:30.228133709 *0x0000000002262400 j9vm.544 Event JNIinv DetachCurrentThread
02:24:30.228143352 *0x0000000002229b00 j9vm.544 Event JNIinv DetachCurrentThread
02:24:30.228159806 *0x00000000021f3700 j9vm.544 Event JNIinv DetachCurrentThread
02:24:30.228175144 *0x00000000021ea000 j9vm.544 Event JNIinv DetachCurrentThread
02:24:30.228175692 *0x0000000002258d00 j9vm.544 Event JNIinv DetachCurrentThread
02:24:30.228180082 *0x00000000021f1200 j9vm.544 Event JNIinv DetachCurrentThread
02:24:30.228184674 *0x0000000002208b00 j9vm.544 Event JNIinv DetachCurrentThread
02:24:30.228188448 *0x00000000021f8300 j9vm.544 Event JNIinv DetachCurrentThread
02:24:30.229465075 *0x000000000225fe00 j9mm.107 Assert * ** ASSERTION FAILED ** at ../../../../../../omr/gc/base/ParallelDispatcher.cpp:176: ((false && ((worker_status_reserved == _statusTable[workerID]) || ((0 == _threadsToReserve) && (worker_status_dying == _statusTable[workerID])))))
> !mm_paralleldispatcher 0x000003FF8005F5C0
MM_ParallelDispatcher at 0x3ff8005f5c0 {
Fields for MM_Base:
Fields for MM_BaseVirtual:
0x8: const U8* _typeId = !j9x 0x000003FF8447F3EA // "MM_ParallelDispatcher"
Fields for MM_ParallelDispatcher:
0x10: class MM_Task* _task = !mm_parallelscavengetask 0x000003FF7FFFCD10
...
0x20: U64 _threadShutdownCount = 0x0000000000000020 (32)
0x28: struct J9Thread** _threadTable = !j9x 0x000003FF8005F6B0
0x30: U64* _statusTable = !j9x 0x000003FF8005F900
0x38: void** _taskTable = !j9x 0x000003FF8005FB50
...
0x58: bool _workerThreadsReservedForGC = true
0x59: bool _inShutdown = true
0x60: U64 _threadCountMaximum = 0x0000000000000040 (64)
0x68: U64 _threadCount = 0x0000000000000040 (64)
0x70: U64 _activeThreadCount = 0x0000000000000040 (64)
0x78: U64 _threadsToReserve = 0x000000000000003F (63)
...
0x98: U64 _poolMaxCapacity = 0x0000000000000040 (64)
}
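To illustrate the workaround idea mentioned above (forcing single-threaded dispatch while the pool is contracting), here is a minimal sketch in C++. It is a toy model under stated assumptions, not the OMR code and not necessarily what the eventual fix does: `ToyDispatcher`, `beginContract`, `endContract`, `threadsForTask` and `dyingWorkers` are all hypothetical names. The only numbers taken from the trace are the pool size of 64 and the requested new thread count of 1.

```cpp
#include <cstddef>
#include <vector>

// Toy model only: ToyDispatcher and its members are hypothetical names,
// not the OMR MM_ParallelDispatcher API.
enum class WorkerStatus { Waiting, Dying };

class ToyDispatcher {
public:
    explicit ToyDispatcher(std::size_t threadCount)
        : _status(threadCount, WorkerStatus::Waiting), _inShutdown(false) {}

    // Start contracting the pool: every worker past newCount is marked dying.
    void beginContract(std::size_t newCount) {
        _inShutdown = true;
        for (std::size_t i = newCount; i < _status.size(); ++i) {
            _status[i] = WorkerStatus::Dying;
        }
    }

    // Finish contracting: the dying workers are gone, normal dispatch resumes.
    void endContract(std::size_t newCount) {
        if (newCount < _status.size()) {
            _status.resize(newCount);
        }
        _inShutdown = false;
    }

    // Number of threads a task started right now would run on.
    // While the pool is contracting, only the calling thread is used,
    // so a GC triggered mid-contraction never touches a dying worker.
    std::size_t threadsForTask() const {
        if (_inShutdown) {
            return 1;
        }
        return _status.size();
    }

    // Number of workers currently marked dying (for inspection only).
    std::size_t dyingWorkers() const {
        std::size_t count = 0;
        for (WorkerStatus s : _status) {
            if (s == WorkerStatus::Dying) {
                ++count;
            }
        }
        return count;
    }

private:
    std::vector<WorkerStatus> _status;
    bool _inShutdown;
};
```

With a pool of 64 contracting to 1, as in the trace, a task dispatched between `beginContract(1)` and `endContract(1)` would run on the calling thread only, while the 63 workers marked dying are left alone.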
I suppose JNIinv DetachCurrentThread is triggering a GC?
@JasonFengJ9 any idea how to avoid these during grinder runs:
10:30:32 [OUT] Exception in thread "main" org.eclipse.openj9.criu.SystemCheckpointException: The JVM attempted to load libcriu.so but was unable to: 1
10:30:32 [OUT] at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVMImpl(Native Method)
10:30:32 [OUT] at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:658)
10:30:32 [OUT] at org.openj9.criu.CRIUTestUtils.checkPointJVM(CRIUTestUtils.java:77)
10:30:32 [OUT] at org.openj9.criu.CRIUTestUtils.checkPointJVM(CRIUTestUtils.java:65)
10:30:32 [OUT] at org.openj9.criu.TimeChangeTest.testSystemNanoTimeJitPreCheckpointCompile(TimeChangeTest.java:106)
10:30:32 [OUT] at org.openj9.criu.TimeChangeTest.main(TimeChangeTest.java:52)
any idea how to avoid these during grinder runs: The JVM attempted to load libcriu.so but was unable to: 1
This error is a known issue (I think it is machine setup related).
My workaround is to re-launch with more machines, or on the same machine where the failure was initially reported.
@JasonFengJ9 any idea how to avoid these during grinder runs:
This should only happen when the machine doesn't have CRIU installed.
I suppose JNIinv DetachCurrentThread is triggering a GC?
Possibly. I've always wondered whether GC threads need to have Java objects. I think that's the issue. Anything with a Java object needs to run the thread cleanup code. As soon as you run Java code there is always a chance a GC can be triggered.
@JasonFengJ9 could you please try out this fix and see if the issue shows up: https://github.com/eclipse/omr/pull/7012. Seems like you had some more luck with the grinder, I'm not getting any reproducibility.
Fairly confident this change fixes the issue. I've run some grinders and saw no issues with the fix, but also no issues without the fix. So I can't definitively say this fixes the issue.
could you please try out the fix and see if the issue shows up: https://github.com/eclipse/omr/pull/7012. Seems like you had some more luck with the grinder, I'm not getting any reproducibility.
Missed this notification, will give it a try.
Not able to reproduce the failure in a 200x grinder with https://github.com/eclipse/omr/pull/7012, nor in a 200x grinder using a recent nightly build. In addition, this assertion didn't appear in recent nightly builds either.
@RSalman I think https://github.com/eclipse/omr/pull/7012 can be merged. This issue could be re-opened if the failure occurs again.
Failure link
From an internal build (ubu20s390x-svl-rt1-1):
Rerun in Grinder - Change TARGET to run only the failed test targets.
Optional info
Failure output (captured from console output)
50x internal grinder - 1/50 failed
FYI @tajila