Closed: pshipton closed this 1 year ago
@dmitripivkine
* In which case, _workerThreadsReservedForGC is set but the task runs single threaded and exiting threads are dying instead of reserved. */
Assert_MM_true((worker_status_reserved == _statusTable[workerID]) || ((0 == _threadsToReserve) && (worker_status_dying == _statusTable[workerID])));
> !mm_paralleldispatcher 0x0000724A74069950
MM_ParallelDispatcher at 0x724a74069950 {
Fields for MM_Base:
Fields for MM_BaseVirtual:
0x8: const U8* _typeId = !j9x 0x0000724A7898F8C8 // "MM_ParallelDispatcher"
Fields for MM_ParallelDispatcher:
0x10: class MM_Task* _task = !mm_concurrentglobalmarktask 0x0000724A5A28BE58
0x18: class MM_GCExtensionsBase* _extensions = !mm_gcextensions 0x0000724A740557C0
0x20: U64 _threadShutdownCount = 0x0000000000000001 (1)
0x28: struct J9Thread** _threadTable = !j9x 0x0000724A74069A30
0x30: U64* _statusTable = !j9x 0x0000724A74069AA0
0x38: void** _taskTable = !j9x 0x0000724A74069B10
0x40: struct J9ThreadMonitor* _workerThreadMutex = !j9threadmonitor 0x0000724A7401BBC8
0x48: struct J9ThreadMonitor* _dispatcherMonitor = !j9threadmonitor 0x0000724A7401BC78
0x50: struct J9ThreadMonitor* _synchronizeMutex = !j9threadmonitor 0x0000724A7401BD28
0x58: bool _workerThreadsReservedForGC = true
0x59: bool _inShutdown = true
0x60: U64 _threadCountMaximum = 0x0000000000000004 (4)
0x68: U64 _threadCount = 0x0000000000000001 (1)
0x70: U64 _activeThreadCount = 0x0000000000000004 (4)
0x78: U64 _threadsToReserve = 0x0000000000000003 (3) <-----
0x80: void* _handler = !j9x 0x0000724A7925FD80
0x88: void* _handler_arg = !j9x 0x0000724A74028400
0x90: U64 _defaultOSStackSize = 0x0000000000080000 (524288)
}
0x10: U64 _workerID = 0x0000000000000002 (2) <--------------
_statusTable = !j9x 0x0000724A74069AA0
0x724A74069AA0 : 0000000000000003 0000000000000004 [ ................ ]
0x724A74069AB0 : 0000000000000004 0000000000000004 [ ................ ]
enum {
worker_status_inactive = 0, /* Must be 0 - set at initialization time by memset */
worker_status_waiting,
worker_status_reserved, <--- 2
worker_status_active,
worker_status_dying <--- 4
};
_statusTable[workerID] is worker_status_dying (not worker_status_reserved) and _threadsToReserve is 3.
So the code expects _threadsToReserve to be 0, but it is not.
@RSalman FYI
This is similar to an issue encountered in the past involving a race between thread shutdown and task dispatch, which is detailed extensively in https://github.com/eclipse-openj9/openj9/issues/10491#issuecomment-686057277. I've revisited the recent dispatcher changes and can confirm that this isn't a regression; I believe it has been around since the Dispatcher changes back in 2020.
This failure exposes a timing hole where thread shutdown and task dispatch overlap and leave the Dispatcher in an unexpected state: task dispatch attempts to reserve the same set of threads that are being shut down. The exact scenario is detailed in the fix: https://github.com/eclipse/omr/pull/6954.
The following holds true for this issue as well:
Reproducing this scenario is a challenge: it requires a GC during shutdown while ensuring that the main thread acquires the mutex before the worker threads are all woken up and terminated. With thousands of iterations on multiple platforms and machines and varying heap configurations, I wasn't able to reproduce this issue. Luckily, the core dump and trace points are sufficient to figure out what's happening.
I don't think it's worth the effort to reproduce this issue; it's not trivial to do.
I was able to reproduce the issue and verify the fix by running:
java -Xgc:concurrentscavenge -Xmn4M -cp /salman.rana/vich.jar vich.runner.Menu 14
Reproducibility is ~1/10. Verified the fix with ~1000 iterations.
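As a rough sanity check on those numbers, assume each run reproduces the failure independently with probability of about 1/10 (an idealization, since real runs are not strictly independent). Under that assumption, the chance that a build still containing the bug passes 1000 runs in a row is:

```latex
\[
P(\text{no failure in 1000 runs})
  = \left(1 - \tfrac{1}{10}\right)^{1000}
  = e^{1000 \ln 0.9}
  \approx e^{-105.4}
  \approx 1.7 \times 10^{-46}
\]
```

So a clean run of this length is strong evidence that the fix closes the hole, at least for this reproducer.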
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_ppc64le_linux_aot_Personal_testList_0/118 cmdLineTester_GCRegressionTests_2
-Xshareclasses:name=test_aot -Xscmx400M -Xscmaxaot256m -XX:+UseCompressedOops -Xjit -Xgcpolicy:balanced
https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.functional_ppc64le_linux_aot_Personal_testList_0/118/functional_test_output.tar.gz