Closed JamesKingdon closed 1 year ago
@SajinaKandy Could you please look into this problem?
More info for reproducing: the system had 1GB large pages enabled. This page shows how to enable large pages: https://www.intel.com/content/www/us/en/docs/programmable/683840/1-2-1/enabling-hugepages.html
Relevant comments from the case: The latest dump narrows down the problem to the JIT component. One of the new tracepoints added by the JVM team only fires when there's a problem, and there are three instances of this tracepoint in the dump's trace buffers. All of them are very similar:
14:29:04.286204531 0x291000 j9jcl.533 Exception * JCL: processNotificationLoop memoryUsageError: GcName=global, MemoryPool=JIT code cache, initialSize=1073741824, preUsedSize=1312896, preCommittedSize==1073741824, preMaxSize=268435456, postUsedSize=1312896, postCommittedSize==1073741824, postMaxSize=268435456
Note that the pre/post GC committed sizes for the JIT code cache are larger than the maximum sizes, which doesn't make sense:
preCommittedSize = 1073741824
preMaxSize = 268435456
postCommittedSize = 1073741824
postMaxSize = 268435456
This is the cause of the IllegalArgumentException later that triggered the dump:
java.lang.IllegalArgumentException: committed value cannot be larger than the max value
Tracepoints capturing the exception:
14:29:04.286175346 *0x0 j9mm.475 Event GlobalGC end: workstackoverflow=0 overflowcount=0 memory=13903136/37945344
14:29:04.286202934 *0x291000 j9jcl.534 Event JCL: processNotificationLoop memoryUsage: GcName=global, MemoryPool=tenured-SOA, initialSize=5976064, preUsedSize=21538192, preCommittedSize==22793216, preMaxSize=12109836288, postUsedSize=21104312, postCommittedSize==28398592, postMaxSize=12109836288
14:29:04.286203371 0x291000 j9jcl.534 Event JCL: processNotificationLoop memoryUsage: GcName=global, MemoryPool=tenured-LOA, initialSize=315392, preUsedSize=0, preCommittedSize==1455104, preMaxSize=772968448, postUsedSize=0, postCommittedSize==1813504, postMaxSize=772968448
14:29:04.286203608 0x291000 j9jcl.534 Event JCL: processNotificationLoop memoryUsage: GcName=global, MemoryPool=nursery-allocate, initialSize=1048576, preUsedSize=434096, preCommittedSize==5111808, preMaxSize=2129285120, postUsedSize=316456, postCommittedSize==5111808, postMaxSize=2129285120
14:29:04.286203815 0x291000 j9jcl.534 Event JCL: processNotificationLoop memoryUsage: GcName=global, MemoryPool=nursery-survivor, initialSize=1048576, preUsedSize=2621440, preCommittedSize==2621440, preMaxSize=1091940352, postUsedSize=2621440, postCommittedSize==2621440, postMaxSize=1091940352
14:29:04.286204095 0x291000 j9jcl.534 Event JCL: processNotificationLoop memoryUsage: GcName=global, MemoryPool=class storage, initialSize=0, preUsedSize=28502208, preCommittedSize==28997792, preMaxSize=-1, postUsedSize=28502208, postCommittedSize==28997792, postMaxSize=-1
14:29:04.286204360 0x291000 j9jcl.534 Event JCL: processNotificationLoop memoryUsage: GcName=global, MemoryPool=miscellaneous non-heap storage, initialSize=17825792, preUsedSize=7360992, preCommittedSize==22020096, preMaxSize=-1, postUsedSize=7360992, postCommittedSize==22020096, postMaxSize=-1
14:29:04.286204531 0x291000 j9jcl.533 Exception * JCL: processNotificationLoop memoryUsageError: GcName=global, MemoryPool=JIT code cache, initialSize=1073741824, preUsedSize=1312896, preCommittedSize==1073741824, preMaxSize=268435456, postUsedSize=1312896, postCommittedSize==1073741824, postMaxSize=268435456
14:29:04.286204705 0x291000 j9jcl.534 Event JCL: processNotificationLoop memoryUsage: GcName=global, MemoryPool=JIT code cache, initialSize=1073741824, preUsedSize=1312896, preCommittedSize==1073741824, preMaxSize=268435456, postUsedSize=1312896, postCommittedSize==1073741824, postMaxSize=268435456
14:29:04.286204889 0x291000 j9jcl.534 Event JCL: processNotificationLoop memoryUsage: GcName=global, MemoryPool=JIT data cache, initialSize=0, preUsedSize=2097152, preCommittedSize==2097152, preMaxSize=402653184, postUsedSize=2097152, postCommittedSize==2097152, postMaxSize=402653184
14:29:04.286205881 0x291000 j9mm.258 Exit <MM_ConcurrentGC_internalPostCollect subspace=0x7f0d7c1e8060
14:29:04.286207918 0x291000 j9mm.75 Event Concurrent collection end: newspace=4795352/7733248 oldspace=9107784/30212096 loa=1813504/1813504
14:29:04.286208200 *0x0 j9mm.75 Event Concurrent collection end: newspace=4795352/7733248 oldspace=9107784/30212096 loa=1813504/1813504
When large page is enabled the code cache allocation happens as a multiple of the available page size.
codeCacheSizeToAllocate = (codeCacheSizeToAllocate + (vmemParams.pageSize-1)) & (~(vmemParams.pageSize-1));
https://github.com/eclipse-openj9/openj9/blob/3060a8d8c95d628e78b1a77ac477c59e49db09f6/runtime/compiler/runtime/J9CodeCacheManager.cpp#L329
if (largeCodePageSize > 0)
#endif
{
vmemParams.pageSize = largeCodePageSize;
With large page size of 2M default allocation:
#CODECACHE: TR::CodeCache::allocate : requesting 268435456 bytes
#CODECACHE: TR::CodeCache::allocate : largeCodePageSize = 2097152
With large page size of 2M and -Xcodecachetotal1M -Xcodecache1M
#CODECACHE: TR::CodeCache::allocate : requesting 2097152 bytes
#CODECACHE: TR::CodeCache::allocate : largeCodePageSize = 2097152
In case large page is enabled we should modify the codeCacheTotalKB
accordingly to avoid the exception.
Hi , May in know when this fix will be available?
Thanks
@mukul4u2005 Currently its targeted for the Jan refresh as detailed here.
I could enable 1G pages on my home system and I can test this:
-Xlp:codecache:pagesize=1G large page size for JIT code cache
available large page sizes for JIT code cache:
4K
1G
The issue was fixed by https://github.com/eclipse-openj9/openj9/pull/18342 FYI: @mukul4u2005, since you were interested in this fix. The fix is available for testing purposes in the nightly builds (e.g. https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Build_JDK17_x86-64_linux_Nightly/) and will be officially released and documented with OpenJ9 release 0.43 which will appear sometime in January 2024. In the absence of a fix, any of the following workarounds can be used:
Workaround (2) may reduce throughput somewhat. Workaround (3) may increase footprint (up to 1 GB)
Thank you Sajina and Marius, much appreciated.
@mpirvu Thanks for update!
Will it be part of openj9 Jre 11 release also?
Will it be part of openj9 Jre 11 release also?
Yes, all supported versions of java (including Java11) will have the fix.
Problem was reported on Java 11.0.21
The exception text is
Investigation indicates that the JIT allocated a 1G segment for the code cache which is reported for the 'committed' memory value, while the max value is 256MB.