Open WPCode opened 1 year ago
@LinHu2016
@WPCode The problem mentioned in https://www.ibm.com/support/pages/apar/IJ21959 should be fixed in your version ov Java 11. Do you have any other information from failure like javacore or GC verbose log? Which GC policy you are using? Unfortunately even with GC verbose log and javacore we might not have a clue. Would you accept instrumented JVM (in assumption it is reproducible)? BTW there is more than GC reported info there, for example JIT code cache is reported too.
We need to improve serviceability for cases like this. Currently it is really hard to find out which value is set wrong. I think (at least) we should add tracepoint (disabled by default) which can duplicate information returned by Memory MX Beans report to snap traces file. Also we can add another tracepoint (enabled by default) triggered by possible inconsistent output (warning). Adding these tracepoints will simplify debugging indeed.
Thanks Dmitri, I will add some tracepoints for the cases.
@dmitripivkine Here are some logs we found before the exception shows up, not sure if you could notice anything inside. BTW, how can we do the instrumented JVM? We couldn't reproduce in house. It would be great if you can provide some steps so we can build a custom image and try it at customer's environment. Thanks!
auth-668cdfb999-q26z6: Starting JVM...
auth-668cdfb999-q26z6: Memory limit is detected as 17179869184
auth-668cdfb999-q26z6: Executing startup command: "java -verbose:sizes -XX:+ExitOnOutOfMemoryError
auth-668cdfb999-q26z6: -XX:+ClassRelationshipVerifier -XX:+CompactStrings
auth-668cdfb999-q26z6: -XX:-HeapDumpOnOutOfMemoryError -Xdump:what -Xdump:heap:none -Xdump:java:file=/STDOUT/
auth-668cdfb999-q26z6: -Xdump:java:events=systhrow,filter=java/lang/OutOfMemoryError,file=/STDOUT/
auth-668cdfb999-q26z6: -XshowSettings -Djavax.xml.bind.JAXBContextFactory=com.sun.xml.bind.v2.ContextFactory
auth-668cdfb999-q26z6: -Djavax.xml.ws.spi.Provider=com.sun.xml.ws.spi.ProviderImpl
auth-668cdfb999-q26z6: -Djavax.xml.soap.SAAJMetaFactory=com.sun.xml.messaging.saaj.soap.SAAJMetaFactoryImpl
auth-668cdfb999-q26z6: -Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true
auth-668cdfb999-q26z6: -Dnetworkaddress.cache.ttl=0 -Dnetworkaddress.cache.negative.ttl=0
auth-668cdfb999-q26z6: -Djava.security.properties=/etc/crypto-policies/back-ends/java.config -XX:MaxRAMPercentage=75.0 -jar auth-8.9.0.jar"
auth-668cdfb999-q26z6:
auth-668cdfb999-q26z6: Registered dump agents
auth-668cdfb999-q26z6: ----------------------
auth-668cdfb999-q26z6: -Xdump:system:
auth-668cdfb999-q26z6: events=gpf+abort+traceassert+corruptcache,
auth-668cdfb999-q26z6: file=//core.%Y%m%d.%H%M%S.%pid.%seq.dmp,
auth-668cdfb999-q26z6: range=1..0,
auth-668cdfb999-q26z6: priority=999,
auth-668cdfb999-q26z6: request=serial
auth-668cdfb999-q26z6: ----------------------
auth-668cdfb999-q26z6: -Xdump:java:
auth-668cdfb999-q26z6: events=gpf+user+abort+traceassert+corruptcache,
auth-668cdfb999-q26z6: file=/STDOUT/,
auth-668cdfb999-q26z6: range=1..0,
auth-668cdfb999-q26z6: priority=400,
auth-668cdfb999-q26z6: request=exclusive+preempt
auth-668cdfb999-q26z6: ----------------------
auth-668cdfb999-q26z6: -Xdump:java:
auth-668cdfb999-q26z6: events=systhrow,
auth-668cdfb999-q26z6: filter=java/lang/OutOfMemoryError,
auth-668cdfb999-q26z6: file=/STDOUT/,
auth-668cdfb999-q26z6: range=1..0,
auth-668cdfb999-q26z6: priority=400,
auth-668cdfb999-q26z6: request=exclusive+preempt
auth-668cdfb999-q26z6: ----------------------
auth-668cdfb999-q26z6: -Xdump:snap:
auth-668cdfb999-q26z6: events=gpf+abort+traceassert+corruptcache,
auth-668cdfb999-q26z6: file=//Snap.%Y%m%d.%H%M%S.%pid.%seq.trc,
auth-668cdfb999-q26z6: range=1..0,
auth-668cdfb999-q26z6: priority=300,
auth-668cdfb999-q26z6: request=serial
auth-668cdfb999-q26z6: ----------------------
auth-668cdfb999-q26z6: -Xdump:jit:
auth-668cdfb999-q26z6: events=gpf+abort,
auth-668cdfb999-q26z6: file=//jitdump.%Y%m%d.%H%M%S.%pid.%seq.dmp,
auth-668cdfb999-q26z6: range=1..0,
auth-668cdfb999-q26z6: priority=200,
auth-668cdfb999-q26z6: request=serial
auth-668cdfb999-q26z6: ----------------------
auth-668cdfb999-q26z6: -Xdump:exit:
auth-668cdfb999-q26z6: events=systhrow,
auth-668cdfb999-q26z6: filter=java/lang/OutOfMemoryError,
auth-668cdfb999-q26z6: ,
auth-668cdfb999-q26z6: range=1..0,
auth-668cdfb999-q26z6: priority=0,
auth-668cdfb999-q26z6: request=serial
auth-668cdfb999-q26z6: ----------------------
auth-668cdfb999-q26z6:
auth-668cdfb999-q26z6: -Xmca32K RAM class segment increment
auth-668cdfb999-q26z6: -Xmco128K ROM class segment increment
auth-668cdfb999-q26z6: -Xmcrs200M compressed references metadata initial size
auth-668cdfb999-q26z6: -Xmns2M initial new space size
auth-668cdfb999-q26z6: -Xmnx3G maximum new space size
auth-668cdfb999-q26z6: -Xms8M initial memory size
auth-668cdfb999-q26z6: -Xmos6M initial old space size
auth-668cdfb999-q26z6: -Xmox12286M maximum old space size
auth-668cdfb999-q26z6: -Xmx12G memory maximum
auth-668cdfb999-q26z6: -Xmr4K remembered set size
auth-668cdfb999-q26z6: -Xlp:objectheap:pagesize=4K large page size
auth-668cdfb999-q26z6: available large page sizes:
auth-668cdfb999-q26z6: 4K
auth-668cdfb999-q26z6: 1G
auth-668cdfb999-q26z6: -Xlp:codecache:pagesize=1G large page size for JIT code cache
auth-668cdfb999-q26z6: available large page sizes for JIT code cache:
auth-668cdfb999-q26z6: 4K
auth-668cdfb999-q26z6: 1G
auth-668cdfb999-q26z6: -Xmso256K operating system thread stack size
auth-668cdfb999-q26z6: -Xiss2K java thread stack initial size
auth-668cdfb999-q26z6: -Xssi16K java thread stack increment
auth-668cdfb999-q26z6: -Xss1M java thread stack maximum size
auth-668cdfb999-q26z6: -XX:SharedCacheHardLimit=0K shared class cache size
auth-668cdfb999-q26z6: -Xscmx0K shared class cache soft max size
auth-668cdfb999-q26z6: -Xscdmx0K reserved shared class cache space for class debug attributes
auth-668cdfb999-q26z6: -Xscminaot0K min reserved shared class cache space for AOT
auth-668cdfb999-q26z6: -Xscmaxaot0K max allowed shared class cache space for AOT
auth-668cdfb999-q26z6: -Xscminjitdata0K min reserved shared class cache space for JIT data
auth-668cdfb999-q26z6: -Xscmaxjitdata0K max allowed shared class cache space for JIT data
auth-668cdfb999-q26z6: VM settings:
auth-668cdfb999-q26z6: Max. Heap Size (Estimated): 12.00G
auth-668cdfb999-q26z6: Using VM: Eclipse OpenJ9 VM
auth-668cdfb999-q26z6:
auth-668cdfb999-q26z6: Property settings:
auth-668cdfb999-q26z6: awt.toolkit = sun.awt.X11.XToolkit
auth-668cdfb999-q26z6: com.ibm.cpu.endian = little
auth-668cdfb999-q26z6: com.ibm.jcl.checkClassPath =
auth-668cdfb999-q26z6: com.ibm.oti.configuration = scar
auth-668cdfb999-q26z6: com.ibm.oti.shared.enabled = false
auth-668cdfb999-q26z6: com.ibm.oti.vm.bootstrap.library.path = /usr/lib/jvm/ibm-semeru-open-11-jre/lib/default
auth-668cdfb999-q26z6: /usr/lib/jvm/ibm-semeru-open-11-jre/lib
auth-668cdfb999-q26z6: com.ibm.oti.vm.library.version = 29
auth-668cdfb999-q26z6: com.ibm.system.agent.path = /usr/lib/jvm/ibm-semeru-open-11-jre/lib
auth-668cdfb999-q26z6: com.ibm.util.extralibs.properties =
auth-668cdfb999-q26z6: com.ibm.vm.bitmode = 64
auth-668cdfb999-q26z6: file.encoding = ANSI_X3.4-1968
auth-668cdfb999-q26z6: file.encoding.pkg = sun.io
auth-668cdfb999-q26z6: file.separator = /
auth-668cdfb999-q26z6: ibm.signalhandling.rs = false
auth-668cdfb999-q26z6: ibm.signalhandling.sigchain = true
auth-668cdfb999-q26z6: ibm.signalhandling.sigint = true
auth-668cdfb999-q26z6: ibm.system.encoding = ANSI_X3.4-1968
auth-668cdfb999-q26z6: java.awt.fonts =
auth-668cdfb999-q26z6: java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
auth-668cdfb999-q26z6: java.awt.printerjob = sun.print.PSPrinterJob
auth-668cdfb999-q26z6: java.class.path = auth-8.9.0.jar
auth-668cdfb999-q26z6: java.class.version = 55.0
auth-668cdfb999-q26z6: java.compiler = j9jit29
auth-668cdfb999-q26z6: java.fullversion = 11.0.17+8
auth-668cdfb999-q26z6: JRE 11 Linux amd64-64-Bit Compressed References 20221031_559 (JIT enabled, AOT enabled)
auth-668cdfb999-q26z6: OpenJ9 - e04a7f6c1
auth-668cdfb999-q26z6: OMR - 85a21674f
auth-668cdfb999-q26z6: JCL - a94c231303 based on jdk-11.0.17+8
auth-668cdfb999-q26z6: java.home = /usr/lib/jvm/ibm-semeru-open-11-jre
auth-668cdfb999-q26z6: java.io.tmpdir = /tmp
auth-668cdfb999-q26z6: java.library.path = /usr/lib/jvm/ibm-semeru-open-11-jre/lib/default
auth-668cdfb999-q26z6: /usr/lib/jvm/ibm-semeru-open-11-jre/lib
auth-668cdfb999-q26z6: /usr/lib64
auth-668cdfb999-q26z6: /usr/lib
auth-668cdfb999-q26z6: java.net.preferIPv4Stack = true
auth-668cdfb999-q26z6: java.runtime.name = IBM Semeru Runtime Open Edition
auth-668cdfb999-q26z6: java.runtime.version = 11.0.17+8
auth-668cdfb999-q26z6: java.security.egd = file:/dev/./urandom
auth-668cdfb999-q26z6: java.security.properties = /etc/crypto-policies/back-ends/java.config
auth-668cdfb999-q26z6: java.specification.name = Java Platform API Specification
auth-668cdfb999-q26z6: java.specification.vendor = Oracle Corporation
auth-668cdfb999-q26z6: java.specification.version = 11
auth-668cdfb999-q26z6: java.vendor = IBM Corporation
auth-668cdfb999-q26z6: java.vendor.url = https://www.ibm.com/semeru-runtimes
auth-668cdfb999-q26z6: java.vendor.url.bug = https://github.com/ibmruntimes/Semeru-Runtimes/issues
auth-668cdfb999-q26z6: java.vendor.version = 11.0.17.0
auth-668cdfb999-q26z6: java.version = 11.0.17
auth-668cdfb999-q26z6: java.version.date = 2022-10-18
auth-668cdfb999-q26z6: java.vm.info = JRE 11 Linux amd64-64-Bit Compressed References 20221031_559 (JIT enabled, AOT enabled)
auth-668cdfb999-q26z6: OpenJ9 - e04a7f6c1
auth-668cdfb999-q26z6: OMR - 85a21674f
auth-668cdfb999-q26z6: JCL - a94c231303 based on jdk-11.0.17+8
auth-668cdfb999-q26z6: java.vm.name = Eclipse OpenJ9 VM
auth-668cdfb999-q26z6: java.vm.specification.name = Java Virtual Machine Specification
auth-668cdfb999-q26z6: java.vm.specification.vendor = Oracle Corporation
auth-668cdfb999-q26z6: java.vm.specification.version = 11
auth-668cdfb999-q26z6: java.vm.vendor = Eclipse OpenJ9
auth-668cdfb999-q26z6: java.vm.version = openj9-0.35.0
auth-668cdfb999-q26z6: javax.xml.bind.JAXBContextFactory = com.sun.xml.bind.v2.ContextFactory
auth-668cdfb999-q26z6: javax.xml.soap.SAAJMetaFactory = com.sun.xml.messaging.saaj.soap.SAAJMetaFactoryImpl
auth-668cdfb999-q26z6: javax.xml.ws.spi.Provider = com.sun.xml.ws.spi.ProviderImpl
auth-668cdfb999-q26z6: jdk.extensions.name = Extensions for OpenJDK for Eclipse OpenJ9
auth-668cdfb999-q26z6: jdk.extensions.version = 11.0.17.0
auth-668cdfb999-q26z6: line.separator = \n
auth-668cdfb999-q26z6: networkaddress.cache.negative.ttl = 0
auth-668cdfb999-q26z6: networkaddress.cache.ttl = 0
auth-668cdfb999-q26z6: os.arch = amd64
auth-668cdfb999-q26z6: os.encoding = ANSI_X3.4-1968
auth-668cdfb999-q26z6: os.name = Linux
auth-668cdfb999-q26z6: os.version = 5.4.0-146-generic
auth-668cdfb999-q26z6: path.separator = :
auth-668cdfb999-q26z6: sun.arch.data.model = 64
auth-668cdfb999-q26z6: sun.boot.library.path = /usr/lib/jvm/ibm-semeru-open-11-jre/lib/default
auth-668cdfb999-q26z6: /usr/lib/jvm/ibm-semeru-open-11-jre/lib
auth-668cdfb999-q26z6: sun.cpu.endian = little
auth-668cdfb999-q26z6: sun.io.unicode.encoding = UnicodeLittle
auth-668cdfb999-q26z6: sun.java.command = auth-8.9.0.jar
auth-668cdfb999-q26z6: sun.java.launcher = SUN_STANDARD
auth-668cdfb999-q26z6: sun.java.launcher.pid = 1
auth-668cdfb999-q26z6: sun.jnu.encoding = ANSI_X3.4-1968
auth-668cdfb999-q26z6: user.country = US
auth-668cdfb999-q26z6: user.dir = /
...
auth-668cdfb999-q26z6: Operating System Metrics:
auth-668cdfb999-q26z6: Provider: cgroupv1
auth-668cdfb999-q26z6: Effective CPU Count: 128
auth-668cdfb999-q26z6: CPU Period: 100000us
auth-668cdfb999-q26z6: CPU Quota: -1
auth-668cdfb999-q26z6: CPU Shares: 2us
auth-668cdfb999-q26z6: List of Processors, 128 total:
auth-668cdfb999-q26z6: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 5
7 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 1
15 116 117 118 119 120 121 122 123 124 125 126 127
auth-668cdfb999-q26z6: List of Effective Processors, 128 total:
auth-668cdfb999-q26z6: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 5
7 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 1
15 116 117 118 119 120 121 122 123 124 125 126 127
auth-668cdfb999-q26z6: List of Memory Nodes, 2 total:
auth-668cdfb999-q26z6: 0 1
auth-668cdfb999-q26z6: List of Available Memory Nodes, 2 total:
auth-668cdfb999-q26z6: 0 1
auth-668cdfb999-q26z6: Memory Limit: 16.00G
auth-668cdfb999-q26z6: Memory Soft Limit: Unlimited
auth-668cdfb999-q26z6: Memory & Swap Limit: 16.00G
auth-668cdfb999-q26z6: Maximum Processes Limit: 629145
auth-668cdfb999-q26z6:
...
auth-668cdfb999-q26z6: 2023-05-26 12:27:40,328 INFO [main] [MemoryMonitor] : Found GC Bean scavenge mem pools [nursery-allocate, nursery-survivor]
auth-668cdfb999-q26z6: 2023-05-26 12:27:40,329 INFO [main] [MemoryMonitor] : Found GC Bean global mem pools [tenured-SOA, tenured-LOA, nursery-allocate, nursery-survivor]
Any update on this? We also tried -Xnoloa
at the customer, but it didn't help.
There is not clear what is going on, why we have problem exposed somehow on one particular node. We were noticed there is no easy way to debug this kind of the problem if it is not reproducible in house. We are improving JVM tracepoint output to simplify data collection - PR should be opened soon.
@WPCode I am working on the PR, which would help identify the issued memory pool and wrong memory usage size, meanwhile you can try to add -XX:+HeapManagementMXBeanCompatibility for your run, this option would consolidate all of heap memory pools and report one single java heap memory usage, if the issue is related with heap memory pools, this option likely avoid the issue, but if it wouldn't help your case, the issue is likely in non heap memorys( such as classes, jit code cache, jit data cache...).
@LinHu2016 We tried -XX:+HeapManagementMXBeanCompatibility
today at the customer, but the issue is still there. We will try to grab the java core dumps next week.
@dmitripivkine, Hi Can i join you and contribute something
We were able to change the core_pattern to core at customer and see in the logs that the dump is written to a file in the container's persistent volume. However when we check the directory /home/xxx/data inside the volume, it's empty. There is no file called core.20230823.133812.1.0001.dmp . Any idea why? What can we try next?
auth-b966d6664-c4tpq: JVMDUMP039I Processing dump event "throw", detail "java/lang/IllegalArgumentException" at 2023/08/23 13:38:12 - please wait.
auth-b966d6664-c4tpq: JVMDUMP032I JVM requested System dump using '/home/xxx/data/core.20230823.133812.1.0001.dmp' in response to an event
auth-b966d6664-c4tpq: JVMDUMP010I System dump written to /home/xxx/data/core.20230823.133812.1.0001.dmp
the issue has been identified it is related JIT code cache, for the issued machine the JIT code cache has used 1GB large page, but the default maximum code cache size is 256MB, so 1GB (committed size) is larger than 256MB (max size), JIT Team is working on a fix, for temporary workaround, -Xlp:codecahce:pageszie=4K can be specified to avoid the issue.
Java -version output
openjdk version "11.0.17" 2022-10-18 IBM Semeru Runtime Open Edition 11.0.17.0 (build 11.0.17+8) Eclipse OpenJ9 VM 11.0.17.0 (build openj9-0.35.0, JRE 11 Linux amd64-64-Bit Compressed References 20221031_559 (JIT enabled, AOT enabled) OpenJ9 - e04a7f6c1 OMR - 85a21674f JCL - a94c231303 based on jdk-11.0.17+8)
Summary of problem
We are seeing an exception "committed value cannot be larger than the max value" when trying to deploy our product at a customer's kubernetes cluster running on baremetal vm. We use OpenJ9 0.35.0 as the JVM. This error doesn't shows up at any other customer's kubernetes cluster. We can not reproduce it. We found an article here https://www.ibm.com/support/pages/apar/IJ21959, but not sure whether it's related. Below is the log. Any suggestion on what to do next? Thanks in advance!
Diagnostic files
auth-b44ff78cc-x4b8t: Exception in thread "MemoryMXBean notification dispatcher" java.lang.IllegalArgumentException: committed value cannot be larger than the max value auth-b44ff78cc-x4b8t: at java.management/java.lang.management.MemoryUsage.(Unknown Source)
auth-b44ff78cc-x4b8t: at jdk.management/com.ibm.lang.management.internal.ExtendedGarbageCollectorMXBeanImpl.buildGcInfo(Unknown Source)
auth-b44ff78cc-x4b8t: at jdk.management/com.ibm.lang.management.internal.MemoryNotificationThread.dispatchGCNotificationHelper(Unknown Source)
auth-b44ff78cc-x4b8t: at jdk.management/com.ibm.lang.management.internal.MemoryNotificationThread.processNotificationLoop(Native Method)
auth-b44ff78cc-x4b8t: at jdk.management/com.ibm.lang.management.internal.MemoryNotificationThread.run(Unknown Source)
auth-b44ff78cc-x4b8t: at java.base/java.lang.Thread.run(Unknown Source)
auth-b44ff78cc-x4b8t: 2023-07-04 12:44:57,908 INFO [main] [BaseVmtComponent] : Component transition from NEW to STARTING for auth-b44ff78cc-x4b8t
auth-b44ff78cc-x4b8t: 2023-07-04 12:44:57,917 ERROR [main] [ContextLoader] : Context initialization failed
auth-b44ff78cc-x4b8t: java.lang.IllegalArgumentException: committed value cannot be larger than the max value
auth-b44ff78cc-x4b8t: at java.management/java.lang.management.MemoryUsage.(Unknown Source)
auth-b44ff78cc-x4b8t: at java.management/com.ibm.java.lang.management.internal.MemoryPoolMXBeanImpl.getUsageImpl(Native Method)
auth-b44ff78cc-x4b8t: at java.management/com.ibm.java.lang.management.internal.MemoryPoolMXBeanImpl.getUsage(Unknown Source)
auth-b44ff78cc-x4b8t: at io.prometheus.client.hotspot.MemoryPoolsExports.addMemoryPoolMetrics(MemoryPoolsExports.java:106)
auth-b44ff78cc-x4b8t: at io.prometheus.client.hotspot.MemoryPoolsExports.collect(MemoryPoolsExports.java:125)
auth-b44ff78cc-x4b8t: at io.prometheus.client.CollectorRegistry.collectorNames(CollectorRegistry.java:100)
auth-b44ff78cc-x4b8t: at io.prometheus.client.CollectorRegistry.register(CollectorRegistry.java:50)
auth-b44ff78cc-x4b8t: at io.prometheus.client.Collector.register(Collector.java:139)
auth-b44ff78cc-x4b8t: at io.prometheus.client.Collector.register(Collector.java:132)
auth-b44ff78cc-x4b8t: at io.prometheus.client.hotspot.DefaultExports.initialize(DefaultExports.java:24)