Open jdekonin opened 1 year ago
It was questioned whether cgroup v2 was set up correctly in this environment, so data collection was requested. In a working cgroup v2 environment, using an unmodified `websphere-liberty:22.0.0.13-kernel-java11-openj9` image, cgroup v2 is recognized correctly, but running in a container is not.
Options are needed even in a working cgroup v2 environment that is running with limited resources:
- `-Xcodecachetotal`
- `-Xjit:scratchSpaceLimit`
- `-XcompilationThreads` and `-Xgcthreads` to reduce thread overhead
- `-Xshareclasses:none`
- `-Xnocompressedrefs` or `-Xmcrs`
- `-Xms` and `-Xmx`

They are needed because the defaults don't appear to take any of the above into consideration.

Found 2 related issues on OpenJ9:
https://github.com/eclipse-openj9/openj9/issues/137
https://github.com/eclipse-openj9/openj9/issues/4707
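To illustrate setting the heap options explicitly rather than relying on defaults, here is a minimal Python sketch that reads the cgroup v2 memory limit and derives `-Xms`/`-Xmx` values from it. The function names, the `heap_fraction` parameter and the quarter-of-max initial heap are assumptions made for illustration; they are not OpenJ9 defaults or anything Liberty does itself.

```python
from pathlib import Path
from typing import List, Optional


def read_memory_limit(path: str = "/sys/fs/cgroup/memory.max") -> Optional[int]:
    """Return the cgroup v2 memory limit in bytes, or None if unlimited or unreadable."""
    try:
        raw = Path(path).read_text().strip()
    except OSError:
        return None
    if raw == "max":  # cgroup v2 writes the literal "max" when no limit is set
        return None
    try:
        return int(raw)
    except ValueError:
        return None


def heap_options(limit_bytes: int, heap_fraction: float = 0.5) -> List[str]:
    """Build explicit -Xms/-Xmx options from a memory limit.

    heap_fraction is a hypothetical tuning knob chosen for this sketch,
    not a value taken from OpenJ9.
    """
    max_heap_mb = max(1, int(limit_bytes * heap_fraction) // (1024 * 1024))
    # Assumption: start the heap at a quarter of its maximum size.
    initial_heap_mb = max(1, max_heap_mb // 4)
    return [f"-Xms{initial_heap_mb}m", f"-Xmx{max_heap_mb}m"]


if __name__ == "__main__":
    limit = read_memory_limit()
    if limit is None:
        print("no cgroup v2 memory limit found; leaving heap options unset")
    else:
        print(" ".join(heap_options(limit)))
```

With the 512M limit described in this issue and the assumed fraction of 0.5, `heap_options(512 * 1024 * 1024)` yields `-Xms64m -Xmx256m`. The remaining options in the list (code cache, JIT scratch space, thread counts) would need similar explicit sizing, which this sketch does not attempt.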
In the course of investigating a customer problem in which the OOM killer terminated a pod, it was noticed that running in a container and the cgroup v2 limits are not being detected as expected. The container is launched in an AKS 1.25.5 environment with a memory constraint of 512M, using the Liberty container `websphere-liberty:22.0.0.13-kernel-java11-openj9`, version 22.0.0.9, which contains Semeru OE 11.0.17. I believe this maps to OMR sha 90a1bade, which was OpenJ9 tag/release `openj9-0.35.0`.
From a javacore
Further debugging in the failing environment shows that `/sys/fs/cgroup/memory.max` has the expected value, and `stat -c %T -f /sys/fs/cgroup` returns the expected value of `cgroup2fs`, but when the JVM starts it has a max heap of 30+GB. The files `/.dockerenv` and `/run/.containerenv` do not exist in the environment, so container recognition based on them is not enough in this use case. Using the option `-XX:+/-UseContainerSupport` would seem appropriate, but it doesn't appear to have any impact. It would appear from a web search [1] that AKS supports three different container types: docker, CRI-O, and containerd, with containerd the default since 1.19, and that a possible workaround [2] is creating a `TESTCONTAINERS_HOST_OVERRIDE` environment variable, which has not been confirmed as working.

An attempt to recreate the problem with the same container “failed”: cgroup v2 settings were detected properly, but container recognition still failed. So there are open questions about how cgroup v2 detection works and which conditions are required for it.
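The manual checks described above can be gathered into one small diagnostic. The following is a minimal Python sketch, not the JVM's own detection logic: the function names are hypothetical, and it reads the filesystem type from `/proc/mounts` (which reports `cgroup2`) instead of calling `stat -f` (which prints `cgroup2fs`).

```python
from pathlib import Path
from typing import Dict, List, Optional


def cgroup_fs_type(mount_point: str = "/sys/fs/cgroup",
                   mounts_file: str = "/proc/mounts") -> Optional[str]:
    """Return the filesystem type mounted at mount_point, or None if not found."""
    try:
        lines = Path(mounts_file).read_text().splitlines()
    except OSError:
        return None
    fs_type = None
    for line in lines:
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[1] == mount_point:
            fs_type = fields[2]  # keep the last match: later mounts shadow earlier ones
    return fs_type


def marker_files_present(root: str = "/") -> List[str]:
    """Return which of the marker files named in this issue exist under root."""
    markers = [".dockerenv", "run/.containerenv"]
    return [m for m in markers if (Path(root) / m).exists()]


def raw_memory_max(path: str = "/sys/fs/cgroup/memory.max") -> Optional[str]:
    """Return the raw contents of memory.max ("max" means unlimited), or None."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def summarize() -> Dict[str, object]:
    """Collect the three observations in one dictionary for a bug report."""
    return {
        "cgroup_fs_type": cgroup_fs_type(),
        "memory.max": raw_memory_max(),
        "marker_files": marker_files_present(),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

In the failing environment described here, one would expect `cgroup_fs_type` to be `cgroup2`, `memory.max` to hold the 512M limit in bytes, and `marker_files` to be empty. That combination is the case where marker-file-based container recognition falls short.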
Looking for a solution