Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

High memory consumption with v1.25.2 #3443

Open smartaquarius10 opened 1 year ago

smartaquarius10 commented 1 year ago

Team,

Since the day I updated AKS to v1.25.2, I have seen huge spikes and node memory pressure issues.

Pods are being evicted and nodes are constantly consuming 135 to 140% of memory. While I was on 1.24.9, everything was working fine.

Just now I saw that portal.azure.com has removed v1.25.2 from the Create new --> Azure Kubernetes cluster section. Does this version of AKS have a known problem? Should we immediately switch to v1.25.4 to resolve the memory issue?

I have also observed that AKS 1.24.x used Ubuntu 18.04, while AKS 1.25.x uses Ubuntu 22.04. Is this the reason behind the high memory consumption?

Kindly suggest.

Regards, Tanul


My AKS configuration: 8 nodes of Standard B2s size, as it is a non-prod environment. Pod structure: below are the pods and their memory consumption, excluding the default Microsoft pods running inside the cluster (which take 4705 Mi of memory in total); a sketch of how such numbers can be gathered follows the list.

  • Daemon set of AAD pod identity: taking 191 Mi of memory in total
  • 2 pods of Kong: taking 914 Mi of memory in total
  • Daemon set of Twistlock vulnerability scanner: taking 1276 Mi of memory in total
  • 10 pods of our .NET microservices: taking 820 Mi of memory in total
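For reference, a rough way to gather per-workload numbers like these (a sketch; it assumes the metrics-server add-on is running, which the kube-system pod listings later in this thread show):

# total memory per pod, summed, across all namespaces
kubectl top pods -A --sum=true
# only the default Microsoft pods in kube-system
kubectl top pods -n kube-system --sum=true
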
anthonyAdhese commented 1 year ago

Also running into this issue, I think, for both a Java Docker app and a Debian container. Has anyone tried running 1.26 yet to see if it got fixed?

javiermarasco commented 1 year ago

@anthonyAdhese cgroup v2 is enabled in Kubernetes v1.25 and later. Is your application using Java? You may need to upgrade the JRE. This is not something that will be fixed in Kubernetes, since it is only a problem when the application uses an old runtime that is not compatible with cgroup v2.

anthonyAdhese commented 1 year ago

@anthonyAdhese cgroup v2 is enabled in Kubernetes v1.25 and later. Is your application using Java? You may need to upgrade the JRE. This is not something that will be fixed in Kubernetes, since it is only a problem when the application uses an old runtime that is not compatible with cgroup v2.

Yeah, I was just going through this (huge) thread and saw some suggestions about the JRE, so I might give that a shot. I was just curious whether it was a bug only in 1.25 or not. Thanks for the info.

The funny thing, though, is that we don't have this issue on 1.25.7 on GKE, so I was wondering whether it was an 'Azure'-exclusive bug or not.

Edit: just saw that Google is still using cgroup v1 on that version, which might explain the difference.

Update for people from the internet: bumping the JRE version did indeed fix the memory issues; we went from 11 to 17. In the end we switched out this Java component completely, but the JRE upgrade did work in terms of memory.

nduytg commented 1 year ago

Encountered the same issue.

Some people have already pointed out that you can fix this by updating JVM parameters. So if your app is stuck on Java 8 (like mine), you can try some workarounds, for example the sketch below.
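
For example, since Java 8 builds older than 8u372 don't read the cgroup v2 memory limit, one workaround is to pin the heap explicitly instead of relying on container detection. A minimal sketch (the deployment name and sizes below are hypothetical):

# give the JVM an explicit heap ceiling so it no longer depends on cgroup detection
kubectl set env deployment/my-java-app JAVA_TOOL_OPTIONS="-Xms256m -Xmx512m"
# keep the pod's memory limit comfortably above -Xmx to leave room for metaspace and native memory
kubectl set resources deployment/my-java-app --requests=memory=768Mi --limits=memory=768Mi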

codigoespagueti commented 1 year ago

The comments in this ticket have turned into a Java containers issue, but it was created as a general problem affecting Azure core pods. In my case the main memory problem is with the pods Azure uses to generate Insights monitoring data. I had to disable it, and I still have to restart the nodes every week.

Has Microsoft fixed anything regarding it?

adejongh commented 1 year ago

I also want to know if Microsoft has done anything about this - we are on an older AKS cluster with version 1.24.6, with small nodes, and cannot upgrade until this has been fixed.

smartaquarius10 commented 1 year ago

@codigoespagueti Correct.

Please do not submit Java-related details. This post is dedicated to the ama agent pods, which are consuming a lot of memory.

@ganga1980 @pfrcks Any updates on this?

pfrcks commented 1 year ago

@smartaquarius10 we have rolled out a couple of changes which help reduce the memory footprint of the ama-logs pods. Can you confirm which version of ama-logs you are on? kubectl get pods -n kube-system -o jsonpath='{.items[*].spec.containers[0].image}'

smartaquarius10 commented 1 year ago

@pfrcks This one mcr.microsoft.com/azuremonitor/containerinsights/ciprod:3.1.8

adejongh commented 1 year ago

We are running on the same version of the container insights (ama-logs) - mcr.microsoft.com/azuremonitor/containerinsights/ciprod:3.1.8

@pfrcks - is there a changelog somewhere we can follow, to see what changes are being made? We are stuck on 1.24.6 at the moment, and cannot upgrade until this has been reliably fixed. The big issue with that is that the 1.24 version is going out of support soon.

mick-feller commented 1 year ago

So I actually found this page: https://kubernetes.io/blog/2022/08/31/cgroupv2-ga-1-25/

Upgrading our JDK 1.8 to update 372 (8u372) seems to have fixed it for us; maybe it helps someone else. [screenshot from the article]
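
If you want to verify whether the JVM in a given pod actually detects the container limits after such an upgrade, a quick sketch (JDK 9+ unified-logging syntax, as used later in this thread; the pod name is hypothetical):

# prints the detected cgroup version, CPU quota and memory limit, then exits
kubectl exec -it my-java-pod -- java -Xlog:os+container=info -version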

pfrcks commented 1 year ago

@smartaquarius10 @adejongh yes that is the version. Do you see any improvement in ama-logs pod resource usage?

Additionally, ama-logs pod resource consumption is a separate issue, unrelated to the JDK issue being discussed above. The ama-logs pod version is not dependent on the AKS version.

adejongh commented 1 year ago

@pfrcks - I want to know whether I can safely upgrade our cluster to the 1.25.x versions with the new ama-logs version.

pfrcks commented 1 year ago

@adejongh as I mentioned above ama-logs version is independent of AKS version. My response was specifically for ama-logs pod resource consumption which is different from the AKS version issues.

dinilimento commented 1 year ago

We experience the same problem on AKS running Kubernetes 1.25.6. Note that the problem occurs in both a Microsoft SQL Server container and Prometheus, so it seems not to be related only to JVM-based applications. The cgroup settings match the specified Kubernetes resource limits. We use this node image: kubernetes.azure.com/node-image-version=AKSUbuntu-2204gen2containerd-202305.24.0

smartaquarius10 commented 1 year ago

@pfrcks it's taking approximately 250 MiB per pod. If that is normal, then we can close the ticket.

chriscardillo commented 1 year ago

Hello. I think I am running into a similar issue on a small cluster (v1.25.6) with one Standard_B2s node, which has 4GB RAM and 2 vCPUs.

The total memory currently consumed by all of my pods (kubectl top pods --sum=true -A) is 570Mi.

The total memory currently consumed by my node (kubectl top no) is 2164Mi, and this is reading as 100% of my available memory.

Per this article, even if I should only expect ~66% of my provisioned RAM to be allocatable (in this case, around 2.5 GB), my pods are only consuming 570Mi, so what is consuming the other ~1.5GB of RAM here (out of the 2164Mi total)?
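
Part of the gap is the difference between the node's capacity and what Kubernetes makes allocatable, plus system daemons that never show up in kubectl top pods. One way to look at it (the node name is a placeholder):

# compare raw capacity with what the kubelet actually exposes for scheduling
kubectl get node aks-nodepool1-12345678-vmss000000 \
  -o jsonpath='capacity: {.status.capacity.memory}{"\n"}allocatable: {.status.allocatable.memory}{"\n"}'
# the kubelet, containerd and OS daemons also use memory that never appears in pod-level numbers
kubectl describe node aks-nodepool1-12345678-vmss000000 | grep -A 8 "Allocated resources"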

efirdman commented 1 year ago

Is there an ETA to address this issue? It's becoming very difficult to use AKS: 80% of the cluster's computing resources are used by support pods (logging and others) and not by the business APIs.

Cpcrook commented 1 year ago

@pfrcks any update on the ciprod resource consumption fixes? We're in a similar boat to @chriscardillo: kubectl top no reports over 120% memory consumption on all of our nodes. 2x A2s_v2 system pool nodes, 1 D2s_v2 worker node. The system pool nodes are the worst, with > 160% consumption.

This all appears to have started around the v1.25.6 upgrade and is constantly blowing up our alerts on node memory consumption.

pfrcks commented 1 year ago

@smartaquarius10 yes, this is the expected usage at present. We are continuously working on driving down our resource usage.

aritraghosh commented 1 year ago

Thanks everyone for your valuable feedback. As this thread has become quite lengthy, I have decided to close the issue and propose the following solutions:

If none of the above suggestions work, please create a support case and provide all relevant details.

Thank you for your understanding

Cpcrook commented 1 year ago

So the solution is basically "pay us more monthly for VMs" after the upgrade to 1.25.x.

Got it.

adejongh commented 1 year ago

@aritraghosh - how on earth can you close this issue? I need to know that I can safely upgrade our < 1.25 cluster to >= 1.25, and that it will still run. It is not really something I can downgrade once I find out it is not working anymore. Our clusters are oldish and do not allow us to add nodepools with larger nodes - so if we upgrade and it does not work... then it won't help to log an issue.

adejongh commented 1 year ago

This is the memory usage in 1.24.6 - will it be the same in > 1.25?

[screenshot: ama-logs memory usage on 1.24.6]
pfrcks commented 1 year ago

@adejongh as mentioned above, ama-logs resource usage is not dependent on AKS version so yes it will be the same.

chriscardillo commented 1 year ago

Moved my question to a separate issue: https://github.com/Azure/AKS/issues/3715

I am running very, very little on a single-node cluster and my memory is maxed out.

efirdman commented 1 year ago

Closing the issue was a bad call. I do not think the main concern was the particular version of AKS; rather, the system is using too many computing resources for the ama-logs, ama-logs-rs, and other support pods. This makes it impossible to use Standard_B2s nodes with Container Insights enabled.

codigoespagueti commented 1 year ago

As can be read in the first messages, the issues with the ama-logs pods started when updating to AKS 1.25. If nothing has been fixed in these Azure-owned pods, how can the issue be considered closed? Why does another issue need to be opened?

imarkvisser commented 1 year ago

Closing valid issues is a bit weird. Please reopen.

seguler commented 1 year ago

Alright, we've reopened.

There are multiple issues being discussed here. Let me clarify:

1) [Known issue] If your application or its runtime depends on cgroups, it might have been impacted by the cgroups change, resulting in OOM kills or higher memory usage. This is explained here: https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/aks-memory-saturation-after-upgrade

2) There are also users reporting that AMA-logs is the culprit for the memory usage increase. I did look at our telemetry and I don't see a meaningful increase in AMA-logs memory usage across the AKS fleet or in the clusters that upgraded to 1.25. It is true that AMA-logs was using about 400MB by default a few months ago but there have been recent changes which have reduced the usage to about 250-300 MB. There is more work in progress to optimize this further.

3) It is possible that there are some changes in how memory is being reported by cadvisor now, because that data is coming from cgroups v2. We're still investigating, but we don't think this actually impacts usage; it just changes reporting.

If you indeed see an increase in memory usage of AMA-logs specifically, please post the details here. We suspect this may have something to do with memory usage data returned by cgroups v2 as opposed to v1, but we're happy to investigate.
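
For anyone who wants to confirm which cgroup version a particular node ended up on, a quick check (the node name is a placeholder; any image with GNU stat works):

# "cgroup2fs" means the node is on cgroup v2, "tmpfs" means cgroup v1
kubectl debug node/aks-nodepool1-12345678-vmss000000 -it --image=ubuntu -- stat -fc %T /sys/fs/cgroup/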

Aaron-ML commented 1 year ago

Just to note, we hit this issue and actually had our service pods crashloop because they ran out of memory after the upgrade.

Definitely worth noting this somewhere in the upgrade documentation, as it can cascade and cause real issues.

seguler commented 1 year ago

Just to note, we hit this issue and actually had our service pods crashloop because they ran out of memory after the upgrade.

Definitely worth noting this somewhere in the upgrade documentation, as it can cascade and cause real issues.

Are you talking about the first issue with Java and .NET? If so, where would you want us to document this best? I am guessing not everyone is aware of the tech doc we published, so open to feedback on where to put that.

Aaron-ML commented 1 year ago

To add to this, we didn't see this in the dev or stage environments we upgraded. I did some Java tracing and noticed the difference between two clusters on the same version:

Both clusters are on 1.25.6 and have matching kernel versions, OS images, and container runtimes. The only difference is that the working one is in the WestUS2 region, and the non-working one is in WestEurope2.

A cluster we upgraded today:

NOTE: Picked up JDK_JAVA_OPTIONS: -Xlog:os+container=trace -XX:InitialRAMPercentage=50.0 -XX:MaxRAMPercentage=50.0 -XX:+UseG1GC
[0.000s][trace][os,container] OSContainer::init: Initializing Container Support
[0.000s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.001s][trace][os,container] No relevant cgroup controllers mounted.
openjdk version "17.0.6" 2023-01-17
OpenJDK Runtime Environment Temurin-17.0.6+10 (build 17.0.6+10)
OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (build 17.0.6+10, mixed mode, sharing)

A cluster we upgraded last week:

NOTE: Picked up JDK_JAVA_OPTIONS: -Xlog:os+container=trace -XX:InitialRAMPercentage=50.0 -XX:MaxRAMPercentage=50.0 -XX:+UseG1GC
[0.000s][trace][os,container] OSContainer::init: Initializing Container Support
[0.000s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.000s][debug][os,container] Detected cgroups v2 unified hierarchy
[0.001s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup//cpu.max
[0.001s][trace][os,container] Raw value for CPU quota is: max
[0.001s][trace][os,container] CPU Quota is: -1
[0.001s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup//cpu.max
[0.001s][trace][os,container] CPU Period is: 100000
[0.001s][trace][os,container] OSContainer::active_processor_count: 8
[0.001s][trace][os,container] total physical memory: 33665449984
[0.001s][trace][os,container] Path to /memory.max is /sys/fs/cgroup//memory.max
[0.001s][trace][os,container] Raw value for memory limit is: 536870912
[0.001s][trace][os,container] Memory Limit is: 536870912
[0.001s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 8
[0.012s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 8
[0.015s][trace][os,container] Path to /memory.current is /sys/fs/cgroup//memory.current
[0.015s][trace][os,container] Memory Usage is: 226570240
openjdk version "17.0.6" 2023-01-17
OpenJDK Runtime Environment Temurin-17.0.6+10 (build 17.0.6+10)
OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (build 17.0.6+10, mixed mode, sharing)
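If you want to capture the same os+container trace from your own workloads, a sketch (the deployment name is hypothetical; remove the variable again once you have the startup log):

# the JVM logs its container detection at startup; read it back from the pod logs
kubectl set env deployment/my-java-app JDK_JAVA_OPTIONS="-Xlog:os+container=trace"
kubectl logs deployment/my-java-app | grep "os,container"
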
adejongh commented 1 year ago
  1. There are also users reporting that AMA-logs is the culprit for the memory usage increase. I did look at our telemetry and I don't see a meaningful increase in AMA-logs memory usage across the AKS fleet or in the clusters that upgraded to 1.25. It is true that AMA-logs was using about 400MB by default a few months ago but there have been recent changes which have reduced the usage to about 250-300 MB. There is more work in progress to optimize this further.

@seguler - does this memory usage for 'ama-logs' seem normal?

[screenshot: ama-logs pod memory usage]

This is actually on AKS 1.24.6 - here is the image info:

[screenshot: ama-logs container image version]
seguler commented 1 year ago
  1. There are also users reporting that AMA-logs is the culprit for the memory usage increase. I did look at our telemetry and I don't see a meaningful increase in AMA-logs memory usage across the AKS fleet or in the clusters that upgraded to 1.25. It is true that AMA-logs was using about 400MB by default a few months ago but there have been recent changes which have reduced the usage to about 250-300 MB. There is more work in progress to optimize this further.

@seguler - does this memory usage for 'ama-logs' seem normal? [screenshot]

This is actually on AKS 1.24.6 - here is the image info: [screenshot]

Yes, it looks normal (based on telemetry I am looking at). This pod captures metrics and logs for Container Insights. You can see it requests 325Mi in memory and uses about that much.

Marchelune commented 1 year ago

I've read through https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/aks-memory-saturation-after-upgrade but I'm still struggling to understand what actions to take and how I can make sure that the cgroups API is the cause. On a side note, I've upgraded our (internal) cluster to 1.26; the memory usage of each node seems to be down by around 8% (ish), but it is still above what 1.24 used to show.

What I'd like to identify is where the memory overhead comes from. If I sum all the memory footprints from kubectl top pod --sum=true -A, I get 3869Mi for all pods. But then kubectl top node returns:

NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
aks-akspool1-44018104-vmss00000r 186m 9% 4607Mi 74%
aks-akspool1-44018104-vmss00000s 156m 8% 4472Mi 72%
aks-akspool1-44018104-vmss00000t 145m 7% 4426Mi 71%

Of course one can expect some overhead on each node, but I'm not K8s-skilled enough to tell when that overhead is unreasonable; one of my suspicions is that this overhead went up after the 1.25 upgrade.

To try to confirm this, I tested the following:

So, before the upgrade, i.e. on 1.24.10, I got:

NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
aks-agentpool-54712384-vmss000000 378m 19% 1128Mi 52%
aks-agentpool-54712384-vmss000001 118m 6% 1092Mi 50%
Top pods before the upgrade

NAMESPACE     NAME                                  CPU(cores)   MEMORY(bytes)
default       azure-vote-back-7cd69cc96f-lbqmz      2m           14Mi
default       azure-vote-front-7c95676c68-b8f4d     2m           51Mi
kube-system   azure-ip-masq-agent-48p8k             1m           12Mi
kube-system   azure-ip-masq-agent-p5hkm             1m           13Mi
kube-system   cloud-node-manager-hh5r4              1m           14Mi
kube-system   cloud-node-manager-xkbq6              1m           14Mi
kube-system   coredns-589487654b-c94bh              2m           18Mi
kube-system   coredns-589487654b-qlsjq              2m           17Mi
kube-system   coredns-autoscaler-5866788c6c-gzs79   1m           7Mi
kube-system   csi-azuredisk-node-84ftc              2m           42Mi
kube-system   csi-azuredisk-node-rqd8c              2m           41Mi
kube-system   csi-azurefile-node-226g5              2m           40Mi
kube-system   csi-azurefile-node-pfqds              2m           39Mi
kube-system   konnectivity-agent-cdcdf754f-fxbtb    1m           11Mi
kube-system   konnectivity-agent-cdcdf754f-vzdfk    2m           12Mi
kube-system   kube-proxy-d8wqd                      1m           20Mi
kube-system   kube-proxy-rdpwg                      1m           18Mi
kube-system   metrics-server-564bfb87fd-dht67       3m           40Mi
kube-system   metrics-server-564bfb87fd-h78bf       3m           36Mi
                                                    ________     ________
                                                    20m          467Mi

And after the upgrade, i.e. on 1.25.6:

NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
aks-agentpool-54712384-vmss000000 117m 6% 1695Mi 78%
aks-agentpool-54712384-vmss000001 110m 5% 1227Mi 57%
Top pods after the upgrade

NAMESPACE     NAME                                  CPU(cores)   MEMORY(bytes)
default       azure-vote-back-7cd69cc96f-56t2b      2m           7Mi
default       azure-vote-front-7c95676c68-2x8pc     1m           43Mi
kube-system   azure-ip-masq-agent-lfp5v             1m           13Mi
kube-system   azure-ip-masq-agent-zg2c7             1m           13Mi
kube-system   cloud-node-manager-2fnv2              1m           20Mi
kube-system   cloud-node-manager-rvxcj              1m           18Mi
kube-system   coredns-589487654b-p6sxv              2m           27Mi
kube-system   coredns-589487654b-smhj8              2m           16Mi
kube-system   coredns-autoscaler-5866788c6c-t8xnt   1m           3Mi
kube-system   csi-azuredisk-node-5ldsq              1m           47Mi
kube-system   csi-azuredisk-node-9d6gl              2m           51Mi
kube-system   csi-azurefile-node-rnr4z              1m           45Mi
kube-system   csi-azurefile-node-sjvkz              1m           49Mi
kube-system   konnectivity-agent-5b9f455564-f2b86   1m           13Mi
kube-system   konnectivity-agent-5b9f455564-g2b7h   2m           16Mi
kube-system   kube-proxy-b5jgk                      1m           25Mi
kube-system   kube-proxy-c2bfn                      1m           26Mi
kube-system   metrics-server-564bfb87fd-tm8c4       3m           26Mi
kube-system   metrics-server-564bfb87fd-v2l5x       3m           24Mi
                                                    ________     ________
                                                    17m          493Mi

Given the nature of B2s VMs, it's finicky to rely on an instantaneous memory metric, so any pointer to a better analysis is welcome! I've also looked at the memory working set in the portal, and I can see the increased memory usage:

[screenshot: memory working set from the Azure portal, 2023-06-15]

I expect that B2s VMs are particularly affected by this memory issue, since they "only" offer 4 GiB of RAM, so I guess it's easier to hit OOM limits. Our production workloads don't run on them, but B2s nodes are quite valuable for running internal clusters with very low usage at a reasonable cost.

Marchelune commented 1 year ago

To build on my previous comment, I did another test where I created a "bare" 1.24.10 cluster and, after a while, upgraded it to 1.25 and then 1.26. I didn't deploy anything to it; I just looked at the reported memory working set in the portal:

[screenshot: reported memory working set across the 1.24.10 -> 1.25 -> 1.26 upgrades, 2023-06-16]

Edit: this is running with 3 B2s VMs, and I am running the clusters in the West Europe region.

jjader11 commented 1 year ago

Thank you all for your contributions; here are my solutions.

I will divide it into 2 parts:

First part:

Since we updated the development environment to version 1.25.3, we have seen the memory increase problem, and we found that Java workloads were the most affected. We opened a support ticket with Microsoft and their response was: "We don't understand what is happening; in our environments everything works very well." What a funny response.

Finally, after hours of searching, we found the cgroup v2 issue, so we decided to return the machines to cgroup v1 with the following procedure:

vi /etc/default/grub

Update these lines:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash systemd.unified_cgroup_hierarchy=0"
GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"

Update the GRUB configuration of the OS: run update-grub

Restart.

NOTE: Remember that by doing this, Microsoft will immediately tell you that they cannot support the OS because it has been modified.

It is important to make clear that the best solution is to update your projects to versions of Java, .NET, and Node.js that support cgroup v2. In our case we have more than a thousand microservices, and it is very complex to orchestrate everything with the development team.

Second part.

When we updated to version 1.25.5 we already had the workaround for this problem, but we realized that in this version the first solution was not 100% effective.

The nodes gradually increased their memory usage until they died.

Another week with this problem. We then found that containerd defaults to using cgroup v2, so we patched it not to use it.

The procedure is this:

Edit the /etc/containerd/config.toml file.

Change the parameter:

SystemdCgroup = true to SystemdCgroup = false

Restart containerd and the kubelet.
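
On a node, that change looks roughly like this (a sketch; it assumes shell access to a standard AKS Ubuntu node, uses the usual containerd config path, and will be reverted by node image upgrades or node re-creation):

# check the current setting, flip it, then restart the runtime and the kubelet
grep -n 'SystemdCgroup' /etc/containerd/config.toml
sed -i 's/SystemdCgroup = true/SystemdCgroup = false/' /etc/containerd/config.toml
systemctl restart containerd
systemctl restart kubelet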

This workaround helped us; our nodes are now working normally.

I hope these workarounds help you. I reiterate that the best solution, and the way to avoid the risk of losing support from Microsoft, is to update Java, .NET, and Node.js to versions that support cgroup v2.

Greetings.

sandeeplamb commented 1 year ago

It seems we have the same issue. We cannot try the solution above as described by @jjader11: we have non-prod clusters that shut down every day to save cost, so every restart removes the settings. Updating the images to the latest versions might also take some time. We have had a ticket open with Microsoft for 5 days, and it's a big mess-up on the AKS side.

alexeldeib commented 1 year ago

https://github.com/Azure/AKS/issues/3715#issuecomment-1610339833

You could potentially confirm this is the issue by checking /proc/meminfo and comparing with my math in that comment - it would definitely help validate or reject the theory.

This would likely need a cadvisor/libcontainer fix, if correct.
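
A quick way to pull /proc/meminfo off a node for that comparison (the node name is a placeholder):

# MemTotal, MemFree and MemAvailable are the fields to compare against the reported working set
kubectl debug node/aks-nodepool1-12345678-vmss000000 -it --image=ubuntu -- head -n 5 /proc/meminfo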

smartaquarius10 commented 1 year ago

Trust me, after trying every path I finally created a new cluster with heavier machines of the E series. Memory usage is still a problem, but what can you do? If cloud service providers had added some validation, like not allowing whoever is using light machines such as the B series to update to the 1.25.x series, a lot of time and effort could have been saved.

I don't think it would be a difficult task for the cloud providers to figure out which customer is using which SKU of nodes in AKS.

It has been 1.5 months that I have been working on a migration, which is an unnecessary effort just because of this issue.

motizukilucas commented 1 year ago

Trust me, after trying every path I finally created a new cluster with heavier machines of the E series. Memory usage is still a problem, but what can you do? If cloud service providers had added some validation, like not allowing whoever is using light machines such as the B series to update to the 1.25.x series, a lot of time and effort could have been saved.

I don't think it would be a difficult task for the cloud providers to figure out which customer is using which SKU of nodes in AKS.

It has been 1.5 months that I have been working on a migration, which is an unnecessary effort just because of this issue.

In my scenario I have two node pools: one with heavy machines for the apps and another for support, which is B series.

And it is exactly the B series pool whose memory usage keeps going through the roof, even though in theory it isn't running anything other than the default AKS resources.

smartaquarius10 commented 1 year ago

@motizukilucas trust me, if you want an immediate solution: add one new node pool with slightly larger machines, transfer the apps from the B series to the new node pool, and delete the old node pool. But make sure to do that when no one is using the apps in that cluster, as SNAT exhaustion can happen unless you are using a NAT gateway.
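
A rough outline of that migration with the az CLI (all resource names, pool names and VM sizes below are placeholders):

# add a pool with more memory, move the workloads over, then remove the old B-series pool
az aks nodepool add --resource-group my-rg --cluster-name my-aks \
  --name biggerpool --node-count 3 --node-vm-size Standard_D4s_v3
kubectl get nodes -l agentpool=oldpool
kubectl drain aks-oldpool-12345678-vmss000000 --ignore-daemonsets --delete-emptydir-data
az aks nodepool delete --resource-group my-rg --cluster-name my-aks --name oldpool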

seguler commented 1 year ago

A fix is being discussed here: https://github.com/kubernetes/kubernetes/issues/118916

We don't expect this to be resolved soon in the K8s 1.25 version. Users who upgrade to 1.25 will observe higher reported memory utilization (400-500 MB in our tests on idle nodes). This shouldn't impact most users, as this is an increase in what's reported. However, if your nodes were already close to MemoryPressure, then you may observe pod evictions because of the accounting problem. If you do, we recommend you increase requested memory and scale up your nodepools accordingly until a fix is delivered.
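
If you go that route, the two knobs look roughly like this (workload, pool and resource names are placeholders):

# give the affected workload more headroom so evictions stop
kubectl set resources deployment/my-app --requests=memory=512Mi --limits=memory=1Gi
# and/or add nodes to the pool until a fix lands
az aks nodepool scale --resource-group my-rg --cluster-name my-aks --name nodepool1 --node-count 5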

smartaquarius10 commented 1 year ago

@seguler scaling up nodes is not possible for everyone, because it is highly dependent on subnet space.

But I request that Microsoft add some validation to disable the use of B2s and D2s machines (i.e. 4 to 7 GB of RAM) if the user is selecting a version >= 1.25.x, especially if the subnet CIDR is /24.

I am facing this challenge with D2s in production with this CIDR. Just a suggestion. Thank you.

smartaquarius10 commented 1 year ago

Can anyone confirm whether the 1.24.x version of AKS is still on cgroup v1, or whether it also has cgroup v2? I would be grateful if someone could confirm as early as possible; otherwise our production cluster will be impacted.

jjader11 commented 1 year ago

Can anyone confirm whether the 1.24.x version of AKS is still on cgroup v1, or whether it also has cgroup v2? I would be grateful if someone could confirm as early as possible; otherwise our production cluster will be impacted.

Hi,

No, cgroup v2 comes with Ubuntu 22.04, and as I remember Kubernetes 1.24.x runs on Ubuntu 20.04.

smartaquarius10 commented 1 year ago

Thanks

harshagarwalsol commented 1 year ago

@jjader11: I found an example supported by Microsoft itself: https://github.com/Azure/AKS/tree/master/examples/cgroups

Important note: please apply this to one nodepool at a time (via node selectors or node affinity rules), as it will reboot all nodes at once if you don't specify how to roll it out.

Somebody mentioned something about "SNAT exhaustion" when creating a new nodepool; I don't know if this is something we need to watch out for. I don't want to spend time understanding SNAT exhaustion, so I will apply the daemonset one node at a time, using kubectl taint, kubectl label, and restarting deployments to make sure nothing is running on the node I plan to roll the change out to.
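
A sketch of that node-by-node rollout (the node name and label key are hypothetical; the label just has to match whatever selector you put on the daemonset):

# move workloads off the node first, then let the cgroup daemonset target it via the label
kubectl cordon aks-nodepool1-12345678-vmss000000
kubectl drain aks-nodepool1-12345678-vmss000000 --ignore-daemonsets --delete-emptydir-data
kubectl label node aks-nodepool1-12345678-vmss000000 cgroup-rollout=v1
kubectl uncordon aks-nodepool1-12345678-vmss000000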

jr01 commented 1 year ago

An alternative to forcing cgroup v1 via a DaemonSet is to create an agentpool with osSKU set to AzureLinux or CBLMariner.

CBL-Mariner v2 and v1 have systemd.unified_cgroup_hierarchy=0 in /boot/systemd.cfg. See also the change comment here: https://github.com/microsoft/CBL-Mariner/blob/2.0/SPECS/systemd/systemd-bootstrap.spec#L292
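
For example, adding such a pool with the az CLI might look like this (resource and pool names are placeholders):

# Azure Linux / CBL-Mariner nodes boot with cgroup v1, per the comment above
az aks nodepool add --resource-group my-rg --cluster-name my-aks \
  --name marinerpool --node-count 3 --os-sku AzureLinux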