SovereignCloudStack / cluster-stacks

Definition of Cluster Stacks based on the ClusterAPI ClusterClass feature
https://scs.community/
Apache License 2.0

Leaking kubeadmconfigtemplates, openstackmachinetemplates ... #105

Status: open · garloff opened this issue 1 month ago

garloff commented 1 month ago

/kind bug

What steps did you take and what happened: A management cluster (kind) that had been running in an SCS-2V-4 VM for 3 months, mostly idle, became unusable. Debugging showed that the kube-apiserver's memory usage had grown to more than 2 GiB RSS. This put the machine under heavy memory pressure: kswapd0 aggressively evicted pages, which were then immediately paged back in through major page faults. The result was a system load above 50 on a 2-vCPU server, well over 10k major page faults/s, and more than 500 MB/s of disk reads.
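The major-page-fault rate quoted above can be measured by sampling the kernel's cumulative pgmajfault counter in /proc/vmstat twice and dividing the difference by the interval. The following is a minimal sketch assuming a Linux host; the function names are hypothetical and not part of the original report.

```python
import time


def parse_vmstat(text: str) -> dict[str, int]:
    """Parse /proc/vmstat-style 'name value' lines into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def major_fault_rate(before: str, after: str, interval_s: float) -> float:
    """Major page faults per second between two /proc/vmstat snapshots."""
    delta = parse_vmstat(after)["pgmajfault"] - parse_vmstat(before)["pgmajfault"]
    return delta / interval_s


def sample_major_fault_rate(interval_s: float = 1.0) -> float:
    """Read /proc/vmstat twice (Linux only) and return the major-fault rate."""
    with open("/proc/vmstat") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/vmstat") as f:
        after = f.read()
    return major_fault_rate(before, after, interval_s)
```

For example, two snapshots reporting pgmajfault 100 and pgmajfault 12100 one second apart give a rate of 12000.0 faults/s. Keeping the parsing separate from the file access makes the arithmetic easy to check without a live system.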

What did you expect to happen: 4 GiB of RAM should be sufficient for a management host that is not particularly busy.

Anything else you would like to add: I assume that CSO/CSPO are responsible for the kube-apiserver's memory usage because they store too many objects. So far I have found kubeadmconfigtemplates and clusterclasses in excessive numbers.
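One way to check which object kinds have grown is to count the objects per namespace. The following sketch is an illustration, not something shipped with CSO/CSPO: it runs kubectl get <kind> -A --no-headers and tallies the first column, which is the namespace in listings like the ones in this issue. The helper names are hypothetical, and it assumes kubectl is on PATH and pointed at the management cluster.

```python
import subprocess
from collections import Counter


def count_per_namespace(listing: str) -> Counter:
    """Count objects per namespace in `kubectl get <kind> -A --no-headers` output.

    Assumes the first whitespace-separated column is the namespace
    (NAMESPACE NAME AGE), as in the listings in this issue.
    """
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def fetch_listing(kind: str) -> str:
    """Run kubectl against the current context and return its raw output."""
    result = subprocess.run(
        ["kubectl", "get", kind, "-A", "--no-headers"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, count_per_namespace(fetch_listing("kubeadmconfigtemplates")) returns a Counter keyed by namespace, and summing its values gives the total object count for that kind.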

Environment:

garloff commented 1 month ago

13683 kubeadmconfigtemplates:

NAMESPACE   NAME                                           AGE
cluster2    capi-openstack-alpha-1-28                      93d
cluster4    capi-openstack-alpha-1-28                      93d
cluster4    cs-cluster4-capi-openstack-alpha-1-28-ljnkh    93d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-222ck   55d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-224lk   14d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-225pj   68d
[...]
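Most names in this listing differ only in a trailing five-character suffix. Assuming these suffixes come from metadata.generateName (Kubernetes draws them from the alphabet bcdfghjklmnpqrstvwxz2456789), stripping them shows how many copies accumulate per base template. A minimal sketch with hypothetical helper names:

```python
import re
from collections import Counter

# Alphabet Kubernetes uses for generateName suffixes (no vowels, no
# look-alike characters); assumed here to explain the 5-char suffixes.
_SUFFIX = re.compile(r"^(?P<base>.+)-[bcdfghjklmnpqrstvwxz2456789]{5}$")


def base_name(name: str) -> str:
    """Strip a trailing generateName-style suffix, if there is one."""
    m = _SUFFIX.match(name)
    return m.group("base") if m else name


def copies_per_base(names: list[str]) -> Counter:
    """Count how many objects share each base name."""
    return Counter(base_name(n) for n in names)
```

Because the alphabet contains no vowels, a genuine name segment such as "plane" in capi-openstack-alpha-1-28-control-plane is not mistaken for a generated suffix.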
garloff commented 1 month ago

15646 openstackmachinetemplates:

NAMESPACE   NAME                                           AGE
cluster2    capi-openstack-alpha-1-28                      94d
cluster2    capi-openstack-alpha-1-28-control-plane        94d
cluster4    capi-openstack-alpha-1-28                      93d
cluster4    capi-openstack-alpha-1-28-control-plane        93d
cluster4    cs-cluster4-capi-openstack-alpha-1-28-mmjrw    93d
cluster4    cs-cluster4-xlh9r                              93d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-226gt   87d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-226qq   76m
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-2275d   92d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-228jx   88d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-229r2   89d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-229vc   79d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-22b2t   30d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-22bm4   73d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-22cl4   84d
[...]
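The AGE column in this listing ranges from 94d down to 76m, which suggests new templates were still being created shortly before the listing was taken, not only during an initial burst. A sketch for checking this over a full listing converts kubectl's age strings to timedelta values; the helper names and treating a year as 365 days are assumptions.

```python
import re
from datetime import timedelta

# Unit values for kubectl-style ages; "y" as 365 days is an approximation.
_UNIT = {
    "y": timedelta(days=365),
    "d": timedelta(days=1),
    "h": timedelta(hours=1),
    "m": timedelta(minutes=1),
    "s": timedelta(seconds=1),
}
_PART = re.compile(r"(\d+)([ydhms])")


def parse_age(age: str) -> timedelta:
    """Convert an AGE value such as '93d', '76m' or '5h10m' to a timedelta."""
    if not re.fullmatch(r"(?:\d+[ydhms])+", age):
        raise ValueError(f"unrecognised age: {age!r}")
    return sum((int(n) * _UNIT[u] for n, u in _PART.findall(age)), timedelta())


def created_within(ages: list[str], window: timedelta) -> int:
    """Count objects younger than `window`, i.e. created recently."""
    return sum(1 for a in ages if parse_age(a) < window)
```

Applied to the third column of a full listing, created_within(ages, timedelta(days=1)) shows whether the leak was still active on the last day.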
garloff commented 1 month ago

Running kubectl delete -n cluster4 kubeadmconfigtemplate <LIST OF 13000 names> takes more than an hour, but it does seem to reduce memory usage. The same applies to openstackmachinetemplates. To recover, I also compacted and defragmented etcd.
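Passing all 13000 names to a single kubectl delete took more than an hour. The following sketch splits the deletion into batches and skips templates that must survive. The batch size of 500, the keep set (for example, templates still referenced by a ClusterClass, which the caller has to determine) and the use of kubectl's --wait=false flag, which returns without waiting for the objects to be gone, are all assumptions; whether this is actually faster was not measured in this issue.

```python
import subprocess
from typing import Iterable, Iterator


def delete_commands(namespace: str, kind: str, names: Iterable[str],
                    keep: set[str], batch_size: int = 500) -> Iterator[list[str]]:
    """Yield `kubectl delete` argv lists with at most `batch_size` names each.

    `keep` holds names that must not be deleted (hypothetical input; computing
    which templates are still in use is left to the caller).
    """
    prefix = ["kubectl", "delete", "-n", namespace, kind, "--wait=false"]
    batch: list[str] = []
    for name in names:
        if name in keep:
            continue
        batch.append(name)
        if len(batch) == batch_size:
            yield prefix + batch
            batch = []
    if batch:  # remaining names that did not fill a whole batch
        yield prefix + batch


def run_all(commands: Iterable[list[str]]) -> None:
    """Execute each batch, stopping at the first failing kubectl call."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Materialising the generator with list(delete_commands(...)) lets the batches be reviewed before run_all executes them.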