Leaking kubeadmconfigtemplates, openstackmachinetemplates ...

garloff commented 1 month ago

/kind bug

What steps did you take and what happened: A management cluster (kind) that had been running in an SCS-2V-4 VM for 3 months (mostly idle) became unusable. Debugging showed that the kube-apiserver's memory usage had grown to > 2 GiB RSS. This caused the machine to aggressively reclaim memory (kswapd0), only to hit major page faults that paged the same memory back in. The result: system load > 50 (on a 2 vCPU server), >>10k major page faults/s, and > 500 MB/s of reads from disk.

What did you expect to happen: 4GiB should be sufficient RAM for a not too busy management host.

Anything else you would like to add: My assumption is that the CSO/CSPO cause the kube-apiserver memory usage by storing too many objects. So far I have found kubeadmconfigtemplates and clusterclasses in excessive numbers.

Environment:

kind v0.20.0 go1.20.4 linux/amd64
Ubuntu 22.04 VM on an SCS-2V-4 flavor (2vCPU, 4GiB RAM, x86-64)
CSO/CSPO as of 93d ago (let me know how I can report this better)
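
To check which resource kinds have accumulated in a management cluster like this one, a rough sketch along the following lines may help. It is not taken from the report: the function name `count_objects_per_kind` and the `KUBECTL` override variable are hypothetical, and it assumes the current kubeconfig is allowed to list every kind.

```shell
#!/bin/sh
# Hypothetical helper: read resource kinds (one per line) on stdin and print
# "COUNT KIND" for each, largest count first.
# KUBECTL is an assumed override variable so the command can be swapped out.
count_objects_per_kind() {
    while IFS= read -r kind; do
        # One output line per object; "No resources found" goes to stderr.
        n=$(${KUBECTL:-kubectl} get "$kind" -A --no-headers </dev/null 2>/dev/null | wc -l | tr -d ' ')
        echo "$n $kind"
    done | sort -rn
}

# Usage against a live cluster (not executed here):
#   kubectl api-resources --verbs=list -o name | count_objects_per_kind | head
```

Kinds with counts in the thousands on a mostly idle cluster are the candidates worth a closer look.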

garloff commented 1 month ago

13683 kubeadmconfigtemplates:

cluster2    capi-openstack-alpha-1-28                      93d
cluster4    capi-openstack-alpha-1-28                      93d
cluster4    cs-cluster4-capi-openstack-alpha-1-28-ljnkh    93d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-222ck   55d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-224lk   14d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-225pj   68d
[...]

garloff commented 1 month ago

15646 openstackmachinetemplates:

cluster2    capi-openstack-alpha-1-28                      94d
cluster2    capi-openstack-alpha-1-28-control-plane        94d
cluster4    capi-openstack-alpha-1-28                      93d
cluster4    capi-openstack-alpha-1-28-control-plane        93d
cluster4    cs-cluster4-capi-openstack-alpha-1-28-mmjrw    93d
cluster4    cs-cluster4-xlh9r                              93d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-226gt   87d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-226qq   76m
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-2275d   92d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-228jx   88d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-229r2   89d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-229vc   79d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-22b2t   30d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-22bm4   73d
cluster4    cs-cluster4a-capi-openstack-alpha-1-28-22cl4   84d
[...]
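
Listings like the two above can be summarised per namespace to see where the leaked templates live. The following is a minimal sketch, assuming the input comes from `kubectl get <kind> -A --no-headers` (columns: namespace, name, age); the function name `count_per_namespace` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count objects per namespace.
# Reads "NAMESPACE NAME AGE" lines on stdin and prints "COUNT NAMESPACE",
# largest count first.
count_per_namespace() {
    awk 'NF >= 2 { n[$1]++ } END { for (ns in n) print n[ns], ns }' | sort -rn
}

# Usage against a live cluster (not executed here):
#   kubectl get kubeadmconfigtemplates -A --no-headers | count_per_namespace
#   kubectl get openstackmachinetemplates -A --no-headers | count_per_namespace
```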

garloff commented 1 month ago

kubectl delete -n cluster4 kubeadmconfigtemplate <LIST OF 13000 names> takes more than an hour, but seems to help memory usage. The same goes for openstackmachinetemplate. I also compacted and defragmented etcd to recover.
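
The cleanup described above could also be run in batches rather than as a single command with 13000 names. The sketch below is only an illustration under assumptions: the helper names `select_by_prefix` and `delete_in_batches` and the `KUBECTL` override are hypothetical, and selecting by name prefix is an assumed rule. Templates that are still referenced by a live cluster must be excluded before anything is deleted.

```shell
#!/bin/sh
# Hypothetical helper: from `kubectl get -n NS KIND --no-headers` output
# (columns: name, age), print the names that start with the given prefix.
select_by_prefix() {
    awk -v p="$1" 'index($1, p) == 1 { print $1 }'
}

# Hypothetical helper: delete the names read from stdin in batches.
#   $1 = namespace, $2 = kind, $3 = names per kubectl invocation
# --wait=false returns without waiting for each object to be finalized.
# KUBECTL is an assumed override variable (e.g. KUBECTL=echo for a dry look).
delete_in_batches() {
    xargs -n "$3" ${KUBECTL:-kubectl} delete -n "$1" "$2" --wait=false
}

# Usage against a live cluster (not executed here; the prefix is an assumption):
#   kubectl get -n cluster4 kubeadmconfigtemplates --no-headers \
#     | select_by_prefix cs-cluster4a- \
#     | delete_in_batches cluster4 kubeadmconfigtemplate 200
```

Running the pipeline first with `KUBECTL=echo` prints the delete commands without executing them, which makes it easier to review the selection. Deleting objects does not shrink etcd's on-disk database by itself, which is presumably why the compaction and defragmentation mentioned above were also needed.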

SovereignCloudStack / cluster-stacks, issue #105