Open shreyas-s-rao opened 1 year ago
but requires a permanent fix from VPA if we are to re-enable /scale subresource for Etcd in gardener.
Is it planned to open another issue in the official kubernetes/autoscaler
repository and get feedback from the community or is there another vision to fix this permanently?
VPA presumably could deal with the /scale subresource, as long as we point it (VPA) to the Etcd resource (instead of the stateful set, as it now is). It is designed to work with custom replication controllers. I believe that HVPA is currently limited to a list of well-known replication controllers though. So, as long as we're using HVPA to scale etcd, we'd need to enhance HVPA if we are to make use of the Etcd/scale subresource.
@voelzmo would you like to check this?
I'm not sure about this issue. My understanding is that the VPA contract says "I can provide recommendations not only for well-known Objects like Deployment and StatefulSet, but for everything that has a /scale
subresource. The VPA doesn't call /scale
or does anything with it, it just gets the label selector to make sure it knows which Pods are associated with the VPA object.
So I don't think there's much to do for the VPA upstream, this works as designed. We could adjust the targetRef
for the etcd VPA to point to the parent Etcd
resource instead of the StatefulSet. As we're not using HVPA to do any horizontal scaling for etcd, so I guess this could "just work"?
What happened: VPA does not provide recommendations for resources whose parent resource defines the
/scale
subresource. Case in point: the etcd statefulset in gardener is targeted by the VPA in order to provide scaling recommendations for the etcd pod(s). The etcd statefulset is controlled by etcd-druid, which generally did not specify theownerRef
on it. With g/g v1.53.0, etcd-druid started settingownerRef
on the etcd statefulset, which pointed to the parentEtcd
resource. TheEtcd
CRD defines the/scale
subresource, for the purpose of scaling the statefulset via theEtcd
resource. Due to this change, VPA stopped providing recommendations to the etcd statefulset. The issue was identified by @istvanballok, who also verified that the issue was resolved when either theownerRef
was removed, or the/scale
subresource was removed from theEtcd
CRD.What you expected to happen: VPA should have provided recommendations. The rationale behind the decision by VPA to not provide recommendations during such a situation is probably to avoid two controllers modifying the same field
spec.resources
. But since we use the VPA inupdateMode: Off
, the recommendations should have been provided anyway.How to reproduce it (as minimally and precisely as possible): Run gardener
v1.53.1
. You will observe that VPA does not provide new recommendations foretcd-main
andetcd-events
VPA objects in the control plane of the shoot cluster./cc @andrerun @istvanballok @unmarshall @timuthy
Temporarily fixed in https://github.com/gardener/gardener/pull/6850 by removing
/scale
subresource from theEtcd
CRD, but requires a permanent fix from VPA if we are to re-enable/scale
subresource forEtcd
in gardener.