gardener / etcd-druid

An etcd operator to configure, provision, reconcile and monitor etcd clusters.
Apache License 2.0
72 stars 49 forks source link

VPA does not provide recommendations for resources with a parent resource that defines `/scale` subresource #560

Open shreyas-s-rao opened 1 year ago

shreyas-s-rao commented 1 year ago

What happened: VPA does not provide recommendations for resources whose parent resource defines the /scale subresource. Case in point: the etcd statefulset in gardener is targeted by the VPA in order to provide scaling recommendations for the etcd pod(s). The etcd statefulset is controlled by etcd-druid, which generally did not specify the ownerRef on it. With g/g v1.53.0, etcd-druid started setting ownerRef on the etcd statefulset, which pointed to the parent Etcd resource. The Etcd CRD defines the /scale subresource, for the purpose of scaling the statefulset via the Etcd resource. Due to this change, VPA stopped providing recommendations to the etcd statefulset. The issue was identified by @istvanballok, who also verified that the issue was resolved when either the ownerRef was removed, or the /scale subresource was removed from the Etcd CRD.

What you expected to happen: VPA should have provided recommendations. The rationale behind the decision by VPA to not provide recommendations during such a situation is probably to avoid two controllers modifying the same field spec.resources. But since we use the VPA in updateMode: Off, the recommendations should have been provided anyway.

How to reproduce it (as minimally and precisely as possible): Run gardener v1.53.1. You will observe that VPA does not provide new recommendations for etcd-main and etcd-events VPA objects in the control plane of the shoot cluster.

/cc @andrerun @istvanballok @unmarshall @timuthy

Temporarily fixed in https://github.com/gardener/gardener/pull/6850 by removing /scale subresource from the Etcd CRD, but requires a permanent fix from VPA if we are to re-enable /scale subresource for Etcd in gardener.

timuthy commented 1 year ago

but requires a permanent fix from VPA if we are to re-enable /scale subresource for Etcd in gardener.

Is it planned to open another issue in the official kubernetes/autoscaler repository and get feedback from the community or is there another vision to fix this permanently?

andrerun commented 1 year ago

VPA presumably could deal with the /scale subresource, as long as we point it (VPA) to the Etcd resource (instead of the stateful set, as it now is). It is designed to work with custom replication controllers. I believe that HVPA is currently limited to a list of well-known replication controllers though. So, as long as we're using HVPA to scale etcd, we'd need to enhance HVPA if we are to make use of the Etcd/scale subresource.

himanshu-kun commented 1 year ago

@voelzmo would you like to check this?

voelzmo commented 1 year ago

I'm not sure about this issue. My understanding is that the VPA contract says "I can provide recommendations not only for well-known Objects like Deployment and StatefulSet, but for everything that has a /scale subresource. The VPA doesn't call /scale or does anything with it, it just gets the label selector to make sure it knows which Pods are associated with the VPA object. So I don't think there's much to do for the VPA upstream, this works as designed. We could adjust the targetRef for the etcd VPA to point to the parent Etcd resource instead of the StatefulSet. As we're not using HVPA to do any horizontal scaling for etcd, so I guess this could "just work"?