Open sh-at-cs opened 3 weeks ago
Hi @sh-at-cs thanks for raising an issue on the Cloud SQL Proxy!
I'll let @hessjcg respond to this as he is our kubernetes expert and should be able to provide guidance here 😄
@sh-at-cs, You raise a valid point. The k8s readiness probe controls how traffic is routed, not whether the pod is restarted. I will update the README to make this clear.
This advice is based on our practical experience. Our users generally get better availability when they configure readiness probes only on the main application container. Having a readiness probe on the main container then makes redundant the readiness probe on the to the proxy sidecar.
We should also update the examples to demonstrate how to make the proxy container start up to a ready state before the main application container starts. (The implementation is discussed in more detail in https://github.com/GoogleCloudPlatform/cloud-sql-proxy-operator/issues/496 and https://github.com/GoogleCloudPlatform/cloud-sql-proxy-operator/issues/381)
Description
The
examples/k8s-health-check
folder'sREADME.md
says:That makes it sound as if a failing readiness probe would cause the Pod to be restarted and we shouldn't use them for the cloud-sql-proxy sidecar for that reason, but this doesn't seem to be the case (?)
Instead, if I understood k8s's docs correctly, a failing readiness probe will not cause a Pod to be restarted, and readiness probes are only important for two things:
RollingUpdate
strategy.And for (2), it seems to me like having an accurate readiness probe is in fact essential, because otherwise the rolling update will consider pods "available" that are really broken, which then causes k8s to stop healthy but outdated Pods to make room for them.
So I don't understand this piece of advice.
Does it just confuse the readiness probe with the liveness probe (maybe because that was introduced later? But then again, the document does mention the liveness probe, too...)?
Potential Solution
Fix the README so it accurately reflects what a readiness probe is for and how it's relevant, or, if it turns out I'm wrong, link to the paragraphs in the k8s docs where my misconceptions are corrected from
README.md
so other users don't develop the same ones in the future.Additional Details
There are similar comments in the
examples/k8s-health-check/proxy_with_http_health_check.yaml
file, which should be corrected as well if it turns out that there is something to correct here.