GoogleCloudPlatform / cloud-sql-proxy

A utility for connecting securely to your Cloud SQL instances
Apache License 2.0
1.26k stars 346 forks source link

`examples/k8s-health-check` folder README advises against readiness probes for (to me) incomprehensible reasons #2292

Open sh-at-cs opened 3 weeks ago

sh-at-cs commented 3 weeks ago

Description

The examples/k8s-health-check folder's README.md says:

For most common usage, adding a readiness healthcheck to the proxy sidecar container is unnecessary. An improperly configured readiness check can degrade the application's availability. [...] Most applications are resilient to transient database connection failures, and do not need to be restarted. [...] You should use the proxy container's readiness probe when these circumstances should cause k8s to terminate the entire pod: [...]

That makes it sound as if a failing readiness probe would cause the Pod to be restarted and we shouldn't use them for the cloud-sql-proxy sidecar for that reason, but this doesn't seem to be the case (?)

Instead, if I understood k8s's docs correctly, a failing readiness probe will not cause a Pod to be restarted, and readiness probes are only important for two things:

  1. Directing traffic to a Pod - only ready Pods will receive traffic.
  2. Determining whether a Pod in a Deployment is considered "available" in the context of the RollingUpdate strategy.

And for (2), it seems to me like having an accurate readiness probe is in fact essential, because otherwise the rolling update will consider pods "available" that are really broken, which then causes k8s to stop healthy but outdated Pods to make room for them.

So I don't understand this piece of advice.

Does it just confuse the readiness probe with the liveness probe (maybe because that was introduced later? But then again, the document does mention the liveness probe, too...)?

Potential Solution

Fix the README so it accurately reflects what a readiness probe is for and how it's relevant, or, if it turns out I'm wrong, link to the paragraphs in the k8s docs where my misconceptions are corrected from README.md so other users don't develop the same ones in the future.

Additional Details

There are similar comments in the examples/k8s-health-check/proxy_with_http_health_check.yaml file, which should be corrected as well if it turns out that there is something to correct here.

jackwotherspoon commented 2 weeks ago

Hi @sh-at-cs thanks for raising an issue on the Cloud SQL Proxy!

I'll let @hessjcg respond to this as he is our kubernetes expert and should be able to provide guidance here 😄

hessjcg commented 2 weeks ago

@sh-at-cs, You raise a valid point. The k8s readiness probe controls how traffic is routed, not whether the pod is restarted. I will update the README to make this clear.

This advice is based on our practical experience. Our users generally get better availability when they configure readiness probes only on the main application container. Having a readiness probe on the main container then makes redundant the readiness probe on the to the proxy sidecar.

We should also update the examples to demonstrate how to make the proxy container start up to a ready state before the main application container starts. (The implementation is discussed in more detail in https://github.com/GoogleCloudPlatform/cloud-sql-proxy-operator/issues/496 and https://github.com/GoogleCloudPlatform/cloud-sql-proxy-operator/issues/381)