GoogleCloudPlatform / cloud-sql-proxy

A utility for connecting securely to your Cloud SQL instances
Apache License 2.0
1.28k stars, 349 forks

Readiness check supports configuring minimum number of ready instances #1375

Closed by rugamaga 2 years ago

rugamaga commented 2 years ago
## Feature Description

- Allow specifying the minimum number of connected Cloud SQL replicas that must be available; the readiness check reports unhealthy only when fewer than that are connected.

## Alternatives Considered

- Run a separate cloud-sql-proxy for each Cloud SQL (read) replica and write a custom script that checks which proxies are ready.
  - This requires running many additional cloud-sql-proxy instances.
  - It complicates the readiness probe definitions.

## Additional Context

- If you use multiple Cloud SQL replicas, you can apply a `query retry` strategy in your application code:
  1. Try the query against the first instance in the (randomized) list.
  2. If that fails, retry the query against the second instance in the list, and continue through to the end of the list.
  3. Once a correct response arrives, use its values.
- This strategy keeps our application robust when some replicas are down, so that no single read replica becomes a single point of failure.
- However, the current cloud-sql-proxy readiness check reports unhealthy if a single instance is down.
- In Kubernetes, this takes the whole Pod out of service: if a single Cloud SQL replica is down, the Pod can no longer handle any requests, even though our application could still respond by using another replica.
- This means the retry strategy does not work with cloud-sql-proxy, and a single read replica becomes a single point of failure.
- With this feature, we could avoid that kind of outage.
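The retry strategy described in this comment can be sketched roughly as follows. This is only an illustration: `query_with_failover`, `run_query`, and `fake_query` are hypothetical names, and it assumes that a failed replica surfaces as a `ConnectionError`, which will differ depending on the database driver in use.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def query_with_failover(replicas: Sequence[str], run_query: Callable[[str], T]) -> T:
    """Try each replica in random order and return the first successful result."""
    if not replicas:
        raise ValueError("at least one replica is required")
    candidates = list(replicas)
    random.shuffle(candidates)  # randomize the list so load spreads across replicas
    last_error: Optional[Exception] = None
    for replica in candidates:
        try:
            return run_query(replica)
        except ConnectionError as exc:  # assumption: failures surface as ConnectionError
            last_error = exc  # remember the failure and move on to the next replica
    raise ConnectionError("all replicas failed") from last_error


def fake_query(replica: str) -> str:
    """Stand-in for a real query: pretend that replica-a is down."""
    if replica == "replica-a":
        raise ConnectionError("replica-a is down")
    return f"rows from {replica}"


print(query_with_failover(["replica-a", "replica-b"], fake_query))  # rows from replica-b
```

With this pattern the application keeps answering as long as one replica is reachable, which is why an all-or-nothing readiness check undermines it.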
enocom commented 2 years ago

Thanks for the feature request, @rugamaga. This is something we considered when writing the initial readiness check. So to briefly summarize, are you suggesting the Cloud SQL Proxy support a configurable readiness check? In other words, maybe we allow callers to specify how many instances can fail before the readiness check fails?

rugamaga commented 2 years ago

@enocom Yes. Being able to specify how many instances can fail before the readiness check fails would make our setup more robust.

enocom commented 2 years ago

How about something like:

--readiness-check-threshold=n

where n is the number of instances which can fail before the check fails?

rugamaga commented 2 years ago

That's nice, but in this context I think we want to check the minimum number of live instances, so a better option is --min-ready-instances=n, where n is the number of ready instances. For example, when n = 2, cloud-sql-proxy will respond ready if at least two instances are alive.
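A minimal sketch of the rule being proposed here, not the proxy's actual implementation: `is_ready` and the `instance_up` mapping are hypothetical, and treating an unset value (`min_ready <= 0`) as "every instance must be up" is an assumption chosen to match the strict behavior described earlier in the thread.

```python
from typing import Mapping


def is_ready(instance_up: Mapping[str, bool], min_ready: int = 0) -> bool:
    """Hypothetical readiness rule based on a minimum number of live instances."""
    ready = sum(1 for up in instance_up.values() if up)
    if min_ready <= 0:
        # Assumption: no minimum configured, so every instance must be reachable.
        return ready == len(instance_up)
    return ready >= min_ready


status = {"primary": True, "replica-1": True, "replica-2": False}
print(is_ready(status))               # False: one instance is down
print(is_ready(status, min_ready=2))  # True: two instances are alive
print(is_ready(status, min_ready=3))  # False: only two of three are alive
```

Counting ready instances rather than failed ones means the setting keeps its meaning when replicas are added or removed.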

enocom commented 2 years ago

OK, got it. I think that's a pretty simple fix. I'll pull this into my queue.

enocom commented 2 years ago

I decided to add support for a min-ready query param instead of adding another CLI flag. See #1496 for details.
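As an illustration of how a query parameter like this might be read from a probe request, here is a small sketch. Only the parameter name `min-ready` comes from this thread. The helper `min_ready_from_url`, the example host, port, and path, and the validation rules are assumptions, so see #1496 for the real behavior.

```python
from urllib.parse import parse_qs, urlsplit


def min_ready_from_url(url: str) -> int:
    """Extract the min-ready value from a probe URL; 0 means the param is not set."""
    values = parse_qs(urlsplit(url).query).get("min-ready", [])
    if not values:
        return 0
    count = int(values[0])  # raises ValueError for non-numeric input
    if count < 0:
        raise ValueError("min-ready must not be negative")
    return count


# The host, port, and path below are placeholders for illustration only.
print(min_ready_from_url("http://localhost:9090/readiness?min-ready=2"))  # 2
print(min_ready_from_url("http://localhost:9090/readiness"))              # 0
```

Because the value travels with each request, different probes can ask for different minimums without restarting the proxy.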