PostgREST / postgrest

REST API for any Postgres database
https://postgrest.org
MIT License

Zero downtime rolling updates on EKS #3633

Closed (tpowellocto closed this 4 months ago)

tpowellocto commented 4 months ago


Description of issue

Minor service outage caused by cycling PostgREST Kubernetes pods. When the service is under constant load, taking a functional pod offline causes some requests to be dropped (502 responses), plus a small number of requests to time out.

Expected behavior: connections to an active pod should be allowed to drain before the application is stopped. Actual behavior: as described above, requests in flight when a pod is terminated fail with 502 or time out.


wolfgangwalther commented 4 months ago

A popular method of resolving this seems to be running a sleep command in a preStop lifecycle hook. This is not possible with the current container image, as no shell utilities are packaged within it (it's built from scratch).
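For context, the exec-based workaround is usually wired up roughly like this (a sketch only; the image tag and sleep duration are placeholders, and it works only if the image contains a sleep binary):

```yaml
# Minimal sketch of the exec-based workaround. It assumes the image ships a
# sleep binary, which the official scratch-based image does not.
apiVersion: v1
kind: Pod
metadata:
  name: postgrest
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: postgrest
      image: postgrest/postgrest:xyz   # placeholder tag
      lifecycle:
        preStop:
          exec:
            # Keep serving for a few seconds so load balancers and Endpoints
            # stop routing to this pod before SIGTERM is delivered.
            command: ["sleep", "10"]
```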

You can always create your own docker image and just use something like COPY --from=postgrest/postgrest:xyz /bin/postgrest /bin to get the static executable into your derived image. You can then use all the tools you want.
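A minimal sketch of such a derived image, assuming a busybox base (any small image with a shell would do) and a placeholder version tag:

```dockerfile
# Derived image with shell utilities. Assumptions: busybox as the base and
# "xyz" as a placeholder version tag.
FROM busybox:stable
COPY --from=postgrest/postgrest:xyz /bin/postgrest /bin/postgrest
# PostgREST listens on port 3000 by default.
EXPOSE 3000
ENTRYPOINT ["/bin/postgrest"]
```

With a shell (and a sleep binary) available, the exec-based preStop hook sketched above becomes usable.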

tpowellocto commented 4 months ago

You can always create your own docker image

@wolfgangwalther This has been my solution to date. The described issue definitely makes the provided (official) image less useful though.

wolfgangwalther commented 4 months ago

I have not really looked into the issue itself, but: is there any solution that we can provide via PostgREST natively, without more tools inside the container?

If the only solution is to supply more tools in the container, then we're at "closing, won't fix, there's a workaround", I guess.

wolfgangwalther commented 4 months ago

There seems to be a proposal to make sleeping in preStop a part of k8s itself: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3960-pod-lifecycle-sleep-action
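If that lands, the hook needs no shell in the image; roughly (a sketch, assuming a Kubernetes version where the sleep lifecycle action from that KEP is available):

```yaml
# Sketch of the native sleep action from KEP 3960 (container spec excerpt);
# the kubelet performs the sleep itself, so no binary is needed in the image.
containers:
  - name: postgrest
    image: postgrest/postgrest:xyz   # placeholder tag
    lifecycle:
      preStop:
        sleep:
          seconds: 10
```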

Since we have a workaround now and there is ongoing effort to solve this upstream, I guess we can close this. If you disagree, feel free to re-open with a suggestion on what we could do instead.