corazawaf / coraza-spoa

EXPERIMENTAL: A wrapper around the OWASP Coraza WAF for HAProxy's SPOE filters

TERM signal handling configuration #38

Open mac-chaffee opened 1 year ago

mac-chaffee commented 1 year ago

When running coraza-spoa in an environment like Kubernetes, handling SIGTERM becomes important for zero-downtime upgrades. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination

There are two main ways to deploy coraza-spoa: as a standalone pod, or as a sidecar in an HAProxy pod.

Standalone coraza-spoa

When coraza-spoa is running standalone, it should respond to SIGTERM by processing existing messages but refusing new ones. In Kubernetes, since there can be a slight delay between when a Pod enters the Terminating state and when traffic stops being routed to it, it's also good to allow a configurable delay before new messages start being denied (see --shutdown-grace-period here).
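A minimal sketch of that standalone behaviour, not the actual coraza-spoa implementation: the `--shutdown-grace-period` flag name mirrors the one referenced above, the port and `handleConn` placeholder are assumptions standing in for the agent's real SPOE listener and message handling.

```go
package main

import (
	"flag"
	"log"
	"net"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

func main() {
	grace := flag.Duration("shutdown-grace-period", 5*time.Second,
		"delay before refusing new SPOE connections after SIGTERM")
	flag.Parse()

	ln, err := net.Listen("tcp", ":9000") // stand-in for the SPOE agent listener
	if err != nil {
		log.Fatal(err)
	}

	var wg sync.WaitGroup

	// Accept loop: each HAProxy connection is handled until the listener closes.
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed during shutdown
			}
			wg.Add(1)
			go func() {
				defer wg.Done()
				handleConn(conn)
			}()
		}
	}()

	// Wait for SIGTERM, then keep accepting for the grace period so traffic
	// has time to shift to other replicas before we start refusing it.
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM)
	<-sigCh
	log.Printf("SIGTERM received, refusing new connections in %s", *grace)
	time.Sleep(*grace)

	ln.Close() // stop accepting new HAProxy connections
	wg.Wait()  // finish processing messages already in flight
	log.Println("drained, exiting")
}

func handleConn(conn net.Conn) {
	defer conn.Close()
	// Placeholder for the SPOP frame handling done by the real agent.
}
```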

Sidecar coraza-spoa

When running coraza-spoa as a sidecar inside an existing HAProxy pod, rolling out a new version of that haproxy+coraza-spoa Pod sends SIGTERM to the old containers simultaneously. HAProxy will typically begin draining existing connections, but those connections still need to be serviced by Coraza. In this situation, coraza-spoa should probably be configured to ignore SIGTERM entirely, or to use a very long --shutdown-grace-period, so that Coraza keeps processing the remaining requests until the bitter end (SIGKILL). But you also don't want to delay shutdown of the pod for longer than necessary, so maybe there's something in SPOP where we can detect when there are no more connected HAProxy instances? At that point it would be safe to shut down.
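A sketch of that sidecar idea, not an existing coraza-spoa feature: SIGTERM is noted but ignored, and the agent exits once the last HAProxy connection has gone away. Counting open agent connections is an assumption used as a stand-in for "no more connected haproxy instances"; SPOP itself has no explicit signal for that.

```go
package main

import (
	"log"
	"net"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

var active int64 // number of currently connected HAProxy instances

func main() {
	ln, err := net.Listen("tcp", ":9000") // stand-in for the SPOE agent listener
	if err != nil {
		log.Fatal(err)
	}

	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return
			}
			atomic.AddInt64(&active, 1)
			go func() {
				defer atomic.AddInt64(&active, -1)
				defer conn.Close()
				// Placeholder for SPOP frame handling.
			}()
		}
	}()

	// Take note of SIGTERM but keep serving: the HAProxy in the same pod is
	// still draining its own connections and needs Coraza for them.
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM)
	<-sigCh
	log.Println("SIGTERM received, waiting for HAProxy to disconnect")

	// Exit only once every HAProxy connection has closed (or SIGKILL arrives).
	for atomic.LoadInt64(&active) > 0 {
		time.Sleep(time.Second)
	}
	ln.Close()
	log.Println("no connected haproxy instances left, exiting")
}
```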

Somewhat related to #19, which might require some signal handling as well.

Tristan971 commented 1 year ago

FWIW, while a configurable grace period is still preferable, you can always emulate it at the k8s level using a preStop lifecycle hook on your pod that does something like sleep N.

The pod is immediately put in an unready state when the preStop hook begins (just as it is for the TERM by default), so it's removed from the Service endpoints, but the SIGTERM to PID 1 itself is delayed until the preStop hook finishes (N seconds in this case).

Then you can tune your grace period based on how long your requests can last, regardless of whether the underlying process has graceful shutdown support.
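A minimal sketch of that preStop approach as a pod-spec fragment; the container name, image placeholder, and the choice of N=30 seconds are assumptions, and it relies on the image shipping a `sleep` binary.

```yaml
spec:
  terminationGracePeriodSeconds: 60   # must exceed the preStop sleep
  containers:
    - name: coraza-spoa
      image: coraza-spoa:example       # placeholder image reference
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "30"]   # delay SIGTERM while endpoints update
```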