We have a WAF that has been deployed to dev for quite a while, but as we moved it to sandbox, we noticed that it causes the Envoy admin server to become unresponsive. We use the admin endpoint to serve /ready and /server_info for liveness and readiness health checks, so when this happens, Kubernetes starts killing otherwise healthy pods.
My hunch is that so many of our filter chains need the wasm filter that initialization takes a while, and during that time, especially on a busy server, the admin endpoint can become unavailable.
The container doesn't seem to be resource constrained when this is happening, but I wonder if it's something like https://github.com/envoyproxy/envoy/issues/16425.
We are still in the midst of debugging, but posting here in case someone has run into this before.
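One way to check the timing hunch would be to poll the admin endpoint from inside the pod while the filter chains initialize and record how long it stays unreachable. The script below is only a rough sketch of that idea, not something we have run against this deployment: the function names are made up for illustration, and the admin address used in the usage note (`http://127.0.0.1:9901/ready`) is an assumption that needs to match the actual admin listener.

```python
import http.client
import time
import urllib.request
from typing import List, Tuple

# One probe result: (wall-clock start time, succeeded, latency in seconds)
Sample = Tuple[float, bool, float]


def probe(url: str, timeout: float = 1.0) -> Tuple[bool, float]:
    """Issue a single GET and return (succeeded, latency_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, resets, and non-2xx responses
        # all count as a failed probe.
        ok = False
    return ok, time.monotonic() - start


def poll(url: str, count: int = 60, interval: float = 1.0,
         timeout: float = 1.0) -> List[Sample]:
    """Probe the URL `count` times, sleeping `interval` seconds between probes."""
    samples: List[Sample] = []
    for _ in range(count):
        started = time.time()
        ok, latency = probe(url, timeout)
        samples.append((started, ok, latency))
        time.sleep(interval)
    return samples


def longest_outage(samples: List[Sample]) -> float:
    """Return the longest run of consecutive failed probes, in seconds.

    A run is measured from the start of its first failed probe to the
    end of its last failed probe.
    """
    longest = 0.0
    outage_start = None
    for started, ok, latency in samples:
        if ok:
            outage_start = None
            continue
        if outage_start is None:
            outage_start = started
        longest = max(longest, started + latency - outage_start)
    return longest
```

Running something like `longest_outage(poll("http://127.0.0.1:9901/ready", count=120))` during a config push would give a rough outage length to compare against the probes' `timeoutSeconds`, `periodSeconds`, and `failureThreshold`. If the outage is longer than the probes tolerate, that would at least be consistent with slow initialization rather than a resource limit.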