emissary-ingress / emissary

open source Kubernetes-native API gateway for microservices built on the Envoy Proxy
https://www.getambassador.io
Apache License 2.0

Readiness and Liveness Probe Failing with Overload Manager Configuration #5589

Open sekar-saravanan opened 4 months ago

sekar-saravanan commented 4 months ago

Hi, I have been working with emissary-ingress for quite some time and have set it up in Kubernetes with the probes below.

livenessProbe:
    httpGet:
        path: /ambassador/v0/check_alive
        port: 8877
        scheme: HTTP
readinessProbe:
    httpGet:
        path: /ambassador/v0/check_ready
        port: 8877
        scheme: HTTP

I need to use the Envoy overload manager feature (envoy.overload_actions.stop_accepting_requests) to reject a percentage of requests when CPU usage is too high, so I modified envoy.json to enable it.

  "overload_manager": {
    "actions": [
      {
        "name": "envoy.overload_actions.stop_accepting_requests",
        "triggers": [
          {
            "name": "envoy.resource_monitors.injected_resource",
            "scaled": {
              "saturation_threshold": 1,
              "scaling_threshold": 0.01
            }
          }
        ]
      }
    ],
    "refresh_interval": "3s",
    "resource_monitors": [
      {
        "name": "envoy.resource_monitors.injected_resource",
        "typed_config": {
          "@type": "type.googleapis.com/envoy.extensions.resource_monitors.injected_resource.v3.InjectedResourceConfig",
          "filename": "/tmp/pressure"
        }
      }
    ]
  }
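For context, the injected resource monitor does not measure anything itself: it reads a pressure value from the configured file. As I understand the Envoy documentation, the file should hold a single floating-point number between 0 and 1 and should be swapped in atomically rather than rewritten in place. Below is a minimal sketch of a writer that does this. The helper name `write_pressure` and the three-decimal formatting are my own assumptions, not part of Envoy or Emissary; how the CPU figure is obtained is left out.

```python
import os
import tempfile


def write_pressure(value: float, path: str = "/tmp/pressure") -> None:
    """Atomically publish a pressure value in [0, 1] to the monitored file.

    Hypothetical helper: the name and formatting are illustrative only.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("pressure must be within [0, 1]")

    # Write to a temporary file in the same directory, then rename it over
    # the target so a reader never observes a partially written value.
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".pressure-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(f"{value:.3f}")
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A sidecar or cron-style loop could call `write_pressure(cpu_fraction)` every few seconds, where `cpu_fraction` is whatever normalized CPU figure you choose to feed it.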

When the overload action is triggered, it starts rejecting a percentage of requests. Unexpectedly, however, the readiness and liveness probes (/ambassador/v0/check_ready, /ambassador/v0/check_alive) also start failing.
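To make "a percentage of requests" concrete: with a scaled trigger, my understanding is that the action value grows linearly from 0 at scaling_threshold to 1 at saturation_threshold. The sketch below reproduces that arithmetic with the thresholds from the config above. The function name is hypothetical, and treating the result as the approximate fraction of rejected requests is an assumption on my part.

```python
def scaled_action_value(
    pressure: float,
    scaling_threshold: float = 0.01,
    saturation_threshold: float = 1.0,
) -> float:
    """Linear interpolation between the two thresholds, clamped to [0, 1].

    Hypothetical helper that mirrors the scaled-trigger arithmetic; it is
    not an Envoy API.
    """
    if pressure <= scaling_threshold:
        return 0.0
    if pressure >= saturation_threshold:
        return 1.0
    return (pressure - scaling_threshold) / (
        saturation_threshold - scaling_threshold
    )
```

With a scaling_threshold of 0.01, almost any non-zero value in /tmp/pressure yields a non-zero action value, so some rejection begins very early.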

Is there any other way to perform health checks on emissary-ingress, instead of /ambassador/v0/check_ready and /ambassador/v0/check_alive?

cindymullins-dw commented 2 months ago

Those are the only health-check endpoints I know of. I'm not sure there's another way to reach the readiness/liveness probes if they are blocked by a custom config. I will mark this as a feature request. You could look at Active Health Checking with the Endpoint Resolver.