envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Better signal for Envoy readiness in Kubernetes when using xDS #29010

Open nulltrope opened 1 year ago

nulltrope commented 1 year ago

Title: Better signal for Envoy readiness in Kubernetes when using xDS

Description:

When running Envoy as a sidecar proxy in Kubernetes, you often want to wait to mark the container "ready" until Envoy has received a fully hydrated config via xDS. Usually, this means waiting for one or more dynamic resources to be present in the admin interface's /config_dump response.

Envoy's admin interface does expose a /ready endpoint [1]; however, in my testing it can return OK even when the xDS config isn't fully "hydrated", i.e. not all dynamic resources are present yet (which makes sense, since Envoy has no way to know what config to expect unless it's told explicitly).

What I'd like to propose is some way to hint to Envoy, via query params on the admin request, which configs to expect; the endpoint would then respond accordingly (either 200 OK or some non-200 code), which could be used as a Kubernetes HTTP readiness probe [3].

I had two initial thoughts, but am open to other suggestions:

Config Dump Endpoint Change:

Add an optional query param to the /config_dump endpoint, e.g. /config_dump?expect=1, which would return a non-200 status (e.g. a 404) when the ?name_regex={} filter does not match the expected number of resources [2].

Today, the /config_dump endpoint provides almost all the params necessary to perform the check I want. For example, if you want to wait for a specific dynamic listener to be present, you could do /config_dump?resource=dynamic_listener&mask=name&name_regex=MY_LISTENER; however, if the listener is not present, the endpoint still returns a 200 OK status with an empty response, which would still pass a Kubernetes probe check. With ?expect={}, we could trivially check server-side the number of resources that matched the query, and return a non-200 status if the counts are not equal.
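
To make the comparison concrete, here is a minimal sketch of the check a hypothetical ?expect={} param would perform server-side (the JSON shape mirrors a filtered /config_dump response, which carries a top-level "configs" list; the function names are illustrative, not an existing Envoy API):

```python
import json


def matched_resource_count(config_dump: dict) -> int:
    """Count entries in a filtered /config_dump response.

    A filtered dump (e.g. with ?resource=...&mask=name&name_regex=...) returns
    a top-level "configs" list; an empty or missing list means the query
    matched nothing.
    """
    return len(config_dump.get("configs", []))


def is_hydrated(config_dump: dict, expect: int) -> bool:
    # The comparison a hypothetical ?expect={} param would make before
    # choosing between a 200 and a non-200 status.
    return matched_resource_count(config_dump) == expect


# One matching dynamic listener present: the probe would pass.
dump = json.loads('{"configs": [{"name": "MY_LISTENER"}]}')
print(is_hydrated(dump, expect=1))   # True
print(is_hydrated({}, expect=1))     # False: empty dump, probe would fail
```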

This is probably the least invasive change, since most of the required query params for matching resources already exist on the config dump endpoint. However, it adds functionality (readiness signaling) that the endpoint wasn't really intended for, so I can see not wanting to add it there, hence my next proposal:

Ready Endpoint Change:

Add the existing resource selector query params from the /config_dump endpoint as optional params to the /ready endpoint, and simply return a non-200 status when the specified query param resource matchers return no matches.

This would increase the complexity of the /ready endpoint quite a bit, but from a feature standpoint it probably makes more sense to have this functionality on /ready, since from my understanding the endpoint was intended for exactly this use case.
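
If the /ready variant landed, wiring it into Kubernetes would be straightforward. A sketch of what the probe could look like, assuming the proposed query params existed (they don't today; the params, listener name, and admin port 9901 are all assumptions for illustration):

```yaml
# Hypothetical readiness probe against the proposed /ready query params.
readinessProbe:
  httpGet:
    # name_regex is borrowed from /config_dump; it is NOT an existing /ready param.
    path: /ready?resource=dynamic_listeners&name_regex=my_listener
    port: 9901
  periodSeconds: 2
  failureThreshold: 30
```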

Relevant Links:

  1. ready endpoint
  2. config_dump endpoint
  3. kubernetes http probe

yanavlasov commented 1 year ago

@jmarantz for comments

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

jmarantz commented 1 year ago

some form of this seems OK to me. I suppose /ready is the right endpoint, and adding a sub-path or query param so that it returns 200 only when some sort of resource is available seems OK to me.

This could also be done as an extension which registers a custom admin endpoint. Maybe a k8s extension deserves a place in Envoy. What else do you think might be useful for k8s?

nulltrope commented 1 year ago

some form of this seems OK to me. I suppose /ready is the right endpoint, and adding a sub-path or query param so that it returns 200 only when some sort of resource is available seems OK to me.

Yeah, this seemed like the lowest-friction place, albeit it maybe overloads the /ready endpoint somewhat, since it's rather simple today.

This could also be done as an extension which registers a custom admin endpoint. Maybe a k8s extension deserves a place in Envoy. What else do you think might be useful for k8s?

I'm not overly familiar with when extensions are preferred over building this into the existing admin endpoints. I would need to think about whether there's any other k8s functionality that would be useful enough to warrant an entire extension (the only thing that immediately comes to mind is better detection of available CPUs for setting worker threads in containerized environments, but that's very different from admin readiness).

jmarantz commented 1 year ago

Extensions are desired when functionality is useful only to a subset of Envoy users, and we don't want to bloat the core binary.

nulltrope commented 1 year ago

I'm not sure it's really k8s-specific though, as you may have some external health check mechanism and LB that you point at a fleet of Envoys running as VMs, and similarly not want to route traffic to new instances until they have fully hydrated configs.

I'm not sure adding additional query params to an existing endpoint would necessarily bloat the binary, but I have no strong opinion here, as long as the functionality can exist :)

jmarantz commented 1 year ago

It's a good point; a trivial new query param isn't worth making an extension for, it won't add much size to the binary, and it might be useful in other environments.

Quick question: do we think there may be other k8s-specific stuff to add later though?

nulltrope commented 1 year ago

The distinct readiness/liveness probes are the main things I can think of: readiness would hopefully be addressed by one of my proposed changes, and liveness could probably be achieved with a special route using the HTTP health check filter in non-passthrough mode (i.e. just proving that the Envoy container is still responsive and not in a deadlocked state requiring a restart). Otherwise, k8s should probably be treated like any other compute runtime.
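
For reference, the liveness side of this can already be approximated today. A sketch of the HTTP health check filter in non-passthrough mode (listener/route context omitted; the /healthz path is an arbitrary choice):

```yaml
# Liveness-style check: the filter answers 200 locally as long as Envoy is
# responsive, without forwarding the request upstream.
http_filters:
- name: envoy.filters.http.health_check
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
    pass_through_mode: false
    headers:
    - name: ":path"
      string_match:
        exact: "/healthz"
```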

The only other major challenge I've faced running Envoy (specifically, sidecars) in a Kubernetes environment is startup/shutdown ordering with your application container. However, I believe this is outside the scope of Envoy and is actually mostly solved in the latest versions of k8s: https://kubernetes.io/blog/2023/08/25/native-sidecar-containers/

nulltrope commented 1 year ago

Hi @jmarantz any thoughts? I would like to start implementing one of these options soon.

jdunck commented 7 months ago

@jmarantz friendly bump - having to find bespoke ways to wait for readiness here is kind of a headache for us.

kyessenov commented 7 months ago

Agree that there is a need for this in a K8s environment. I'd recommend copying some of the logic from Istio (https://github.com/istio/istio/blob/master/pilot/cmd/pilot-agent/status/ready/probe.go). In particular, you have to wait for workers to start (not just for xDS to be applied) because there's a time gap between when the main thread accepts an update and when the workers start listening.

There are some flaws in Istio's logic that we can improve upon:

  • SDS is not tracked, and we do see cases where a misconfigured secret fails to stop Envoy from becoming ready.
  • Wasm panics do not fail readiness to serve. There are cases where xDS is accepted but serving fails regardless (intermittently or permanently, e.g. a query of death for Wasm plugins), and this should be more visible as a readiness indicator.

jmarantz commented 7 months ago

I'm in support of adding new query-params to /ready to meet k8s needs, and would help review.

nulltrope commented 7 months ago

Agree that there is a need for this in a K8s environment. I'd recommend copying some of the logic from Istio (https://github.com/istio/istio/blob/master/pilot/cmd/pilot-agent/status/ready/probe.go). In particular, you have to wait for workers to start (not just for xDS to be applied) because there's a time gap between when the main thread accepts an update and when the workers start listening.

There are some flaws in Istio's logic that we can improve upon:

  • SDS is not tracked, and we do see cases where a misconfigured secret fails to stop Envoy from becoming ready.
  • Wasm panics do not fail readiness to serve. There are cases where xDS is accepted but serving fails regardless (intermittently or permanently, e.g. a query of death for Wasm plugins), and this should be more visible as a readiness indicator.

@kyessenov So, I took a look at the Istio probe logic and I'm not sure we want to replicate it exactly. Firstly, Istio's prober is pretty opinionated about what to expect in terms of xDS responses, whereas for Envoy in general we may have all static resources, a static listener referencing dynamic routes/clusters, etc.

My initial thought for a first pass at this feature is to simply allow passing a query param with a list of named listeners to wait for, e.g. /ready?listeners=listener_1,listener_2, and check with the listener manager whether they are in an ACTIVE state, which means they have been fully warmed and any dynamic resources referenced by the listener have been received [1].
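
As a rough client-side sketch of how such a probe would be consumed (the /ready?listeners= param is the proposal here, not an existing Envoy API; the admin address, listener names, and function names are assumptions for illustration):

```python
import time
import urllib.error
import urllib.request

DEFAULT_ADMIN = "http://127.0.0.1:9901"  # assumed admin address


def listeners_active(names, admin=DEFAULT_ADMIN, timeout=1.0):
    """Return True iff the proposed probe reports all named listeners ACTIVE.

    Relies on the proposed behavior: 200 only when every listener in the
    comma-separated list is fully warmed, non-200 (or no answer) otherwise.
    """
    url = f"{admin}/ready?listeners={','.join(names)}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(names, admin=DEFAULT_ADMIN, interval=2.0, attempts=30):
    """Poll until the probe passes, e.g. before sending traffic to a new instance."""
    for _ in range(attempts):
        if listeners_active(names, admin):
            return True
        time.sleep(interval)
    return False
```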

Given the initialization procedure document [2], I think we can assume that once the listeners are fully warmed/active, all other resources (which may be delivered via xDS) are also ready, so it shouldn't be strictly necessary to also check clusters/endpoints independently like the Istio prober does.

Finally, you mention SDS and Wasm panics. While I think these can be important to check for, they also seem more conditional on your setup and might not be as useful in general. However, maybe as a second feature we could extend the ready endpoint with more query params, e.g. ?secrets=mysecret, to also check other resource types. What do you think?

kyessenov commented 7 months ago

Yes, I think we can design it so that Istio's opinionated prober becomes just a configuration or query-parameter difference. We can discuss it in detail once you have a proposed feature. Waiting for workers did make a big difference, because there is a time gap between xDS being accepted and actual requests passing through.

nulltrope commented 7 months ago

Ok sounds good. What exactly goes into proposing the feature beyond this GH issue? Would it be best for me to open a PR with the first of my proposed changes (waiting for named listeners to be active) and continue discussion there? This will be my first Envoy contribution, so I'm not entirely sure how y'all handle feature proposals :)

nulltrope commented 5 months ago

@kyessenov @jmarantz I recently learned that the HTTP health_check filter, when run in non-passthrough mode, already has some ability to report health based on Envoy's config state, notably the "No pass through, computed from upstream cluster health" mode.

What do you think about extending this filter instead to add additional checks for the features discussed above?
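
For context, a sketch of that mode as it exists today (the cluster name, path, and 100% threshold are assumptions for illustration):

```yaml
# "No pass through, computed from upstream cluster health": /healthz returns
# 200 only while at least the configured percentage of my_cluster's endpoints
# are healthy, so it already reflects some of Envoy's dynamic config state.
http_filters:
- name: envoy.filters.http.health_check
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
    pass_through_mode: false
    cluster_min_healthy_percentages:
      my_cluster:
        value: 100.0
    headers:
    - name: ":path"
      string_match:
        exact: "/healthz"
```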