envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

add a co-processor for config update, to allow customized behavior on config loading. #19858

Open stevenzzzz opened 2 years ago

stevenzzzz commented 2 years ago

Title: add a co-processor for config update, to allow customized behavior on config loading.

Description: At present, the control-plane/config-loading path of Envoy is relatively "closed" compared to the highly "configurable" data-plane portion. It's really hard to debug when the xDS state gets skewed.

Similar to HBase coprocessors, we could have a [list of] coprocessors registered and invoked at grpc_mux_impl's onConfigUpdate, or at every xDS callback's onConfigUpdate, to offer possibly better customized (and very likely not general enough to be upstreamed) behaviors for Envoy's config loading path:

  • they could either be in-place mutating the received config

  • or even just receiving a const reference of the received resources. (observer/immutable)

stevenzzzz commented 2 years ago

@yanavlasov @htuch

yanavlasov commented 2 years ago

@adisuissa

adisuissa commented 2 years ago

I've been thinking about this lately probably from a different context. Can you provide some use-cases where you would like this feature?

See #19855 (and draft PR #19857) as they may be relevant.

stevenzzzz commented 2 years ago

For example, to debug an xDS issue with a config server, I'd like to see which resources have been received/deleted on config updates; we may want that logged or exported to some metrics system.

We could also add very customized logic for specific resources: say, in config sharding, if a resource is "unrelated to my shard", I'd just drop it on the floor without processing it.
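
A minimal sketch of how these two use cases might look as co-processors. Everything here is hypothetical and for illustration only: `Resource`, `ConfigCoprocessor`, `LoggingCoprocessor`, `ShardFilterCoprocessor` and `runCoprocessors` are made-up names, not existing Envoy types or APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a received xDS resource; not an Envoy type.
struct Resource {
  std::string name;
  std::string type_url;
};

// Hypothetical co-processor hook, called once per config update.
class ConfigCoprocessor {
public:
  virtual ~ConfigCoprocessor() = default;
  // `added` may be pruned in place; `removed` holds deleted resource names.
  virtual void onConfigUpdate(std::vector<Resource>& added,
                              const std::vector<std::string>& removed) = 0;
};

// Observer: reports every received/deleted resource to a sink (a log, a
// metrics exporter, ...) and never mutates the update.
class LoggingCoprocessor : public ConfigCoprocessor {
public:
  explicit LoggingCoprocessor(std::function<void(const std::string&)> sink)
      : sink_(std::move(sink)) {}
  void onConfigUpdate(std::vector<Resource>& added,
                      const std::vector<std::string>& removed) override {
    for (const auto& r : added) sink_("received " + r.name);
    for (const auto& n : removed) sink_("removed " + n);
  }

private:
  std::function<void(const std::string&)> sink_;
};

// Mutator: drops resources that do not belong to the local shard.
class ShardFilterCoprocessor : public ConfigCoprocessor {
public:
  explicit ShardFilterCoprocessor(std::function<bool(const Resource&)> in_shard)
      : in_shard_(std::move(in_shard)) {}
  void onConfigUpdate(std::vector<Resource>& added,
                      const std::vector<std::string>&) override {
    added.erase(std::remove_if(added.begin(), added.end(),
                               [this](const Resource& r) { return !in_shard_(r); }),
                added.end());
  }

private:
  std::function<bool(const Resource&)> in_shard_;
};

// Runs the registered co-processors in order, before the update is applied.
void runCoprocessors(const std::vector<ConfigCoprocessor*>& chain,
                     std::vector<Resource>& added,
                     const std::vector<std::string>& removed) {
  for (ConfigCoprocessor* c : chain) {
    assert(c != nullptr);
    c->onConfigUpdate(added, removed);
  }
}
```

Ordering matters in this sketch: registering the logger before the shard filter means it also sees the resources that are about to be dropped.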

stevenzzzz commented 2 years ago

and yeah, extra validation would be something we can add as a co-processor as well.

htuch commented 2 years ago

It sounds like this is the generalization of #19855. https://github.com/envoyproxy/envoy/issues/6909 is also relevant here, CC @yskopets.

stevenzzzz commented 2 years ago

I think it's a very-good-to-have. It essentially allows customers who read config from a centralized config manager (e.g. a cloud service provider) to turn around more quickly when things happen, or helps customers more easily build the tools their business needs around Envoy's config control.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

stevenzzzz commented 2 years ago

/assign stevenzzzz

stevenzzzz commented 2 years ago

/nostablebot

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

stevenzzzz commented 2 years ago

@adisuissa could you pls nostalebot this one?

adisuissa commented 2 years ago

IIUC we now support customized behavior on config loading for use cases that do not modify the config.

IMHO config modification could lead to subtle issues that will result in invalid config state, so it should be avoided.

Is there anything else this issue covers?

stevenzzzz commented 2 years ago

It adds the capability for users to manage the config from the client side; the end user is responsible for whatever the co-processor does.

Some use cases could be: logging resources received, rejecting config with some customized logic, or even just storing/snapshotting the config locally for fail-static handling.
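
As a rough sketch of the reject and snapshot cases, assuming a hypothetical `Validator` callback and a hypothetical `SnapshottingLoader` that keeps the last accepted config in memory (a real fail-static setup would presumably persist it somewhere; none of these names exist in Envoy):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a received resource; not an Envoy type.
struct Resource {
  std::string name;
  std::string body;
};

// Returns an error message to reject the whole update, or nullopt to accept it.
using Validator =
    std::function<std::optional<std::string>(const std::vector<Resource>&)>;

// Hypothetical loader: applies an update only if the validator accepts it and
// otherwise keeps serving the last good snapshot.
class SnapshottingLoader {
public:
  explicit SnapshottingLoader(Validator validator)
      : validator_(std::move(validator)) {
    assert(validator_ != nullptr);
  }

  // Returns true if the update was accepted and folded into the snapshot.
  bool apply(const std::vector<Resource>& update, std::string* error = nullptr) {
    if (auto err = validator_(update)) {
      if (error != nullptr) *error = *err;
      return false; // Rejected: the snapshot is left untouched.
    }
    for (const auto& r : update) snapshot_[r.name] = r.body;
    return true;
  }

  const std::map<std::string, std::string>& snapshot() const { return snapshot_; }

private:
  Validator validator_;
  std::map<std::string, std::string> snapshot_; // In-memory stand-in for local storage.
};
```

A rejected update leaves `snapshot()` unchanged, which is the fail-static behavior described above.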

adisuissa commented 2 years ago

The fix for this could be very simple (removing the constness of the custom config validators). I'm still not sure if I'd want to see something like that, as we shift the "ownership" of the config from being provided and known by the config-management server to being split between the server and the client. This could lead to subtle bugs, as the client may remove a resource (e.g., a cluster) and still send back an ACK to the server; the server will think that the client supports that resource, but actually it won't. It's much harder to debug and understand what's going on when this happens.
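
To make the concern concrete, here is a small sketch of the divergence. The helper `ackedButMissing` is hypothetical, and the two sets simply model what the server sent (and saw ACKed) versus what the client actually applied after a mutating co-processor ran:

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Names the server believes are active (it received an ACK) but that the
// client silently dropped. Purely illustrative; not part of Envoy.
std::vector<std::string> ackedButMissing(
    const std::set<std::string>& sent_by_server,
    const std::set<std::string>& applied_by_client) {
  std::vector<std::string> missing;
  // std::set iterates in sorted order, as std::set_difference requires.
  std::set_difference(sent_by_server.begin(), sent_by_server.end(),
                      applied_by_client.begin(), applied_by_client.end(),
                      std::back_inserter(missing));
  return missing;
}
```

If the server sent `cluster_a` and `cluster_b` and the client dropped `cluster_b` before ACKing, the result is `cluster_b`: a resource the server considers delivered that the client cannot route to.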

stevenzzzz commented 2 years ago

It'd be useful in the case where the management server is not owned by the user but by some cloud service provider. It allows much faster turnaround if the user needs a feature sooner than the management server can offer it, or if the feature is not general enough for the management server to work on.

jmarantz commented 2 years ago

@jmarantz

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

stevenzzzz commented 2 years ago

/nostalebot

stevenzzzz commented 2 years ago

@jmarantz Josh, could you please nostalebot this issue? I think it's more flexible and we should give it a shot.