Kuadrant / kuadrant-operator

The Operator to install and manage the lifecycle of the Kuadrant components deployments.
Apache License 2.0

Data Plane Debugging and Troubleshooting #1058

Open maleck13 opened 13 hours ago

maleck13 commented 13 hours ago

What

Document useful ways to debug what is happening when hitting the Gateway

Why

The goal here is to enable users and developers to better understand how RateLimitPolicy & AuthPolicy interact with the various components, and how a user can debug and troubleshoot issues they may hit at the data plane when applying Policies.

maleck13 commented 13 hours ago

@eguzki @alexsnaps I think this would be a useful thing for the team and also for our users. I am adding one for DNS too. WDYT?

alexsnaps commented 12 hours ago

This is probably useful in any case, but I've been toying around with doing this a little differently (as well?). Did a few edits, not specific to RL, but rather the entire data plane tbh (and RateLimit- & AuthPolicy do interact with each other as well...)

I'd like to get Kuadrant as a whole into some "development mode". When you hit a URL through the Gateway, it would leverage the distributed tracing probing mechanism already present in the different code bases to gather the spans and "pipe" them back to the wasm-shim. Not through ~http headers~ (can't, it looks like you don't get access to the headers of the response from the host), but through the dynamic_metadata: well_known_types::Struct of both gRPC responses. The wasm-shim would then gather it all and, on error, use it to populate the response body with a nice message of what happened; or append it all to the response, either as headers again (tho that'd be less nice to debug) or by finding a way to attach it to the actual body.
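
A rough sketch of the gathering step described above, in Python purely for illustration (it is not wasm-shim code). Plain dicts stand in for the dynamic_metadata Struct of each gRPC response. The `debug_spans` key, the span fields (`name`, `start_us`, `outcome`) and both function names are hypothetical; nothing here reflects an existing Kuadrant API.

```python
import json


def collect_debug_spans(responses):
    """Merge the spans found in each service's dynamic metadata.

    `responses` maps a service name (e.g. "auth", "ratelimit") to a dict
    standing in for that gRPC response's dynamic_metadata Struct.
    The "debug_spans" key is an assumption made for this sketch.
    """
    spans = []
    for service, metadata in responses.items():
        for span in metadata.get("debug_spans", []):
            # Tag each span with the service that reported it.
            spans.append({"service": service, **span})
    # Order by the (hypothetical) start timestamp so the body reads as a timeline.
    return sorted(spans, key=lambda s: s["start_us"])


def render_debug_body(status, spans):
    """Build a human-readable response body for a failed request."""
    return json.dumps({"status": status, "trace": spans}, indent=2)
```

On an error response, the shim-side equivalent of `render_debug_body(429, collect_debug_spans(responses))` would replace the body; on success the same data could be appended instead.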

I was thinking of some "secret" (i.e. a string or something), possibly attached to some resource, the kuadrant CR? Say kuadrant_dev_key: lol, and now you could get that data on a per-request basis by passing ?kuadrant_debug=lol to the URL you request from the Gateway or something...
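
The per-request opt-in could look roughly like the sketch below, again in Python for illustration only. It assumes the configured `kuadrant_dev_key` value has already been read from wherever it ends up living; `debug_requested` is a hypothetical helper name, and the example URL is made up.

```python
import hmac
from urllib.parse import parse_qs, urlsplit


def debug_requested(url, dev_key):
    """Return True when the request opts in to debug output.

    `dev_key` is the configured secret (the kuadrant_dev_key idea above).
    An empty or missing key disables the feature entirely.
    """
    if not dev_key:
        return False
    # parse_qs returns a list per parameter; blank values are dropped by default.
    values = parse_qs(urlsplit(url).query).get("kuadrant_debug", [])
    # Constant-time comparison, so the key is not leaked through timing.
    return any(
        hmac.compare_digest(value.encode(), dev_key.encode()) for value in values
    )
```

For example, `debug_requested("https://gw.example.com/toys?kuadrant_debug=lol", "lol")` opts in, while the same URL with any other key, or with no key configured, does not.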

Anyways, not completely fleshed out, but I think I could put a prototype together in a few days... and looks like I might be able to free that time up in the next couple of weeks... wdyt?

Two advantages, I think: it's more user friendly, and it won't get out of date like a doc would, as it automatically updates itself with reality.