elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[ResponseOps][Connectors] log request body for failed webhook calls #186531

Open pmuellr opened 2 weeks ago

pmuellr commented 2 weeks ago

A frequent problem customers have is bad responses from webhooks caused by invalid formatting in the webhook body - invalid JSON, for instance. Unfortunately, the error messages that typical servers return for these rarely provide any details about what was wrong with the data.

So, we should provide some details - specifically, the request body. I think including it in logged messages would be useful, and we may also want it in the message or some other field in the event log, so that the user has a chance of seeing it in the connector logs UX, without having to look in the logs.
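As a rough illustration of the idea, here is a minimal TypeScript sketch of a log message that includes the request body and flags the invalid-JSON case. Everything in it is hypothetical: `FailedWebhookCall`, `checkJson`, and `describeFailedWebhookCall` are names made up for this example and are not existing Kibana APIs. It also assumes the body is meant to be JSON, which is not true of every webhook.

```typescript
// Hypothetical shape for a failed webhook call; not an existing Kibana type.
interface FailedWebhookCall {
  connectorId: string;
  status: number;
  requestBody: string;
}

// Returns the parser's error message if the body is not valid JSON,
// or undefined if it parses cleanly.
function checkJson(body: string): string | undefined {
  try {
    JSON.parse(body);
    return undefined;
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

// Builds a log message that carries the request body along with a hint
// about whether the body was valid JSON.
function describeFailedWebhookCall(call: FailedWebhookCall): string {
  const jsonError = checkJson(call.requestBody);
  const hint =
    jsonError === undefined
      ? 'request body is valid JSON'
      : `request body is not valid JSON: ${jsonError}`;
  return (
    `webhook connector ${call.connectorId} failed with status ${call.status}; ` +
    `${hint}; body: ${call.requestBody}`
  );
}
```

Since the sketch writes the whole body into the message, anything sensitive in the body ends up wherever that message is stored.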

elasticmachine commented 2 weeks ago

Pinging @elastic/response-ops (Team:ResponseOps)

pmuellr commented 2 weeks ago

It was noted in triage that if we write the webhook body to a log or index, it will very likely contain PII, so we need to take that into consideration.

I thought we might have had an issue open for this already, but we didn't really. Issue #175842 started with a UX confusion, but then evolved into issues diagnosing problems with webhook bodies, so I closed that one in favor of this. I also added this to meta issue #151773 - there was already a list item, but no issue assigned.

pmuellr commented 2 weeks ago

From the linked SDH ^^^, I thought it was interesting to note the customer turned on debug logging for a bit - so at least for them, having the body available in the debug logs would probably be good enough to diagnose the error. Probably not the case for a lot of customers though ...

pmuellr commented 2 weeks ago

Some more thinking on this: the event log is really the best place for this, as we can show bits in the UX, and the event log is fairly easy to search through in any case. It already contains some amount of PII like rule names, and access to it via API handles some RBAC over the sources of the events (e.g., rule id or connector id).

But we probably don't want to log these all the time, as they could be large, and are only occasionally useful. So I'm thinking an opt-in, cloud-enabled config option to enable writing the request (and perhaps the response) to the event log would be a fairly constrained way to do this.

Hopefully we can find some field(s) in the EL that already exist, to contain these, but we may need to add some new (non-indexed) fields.
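
To make the opt-in idea concrete, here is a minimal TypeScript sketch of gating and size-limiting the body before it goes into an event. All of the names are hypothetical and invented for this example: `BodyLoggingConfig`, its `enabled` and `maxBodyLength` settings, and the `requestBody` / `requestBodyTruncated` fields are not existing Kibana config keys or event log fields.

```typescript
// Hypothetical opt-in settings; no such Kibana config is defined by this issue.
interface BodyLoggingConfig {
  enabled: boolean;
  maxBodyLength: number;
}

// Hypothetical event shape; the real event log schema may differ.
interface WebhookFailureEvent {
  connectorId: string;
  message: string;
  requestBody?: string;
  requestBodyTruncated?: boolean;
}

// Builds the event for a failed webhook call. The body is attached only when
// the option is enabled, and is cut off at maxBodyLength characters.
function buildFailureEvent(
  config: BodyLoggingConfig,
  connectorId: string,
  message: string,
  requestBody: string
): WebhookFailureEvent {
  const event: WebhookFailureEvent = { connectorId, message };
  if (!config.enabled) {
    return event;
  }
  const truncated = requestBody.length > config.maxBodyLength;
  event.requestBody = truncated ? requestBody.slice(0, config.maxBodyLength) : requestBody;
  event.requestBodyTruncated = truncated;
  return event;
}
```

With the option off, the event is the same as it would be today; the length cap is one way to deal with large bodies, though the right limit would need discussion.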