elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Response Ops] Break down task run success SLI into observability and security alerting types #165989

Open ymao1 opened 1 year ago

ymao1 commented 1 year ago

With https://github.com/elastic/kibana/pull/163652 we added SLI metrics for task run success, broken down into alerting and action task types. We think it'd be useful to further break down the alerting task types into security alerting task types and observability alerting task types. This would help us narrow down where to focus our investigations when those SLOs are breached.

Currently the metrics look like this:

{
  "task_run": {
    "timestamp": "2023-09-06T13:43:52.205Z",
    "value": {
      "by_type": {
        "alerting": {
          "success": 1,
          "total": 1
        },
        "alerting:<ruleTypeId>": {
          "success": 1,
          "total": 1
        }
      }
    }
  }
}

It'd be useful to add groupings for alerting_security and alerting_observability.
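
As a rough illustration of what those groupings could look like, here is a minimal sketch that rolls the existing per-rule-type counters up into the two new keys. It is not the task manager's actual code: the function name `addSolutionGroups`, the prefix lists, and the idea of classifying a rule type by its ID prefix are all assumptions made for the example. A real implementation would more likely use whatever ownership metadata the rule type registry exposes.

```typescript
interface SuccessCounts {
  success: number;
  total: number;
}

type ByType = Record<string, SuccessCounts>;

// Hypothetical prefix lists, for illustration only. How rule types map to
// security vs. observability is an assumption, not taken from Kibana.
const SECURITY_PREFIXES = ['siem.'];
const OBSERVABILITY_PREFIXES = ['apm.', 'metrics.', 'logs.'];

const ALERTING_PREFIX = 'alerting:';

function hasPrefix(value: string, prefix: string): boolean {
  return value.slice(0, prefix.length) === prefix;
}

// Returns the group key for a rule type ID, or undefined if it belongs to neither group.
function groupFor(ruleTypeId: string): string | undefined {
  if (SECURITY_PREFIXES.some((p) => hasPrefix(ruleTypeId, p))) {
    return 'alerting_security';
  }
  if (OBSERVABILITY_PREFIXES.some((p) => hasPrefix(ruleTypeId, p))) {
    return 'alerting_observability';
  }
  return undefined;
}

// Copies the existing by_type counters and adds the two rolled-up groups.
function addSolutionGroups(byType: ByType): ByType {
  const result: ByType = { ...byType };
  for (const key of Object.keys(byType)) {
    // Only the per-rule-type entries ("alerting:<ruleTypeId>") are rolled up.
    if (!hasPrefix(key, ALERTING_PREFIX)) continue;
    const group = groupFor(key.slice(ALERTING_PREFIX.length));
    if (group === undefined) continue;
    const counts = byType[key];
    const current = result[group] ?? { success: 0, total: 0 };
    result[group] = {
      success: current.success + counts.success,
      total: current.total + counts.total,
    };
  }
  return result;
}
```

With this shape, the existing alerting and alerting:<ruleTypeId> entries stay untouched, so current SLOs keep working, and rule types that match neither list are simply left out of both groups.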

elasticmachine commented 1 year ago

Pinging @elastic/response-ops (Team:ResponseOps)

mikecote commented 1 year ago

@kobelb should we prioritize this for 8.11?

kobelb commented 1 year ago

> @kobelb should we prioritize this for 8.11?

Yes. Otherwise, we're going to have to manually investigate and route these issues.

ymao1 commented 1 year ago

We might be able to do this in the SLO itself by filtering on project type.