elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[DOC] Monitoring Kibana Rules #126912

Open stefnestor opened 2 years ago

stefnestor commented 2 years ago

👋🏼 @KOTungseth @gchaps adding placeholder from our conversation about copying over my Medium article about monitoring Kibana Rules so Dev can confirm/review. @pmuellr @gmmorris,

Recently, someone asked me how to monitor Kibana Rules. They wanted automated reporting on historical and current Rule issues so they could catch problems before their end users raised them.

History: Kibana SIEM (a.k.a. the Security “Detections Engine”) is a specific type of Rule task, which in turn is a subset of Kibana Tasks (architecture PDF). Kibana Task processing writes Event Logs to the space-agnostic .kibana-event-log* indices, and by association these logs include (SIEM) Rule processing info. You can therefore use this query to view the most expensive Rules across spaces.
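The query linked above is not reproduced in this thread. As a rough sketch of what such a search could look like, the following builds an aggregation body for `.kibana-event-log*` that ranks rules by average execution duration. The field names (`event.provider`, `event.action`, `event.duration`, `rule.id`) and the helper name `top_expensive_rules_query` are assumptions that may differ between Kibana versions, so treat this as a starting point rather than the original query.

```python
import json


def top_expensive_rules_query(size=10, since="now-24h"):
    """Build a search body ranking rules by average execution duration.

    Assumed event-log fields (verify against your Kibana version):
    event.provider, event.action, event.duration, rule.id, @timestamp.
    """
    return {
        "size": 0,  # only the aggregation is needed, not individual hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"event.provider": "alerting"}},
                    {"term": {"event.action": "execute"}},
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        },
        "aggs": {
            "by_rule": {
                "terms": {
                    "field": "rule.id",
                    "size": size,
                    # sort buckets by the sub-aggregation defined below
                    "order": {"avg_duration": "desc"},
                },
                "aggs": {"avg_duration": {"avg": {"field": "event.duration"}}},
            }
        },
    }


if __name__ == "__main__":
    # Paste the printed body into: GET .kibana-event-log*/_search
    print(json.dumps(top_expensive_rules_query(), indent=2))
```

Because the event log is space-agnostic, the buckets cover rules from every space; adding a second `terms` aggregation on a space field would be one way to split them out, if your version records one.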

Current: (SIEM) Rules are stored in .kibana, which retains their execution state in a sub-object named execution_status or executionStatus (depending on how you pull it). You should never write directly to this system index. However, whereas listing or searching Rules through Kibana is space-aware, reading from this index lets you be space-agnostic. Common queries on current state:

  • enabled: GET .kibana/_search?q=alert.enabled:true
  • erring: GET .kibana/_search?q=alert.executionStatus.error:*
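Since these searches return raw saved-object documents from every space, it can help to group the hits by space before reporting on them. The sketch below does that for the "erring" search. It assumes each hit's `_source` carries an `alert` object and, outside the default space, a top-level `namespace` field; the function name and the exact shape of the `error` object are illustrative assumptions.

```python
from collections import defaultdict


def erring_rules_by_space(search_response):
    """Group (rule name, error reason) pairs by Kibana space.

    Assumes hits shaped like {"_source": {"namespace": ..., "alert": {...}}},
    where a missing "namespace" means the default space.
    """
    grouped = defaultdict(list)
    for hit in search_response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        alert = source.get("alert", {})
        error = (alert.get("executionStatus") or {}).get("error") or {}
        if not error:
            continue  # no recorded execution error for this rule
        space = source.get("namespace", "default")
        grouped[space].append((alert.get("name"), error.get("reason")))
    return dict(grouped)
```

Feed it the parsed JSON body of `GET .kibana/_search?q=alert.executionStatus.error:*`; the result maps each space to the rules that currently report an error.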

Sum: The two approaches above cover “I want to investigate Rules one by one”, historically and currently. The remaining investigative ballpark is “maybe each Rule on its own is OK, but together they hurt the system”. To investigate that, check the Kibana Task Manager Health API. Here’s a walk-through of interpreting its output.
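As a minimal sketch of triaging that health output, the function below lists which sections of a Task Manager health response are not reporting "OK". It assumes the response (from `GET /api/task_manager/_health` on Kibana) has a `stats` object whose sections each carry their own `status`; the helper name and the trimmed sample are hypothetical.

```python
def unhealthy_sections(health):
    """Return {section: status} for every stats section that is not "OK"."""
    stats = health.get("stats", {})
    return {
        name: section.get("status")
        for name, section in stats.items()
        if isinstance(section, dict) and section.get("status") != "OK"
    }


# Hypothetical, heavily trimmed health response.
sample_health = {
    "status": "warn",
    "stats": {
        "configuration": {"status": "OK"},
        "runtime": {"status": "warn"},
        "workload": {"status": "OK"},
    },
}

if __name__ == "__main__":
    print(unhealthy_sections(sample_health))
```

An empty result suggests every section considers itself healthy; anything else points at the section to read first in the full output.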

Automate: Using the above investigative ballparks, for live/strenuous Rule setups I usually recommend that customers set up either a Watcher or polling via an external Elasticsearch client to automate these checks. (We would check via Kibana Rules, but “Quis custodiet ipsos custodes?”.)
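The external-polling option could be sketched roughly as follows. To stay client-agnostic, the loop takes two caller-supplied callables: `fetch_count` (returns the current number of erring rules) and `notify` (delivers a message). All names, the threshold, and the interval are illustrative assumptions, not part of any Elastic client.

```python
import time


def poll_erring_rules(fetch_count, notify, threshold=0, interval_s=60,
                      max_polls=None, sleep=time.sleep):
    """Poll fetch_count() and call notify() whenever it exceeds threshold.

    Runs forever when max_polls is None; otherwise stops after that many
    polls. Returns the number of polls performed.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        count = fetch_count()
        if count > threshold:
            notify(f"{count} Kibana rule(s) currently report an execution error")
        polls += 1
        # Skip the final sleep so a bounded run returns promptly.
        if max_polls is None or polls < max_polls:
            sleep(interval_s)
    return polls
```

One way to implement `fetch_count` would be to call `GET .kibana/_count?q=alert.executionStatus.error:*` with your Elasticsearch client of choice and return the `count` field of the response; `notify` could post to email, chat, or a ticketing system.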

elasticmachine commented 2 years ago

Pinging @elastic/kibana-docs (Team:Docs)

pmuellr commented 2 years ago

I'm not against this, but one of the problems is that as the alerting documents / indices change over time, this information is going to become out of date. OTOH, over time, I'd expect our monitoring of alerting within the product itself to get a lot better, so these low-level searches generally won't be needed.

Perhaps we could take the queries we plan on talking about, and build tests for them (functional tests), noting in the test that if they fail, the doc will also need to be updated.

Is there anything similar in the Kibana docs, to get a sense for how others have dealt with this kind of stuff?

stefnestor commented 2 years ago

“Perhaps we could take the queries we plan on talking about, and build tests for them (functional tests), noting in the test that if they fail, the doc will also need to be updated. Is there anything similar in the Kibana docs, to get a sense for how others have dealt with this kind of stuff?”

@elastic/kibana-docs can you speak to this?

pmuellr commented 2 years ago

My comment ^^^ is basically blue-sky. We can get by for quite some time without it, of course.

stefnestor commented 4 months ago

👋 So we've written a related blog (only change was adding a * on .kibana). IMO there's still appetite for this from customers. Could we reconsider porting the above content / blog into Kibana Troubleshooting docs?