elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Security Solution] Troubleshooting and Diagnostics of the Detection Engine (Draft) #124947

Open banderror opened 2 years ago

banderror commented 2 years ago

Summary

While working on recent SDHs, it became evident that, in contrast to Elasticsearch, Kibana, and Task Manager, we don't have much diagnostic data for Security Solution and Detection Engine. There are few console logs and few rule execution logs stored in .kibana-event-log-*, those logs don't carry enough correlation ids, and the support-diagnostics tool does not support dumping anything related to Detection Engine.

Plan

Improve logging from rule executors. Write more/better logs with more correlation ids:

Improve logging from route handlers. Write logs with correlation ids from Security Solution's API endpoints:

NOTE: Correlation ids can be attached to any console log record via an additional LogMeta object (example). If Kibana logs are ingested into Elasticsearch, these ids become available for filtering and aggregation. We could potentially leverage this in Cloud.
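A minimal sketch of the idea above, assuming a logger whose methods accept an optional meta object. `LoggerLike`, `LogMetaLike`, `withCorrelationIds`, and the `correlation` field name are all hypothetical and only illustrate the pattern; they are not Kibana's actual types.

```typescript
// Hypothetical stand-ins for a logger that accepts a meta object per record.
type LogMetaLike = { [key: string]: unknown };
type CorrelationIds = { [name: string]: string | undefined };

interface LoggerLike {
  info(message: string, meta?: LogMetaLike): void;
  error(message: string, meta?: LogMetaLike): void;
}

// Wraps a logger so that every record carries the same correlation ids.
// Ids whose value is undefined are dropped to keep the records compact.
function withCorrelationIds(logger: LoggerLike, ids: CorrelationIds): LoggerLike {
  const correlation: { [name: string]: string } = {};
  for (const key of Object.keys(ids)) {
    const value = ids[key];
    if (value !== undefined) {
      correlation[key] = value;
    }
  }
  // Caller-supplied meta is kept; the correlation field is always set by the wrapper.
  const merge = (meta?: LogMetaLike): LogMetaLike => ({ ...meta, correlation });
  return {
    info: (message, meta) => logger.info(message, merge(meta)),
    error: (message, meta) => logger.error(message, merge(meta)),
  };
}
```

A rule executor could create the wrapper once per execution, e.g. `withCorrelationIds(logger, { ruleId, executionId })`, and pass it down so that individual log calls don't need to repeat the ids.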

Include correlation ids in outgoing requests to Elasticsearch. When we analyze the tasks.json file (generated by the support-diagnostics tool), it's not clear which rule sent a particular search request, or whether it was sent by a rule at all. It would be great if we could attach some correlation ids to the requests that we send to Elasticsearch:

Maybe it could be done via custom HTTP headers, similar to X-elastic-product-origin and the other headers that we can see in tasks.json.
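A rough sketch of what building such a header could look like. The function names, the `key=value;key=value` encoding, and the default header name are assumptions made for illustration. Elasticsearch documents X-Opaque-Id as a header that is returned in task information, which is why it is used as the default here, but Kibana may already populate it, and whether any other custom header would show up in tasks.json would need to be verified.

```typescript
type RequestCorrelationIds = { [name: string]: string | undefined };

// Serialises ids as "key=value;key=value", sorted by key so the output is stable.
// Values are URI-encoded so that ";" and "=" inside an id cannot break the format.
function encodeCorrelationIds(ids: RequestCorrelationIds): string {
  return Object.keys(ids)
    .sort()
    .filter((key) => ids[key] !== undefined && ids[key] !== "")
    .map((key) => `${key}=${encodeURIComponent(ids[key] as string)}`)
    .join(";");
}

// Returns a headers object to merge into an outgoing request.
// The header name is an assumption; pass a different one if needed.
function buildCorrelationHeaders(
  ids: RequestCorrelationIds,
  headerName: string = "x-opaque-id"
): { [header: string]: string } {
  const value = encodeCorrelationIds(ids);
  return value === "" ? {} : { [headerName]: value };
}

// Example: yields { "x-opaque-id": "executionId=42;ruleId=abc" }
const exampleHeaders = buildCorrelationHeaders({ ruleId: "abc", executionId: "42" });
```

Packing all ids into one header value keeps the number of headers fixed, at the cost of having to parse the value when reading tasks.json.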

Measure more rule execution metrics:

NOTE: Detection Engine performance benchmarking could read the generic and rule type-specific metrics written to Event Log during a benchmark run and calculate statistics from them (median and percentiles across all rules, per rule type, per rule, etc).
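To illustrate the kind of calculation meant here, a minimal sketch that groups execution durations and computes the median and 95th percentile per group. The `RuleExecutionMetric` shape and its field names are hypothetical and do not reflect the actual Event Log schema. The nearest-rank percentile method is used for simplicity.

```typescript
// Hypothetical shape of one metric record read back from the Event Log.
interface RuleExecutionMetric {
  ruleId: string;
  ruleType: string;
  durationMs: number;
}

interface Summary {
  count: number;
  median: number;
  p95: number;
}

// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it. For an even number of samples
// the median is therefore the lower of the two middle values.
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    throw new Error("percentile() needs at least one sample");
  }
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function summarize(values: number[]): Summary {
  return {
    count: values.length,
    median: percentile(values, 50),
    p95: percentile(values, 95),
  };
}

// Groups metrics by an arbitrary key (rule type, rule id, or a constant
// for "all rules") and summarizes the durations in each group.
function summarizeBy(
  metrics: RuleExecutionMetric[],
  keyOf: (metric: RuleExecutionMetric) => string
): Map<string, Summary> {
  const groups = new Map<string, number[]>();
  for (const metric of metrics) {
    const key = keyOf(metric);
    const bucket = groups.get(key) ?? [];
    bucket.push(metric.durationMs);
    groups.set(key, bucket);
  }
  const result = new Map<string, Summary>();
  groups.forEach((values, key) => result.set(key, summarize(values)));
  return result;
}
```

The same function covers all three groupings: `summarizeBy(metrics, (m) => m.ruleType)`, `summarizeBy(metrics, (m) => m.ruleId)`, and `summarizeBy(metrics, () => "all")`.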

elasticmachine commented 2 years ago

Pinging @elastic/security-solution (Team: SecuritySolution)

elasticmachine commented 2 years ago

Pinging @elastic/security-detections-response (Team:Detections and Resp)