elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[ML] Expose module recognition requirements #83585

Open rylnd opened 3 years ago

rylnd commented 3 years ago

Describe the feature: When using the recognizer API, unmatched modules are omitted from the response. A user seeking to have their data "recognized" by a particular module has no clear path forward if the module does not match.

Describe a specific use case for the feature: In the Detection Engine, we use the recognizer API to determine which ML modules are relevant to the user's security indexes, and provide a UI to create/enable those relevant modules/jobs. Exposing each module's recognition requirements in the ML API would allow us to relay them to the user, giving them actionable information about what a particular module needs in order to match.
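To make the request concrete, here is a minimal sketch of what a response that includes unmatched modules could look like. Everything in it is hypothetical: the `ModuleRequirements` and `RecognitionResult` shapes, the `recognizeWithRequirements` function, and the assumption that a module's requirements can be written as a list of required field names. None of these are part of the existing ML API, and the recognizer may well match on something other than field names.

```typescript
// Hypothetical shapes: none of these names exist in the ML API today.
interface ModuleRequirements {
  moduleId: string;
  // Assumption: requirements are expressed as required field names.
  requiredFields: string[];
}

interface RecognitionResult {
  moduleId: string;
  matched: boolean;
  // Fields the module needs that the index does not provide.
  missingFields: string[];
}

// Report every module, matched or not, together with what it is missing.
function recognizeWithRequirements(
  modules: ModuleRequirements[],
  indexFields: string[]
): RecognitionResult[] {
  const present = new Set(indexFields);
  return modules.map(({ moduleId, requiredFields }) => {
    const missingFields = requiredFields.filter((field) => !present.has(field));
    return { moduleId, matched: missingFields.length === 0, missingFields };
  });
}

// Example with made-up module ids and an index that lacks network fields.
const exampleResults = recognizeWithRequirements(
  [
    { moduleId: 'example_process_module', requiredFields: ['process.name', 'user.name'] },
    { moduleId: 'example_network_module', requiredFields: ['destination.ip'] },
  ],
  ['process.name', 'user.name', 'host.name']
);
console.log(exampleResults);
```

With a response along these lines, the Detection Engine UI could list the unmatched module next to its `missingFields` instead of silently omitting it.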

elasticmachine commented 3 years ago

Pinging @elastic/ml-ui (:ml)

rylnd commented 3 years ago

@randomuserid brought this up in the context of a more specific use case: warning users if they're missing OS-specific data required by (upcoming?) OS-specific jobs. If necessary I'll let him expound on that.

sophiec20 commented 3 years ago

Hypothetically, if we were to move ML module definitions into Integrations, then the relevant module definition would only be available if the Integration had been enabled.

The Recognizer would then only know about currently enabled Integrations, so one hopes there would be a strong match between the data and the relevant jobs.

This assumes, going forwards, that data is on-boarded via Integrations.

Looking forward to hearing more about the motivations for this enhancement request.

randomuserid commented 3 years ago

With the move to multi-index jobs, the compatibility matrix becomes even more complex. Windows logging has evolved over several decades, with new layers added on top of existing logging modules and products. Older layers such as the Security Event Log can generate process creation events, which provide the fields needed by four of the datafeed queries for the ML jobs in the security solution. Other fields and event types, such as network events, are not available in the Security Event Log and require events from an EDR-like agent such as the Elastic Endpoint or the Sysmon agent. A few jobs need very specific events from specific Windows event log providers, such as the PowerShell logs.

This matrix plots which event sources can power the multi-index jobs:

| platform | Job | test cases | pipelines | tests pass | status |
| --- | --- | --- | --- | --- | --- |
| windows | rare metadata process, windows | 2 | Endpoint, Sysmon | 2 | done |
| windows | rare metadata user, windows | 2 | Endpoint, Sysmon | 2 | done |
| windows | rare process by host, windows | 3 | Endpoint, Sysmon, Security | 3 | done |
| windows | rare process, windows | 3 | Endpoint, Sysmon, Security | 3 | done |
| windows | anomalous network activity, windows | 2 | Endpoint, Sysmon | 2 | done |
| windows | anomalous username, windows | 3 | Endpoint, Sysmon, Security | 3 | done |
| windows | anomalous path activity, windows | 3 | Sysmon | 1 | limits https://github.com/elastic/mechagodzilla/issues/130 |
| windows | anomalous process creation, windows | 3 | Endpoint, Sysmon, Security | 3 | done |

The result is that a user is now far more likely to enable an ML rule / job package for which they lack the requisite events. Tom's suggestion was that we warn the user, with a toast or some other indication, when they enable a package for which they lack the needed events.
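As a rough illustration of that warning, the sketch below encodes the matrix above as data and checks which jobs none of a user's event sources can power. The `JOB_SOURCES` map and the `jobsMissingEvents` function are hypothetical names made up for this example, not anything in Kibana, and the sketch assumes the set of available event sources is already known; detecting it is a separate problem.

```typescript
// Hypothetical sketch: these names are illustrative, not Kibana APIs.
type EventSource = 'Endpoint' | 'Sysmon' | 'Security';

// Event sources that can power each job, transcribed from the matrix above.
const JOB_SOURCES: Record<string, EventSource[]> = {
  'rare metadata process, windows': ['Endpoint', 'Sysmon'],
  'rare metadata user, windows': ['Endpoint', 'Sysmon'],
  'rare process by host, windows': ['Endpoint', 'Sysmon', 'Security'],
  'rare process, windows': ['Endpoint', 'Sysmon', 'Security'],
  'anomalous network activity, windows': ['Endpoint', 'Sysmon'],
  'anomalous username, windows': ['Endpoint', 'Sysmon', 'Security'],
  'anomalous path activity, windows': ['Sysmon'],
  'anomalous process creation, windows': ['Endpoint', 'Sysmon', 'Security'],
};

// Return the jobs that none of the available event sources can power.
// Jobs not listed in the matrix are treated as unsupported.
function jobsMissingEvents(jobs: string[], available: EventSource[]): string[] {
  const have = new Set<EventSource>(available);
  return jobs.filter((job) => {
    const sources = JOB_SOURCES[job] ?? [];
    return !sources.some((source) => have.has(source));
  });
}

// A user who only collects the Security Event Log.
const unsupported = jobsMissingEvents(Object.keys(JOB_SOURCES), ['Security']);
if (unsupported.length > 0) {
  console.warn(`Missing required events for: ${unsupported.join('; ')}`);
}
```

In this example, a user with only the Security Event Log would be warned about the four jobs that need Endpoint or Sysmon events.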