jvalente-salemstate opened this issue 5 months ago
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)
Hi @jvalente-salemstate, thank you so much for your feedback. I agree that fetching both incidents and alerts makes the single data stream noisy when handling incidents with more than a few alerts. So, I am happy to inform you that in the latest release (2.7.0), we have added support for a separate /alerts dataset, but didn't change anything in the existing data stream, considering the impact on existing users. Please feel free to share more feedback on this.
I've updated and monitored for a few days. The change in alert volume ranges from no change for incidents with a single alert (1 event in each data stream) up to a 98% decrease for some larger incidents. Over the last 4 or 5 days, it looks like it would cut about 2/3 of the volume caused by excess alert duplication.
I would maybe include an option, when incidents are enabled, to enable/disable collecting alerts via that data stream. This should be on by default so as not to impact existing users, while allowing folks using the alerts data stream to avoid the duplicated alerts.
Interestingly, I didn't add SecurityAlert.Read.All and it seems to work fine with just SecurityIncident.Read.All. Logically this makes sense, because the alerts can be read anyhow via the incidents endpoint. It's odd that Microsoft doesn't state this permission will work for /alerts_v2, though, or at least I can't find any documentation that says it should.
Having the alert evidence separate would still be helpful, but I get that the PR was made even before the issue was filed.
Hiya @piyush-elastic, passing along a recent feature request that I think might be similar - FYI @Leaf-Lin in case you want to share any extra context.
Feature requests:
- M365 alerts can be “duplicated”; this is an issue with the REST API itself, which represents each record as a historical timeline instead of an object.
- We believe a “latest” transform grouping by the alert ID would fix this largely. The general issue we see is AutoIR (Microsoft “response actions”) closing cases a second or so after they are created. If we had the latest data, we could simply exclude resolved alerts from generating alerts within the security app.
- This would still leave the issue where the Elastic case could de-sync from the M365 case, but for the major issue noted above, this is a minor issue.
- M365 data could make better use of ECS fields; for example, host.name is rarely populated, even when multiple related.hosts are identified.
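A “latest” transform keeps only the newest document per key. The sketch below mimics that idea in plain Python, assuming records shaped loosely like Graph alerts (id, status, lastUpdateDateTime): it groups by alert ID, keeps the most recently updated record, and then drops alerts whose latest status is resolved, such as ones closed by AutoIR a second or two after creation. The records are made up for illustration; in Elasticsearch itself this would be a transform using the latest method (a unique_key on the alert ID and a sort on the update timestamp), and the exact field names it would use in the integration are not specified here.

```python
from datetime import datetime

def parse_ts(value):
    # Normalize the trailing "Z" so datetime.fromisoformat accepts it on
    # Python < 3.11 (older versions also reject Graph's 7-digit fractions).
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def latest_per_alert(records):
    """Keep only the most recently updated record for each alert ID."""
    latest = {}
    for record in records:
        current = latest.get(record["id"])
        if current is None or parse_ts(record["lastUpdateDateTime"]) > parse_ts(
            current["lastUpdateDateTime"]
        ):
            latest[record["id"]] = record
    return latest

def alerts_to_surface(records):
    """Drop alerts whose latest state is resolved (e.g. closed by AutoIR)."""
    return [r for r in latest_per_alert(records).values() if r["status"] != "resolved"]

# Illustrative records: a1 is created and auto-resolved two seconds later.
records = [
    {"id": "a1", "status": "new", "lastUpdateDateTime": "2024-05-01T11:23:00Z"},
    {"id": "a1", "status": "resolved", "lastUpdateDateTime": "2024-05-01T11:23:02Z"},
    {"id": "a2", "status": "new", "lastUpdateDateTime": "2024-05-01T11:25:00Z"},
]
print([r["id"] for r in alerts_to_surface(records)])  # ['a2']
```

Only a2 would go on to generate an alert in the security app; a1's history is still kept, but its latest state says it was already resolved.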
This is a pretty lengthy one, but since it's a substantial change in how the integration works, I wanted to explain why this change is necessary and provide as much information as possible for implementing it.
Summary
This should decrease log volume (significantly, for some incidents), make all three types of information (incidents, alerts, and evidence) much easier to work with, and provide a lot more value in using evidence for correlation, threat detection, and threat hunting.
Problem
The m365_defender.incident data stream can be excessively noisy when handling incidents with more than a few alerts. Information in alert evidence is parsed in a way that is not useful for analysis, and it makes documents large, which compounds the problem given the number of documents.
For example, earlier this week I had an incident generated by M365 Defender with 95 alerts. Because of how incidents are processed, it generated 16,654 events as various combinations of alert properties changed. This was simply an informational incident; had it been one where statuses, comments, and such were actively being worked on (vs. resolving everything at once), the count could have been several times to an order of magnitude higher. A lot of these problems were touched on, directly and indirectly, in #8231.
Why this is an issue
When pulling in incidents, the MS Graph API is called with ?$expand=alerts, which returns a collection of incident objects, each including a collection of objects for every alert in the incident. Within each alert is yet another collection of evidence objects associated with that alert.
What the lastUpdateDateTime represents
The lastUpdateDateTime for an incident changes any time one of its alerts is updated (including properties of any alert evidence), statuses change, alerts are moved, incidents are merged, and so forth. Changes aren't always applied at once, so if an incident is closed at 11:23 and all but one of its alerts are closed between 11:23 and 11:25, the incident is updated as expected and included when the next API call runs, at 11:25 for example. If the last alert is updated at 11:26, the incident's lastUpdateDateTime also becomes 11:26, and the incident is included in the next pull.
Alerts split into events
The call to MS Graph also includes alerts that have not been updated since the last call. These are split into individual documents, one per alert. An incident with a single alert will just create one document. In my case, any minor change in even one of the 95 alerts resulted in 95 documents. If a new alert is added to the incident, the next pull has 96 instances of the same incident, and if that alert's automated investigation status then changes, another 96 documents are created. That's about 190 more documents than really needed if no other alert was updated.
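The duplication described above can be modeled with a toy simulation. This is a simplified sketch, not the integration's code: timestamps are plain integers, an incident is just a dict of alert ID to last update time, and the incident's lastUpdateDateTime is assumed to be the newest alert update. The incident-style pull re-emits every alert whenever any alert changes; the alert-style pull returns only alerts updated since the cursor.

```python
def incident_pull(alerts, cursor):
    """Model of the current incident stream: if any alert changed after the
    cursor, the incident's lastUpdateDateTime moves too, and the expanded
    incident is split into one document per alert."""
    if max(alerts.values()) > cursor:
        return list(alerts)
    return []

def alert_pull(alerts, cursor):
    """Model of a separate alerts pull: only alerts updated after the cursor."""
    return [a for a, updated in alerts.items() if updated > cursor]

alerts = {f"alert-{i}": 0 for i in range(95)}  # 95 alerts, last touched at t=0
total_incident, total_alert = 0, 0
cursor = 0

# t=1: a new alert is added; t=2: its automated investigation status changes.
for t in (1, 2):
    alerts["alert-new"] = t
    total_incident += len(incident_pull(alerts, cursor))
    total_alert += len(alert_pull(alerts, cursor))
    cursor = t

print(total_incident, total_alert, total_incident - total_alert)  # 192 2 190
```

Two small changes to one alert cost 192 documents in the incident stream but only 2 with a per-alert pull, matching the "about 190 more" estimate.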
Below is a snippet of the stream's httpjson.yml.hbs file:
Alert Evidence
Per the MS docs, each evidence object represents a resource type, i.e. one of several kinds of entity. Unlike alerts, which are split out into documents, these have the dot_expander processor applied to them. The ingest pipeline iterates over them and appends each property to a list for that field.
This provides details at the field level but strips them of any context about their overall entity. Because some entities share fields with others, and some fields are renamed, it is difficult to associate the values with their entity. For list fields (roles, threats, tags) this is even more difficult, because these 3 objects may return a list with more than 3 roles or threats. Here's an example of a reported email message, with mailbox, message, and user evidence.
Here's the JSON representation of those same three objects to illustrate the difference.
There isn't any feasible way to recreate the original objects. The flattened fields also add a large number of fields to events, which are already duplicated, sometimes to extreme degrees.
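To illustrate why the original objects can't be recovered, here is a simplified sketch of append-to-list flattening in Python. It is not the actual ingest pipeline: nested objects are not expanded, and the mailbox, message, and user evidence objects are made-up values loosely modeled on Graph evidence types. Two different sets of evidence, which differ only in which entity holds which role, flatten to identical fields.

```python
from collections import defaultdict

def flatten_evidence(evidence):
    """Merge evidence objects into per-field lists, roughly like the current
    append-to-list handling (simplified: nested objects are not expanded)."""
    flat = defaultdict(list)
    for item in evidence:
        for key, value in item.items():
            if isinstance(value, list):
                flat[key].extend(value)
            else:
                flat[key].append(value)
    return dict(flat)

def reported_email(mailbox_roles, message_roles):
    # Illustrative mailbox, message, and user evidence; values are made up.
    return [
        {"@odata.type": "#microsoft.graph.security.mailboxEvidence",
         "roles": mailbox_roles, "verdict": "unknown"},
        {"@odata.type": "#microsoft.graph.security.analyzedMessageEvidence",
         "roles": message_roles, "threats": ["Phish"], "verdict": "malicious"},
        {"@odata.type": "#microsoft.graph.security.userEvidence",
         "roles": ["compromised"], "verdict": "suspicious"},
    ]

a = reported_email(["contextual"], ["attacked", "scanned"])
b = reported_email(["contextual", "attacked"], ["scanned"])

print(flatten_evidence(a)["roles"])  # ['contextual', 'attacked', 'scanned', 'compromised']
# Two different sets of evidence flatten to the same document, so the
# original objects cannot be recovered from the flattened fields.
print(flatten_evidence(a) == flatten_evidence(b))  # True
```

Three objects yield four roles, and nothing in the flattened output says which entity each role or threat belonged to.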
For reference, here's default.yml:
Proposed Solution
Do not expand alerts with the list incidents API (or, if they are expanded, use that only to capture m365_defender.incident.alert.provider_alert_id, as a list, for correlation). Instead, make a separate API call for alerts via the /security/alerts_v2 endpoint. Place these into a dataset such as m365_defender.alert, or a similarly named one.
This will still pull updated alerts with their evidence, without including alerts that were not updated. Evidence could be split into separate documents; it may be necessary to include a field that marks the related alert for correlation.
The new dataset should also be optional, as these alerts and evidence are already captured in the m365_defender.events dataset if the AlertInfo and AlertEvidence tables are exported to the event hub. I'd still leave the option for those not using an event hub, or who simply prefer using the API.
Additionally, the majority of the yml.hbs file handles evidence fields and builds out lists for each property. Having each evidence object as its own document should simplify that and reduce it to just renaming fields.
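With evidence as standalone documents, the per-type handling could shrink to a rename map. The sketch below shows that idea in Python for a user evidence object; in the integration this would more likely be ingest pipeline rename processors. The mapping itself is hypothetical: user.name and user.domain are real ECS fields and userAccount.accountName/domainName are Graph userAccount properties, but m365_defender.evidence.verdict is an illustrative target name.

```python
def get_path(obj, dotted):
    """Follow a dotted path like "userAccount.accountName" through nested dicts."""
    for part in dotted.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj

# Hypothetical mapping from Graph evidence properties to target fields.
USER_EVIDENCE_RENAMES = {
    "userAccount.accountName": "user.name",
    "userAccount.domainName": "user.domain",
    "verdict": "m365_defender.evidence.verdict",
}

def rename_fields(evidence, renames):
    """Copy each mapped property that is present onto its target field name."""
    out = {}
    for source, target in renames.items():
        value = get_path(evidence, source)
        if value is not None:
            out[target] = value
    return out

user_evidence = {
    "@odata.type": "#microsoft.graph.security.userEvidence",
    "verdict": "suspicious",
    "userAccount": {"accountName": "jdoe", "domainName": "contoso.com"},
}
print(rename_fields(user_evidence, USER_EVIDENCE_RENAMES))
# {'user.name': 'jdoe', 'user.domain': 'contoso.com', 'm365_defender.evidence.verdict': 'suspicious'}
```

Because each document holds exactly one entity, values like user.name map cleanly to a single user instead of a merged list across mailbox, message, and user evidence.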
Issues to consider
- SecurityAlert.Read.All permission.
- event.kind: alert is straightforward, but evidence and incidents may need some thought. event.kind: enrichment seems like it'd be good for evidence.