elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations

[Office 365] - Improve ECS utilization #4319

Closed defendable-forfot closed 9 months ago

defendable-forfot commented 2 years ago

We are ingesting O365 data into our Elasticsearch for search, detection in Elastic Security and visualization through Kibana. However, we have noticed a few areas for improvement within the module. What is most interesting with this module is how data is ingested. The most interesting data related to the events seem to be all placed within the o365.audit.Data field. This makes search and extraction of data from the log source difficult. Ideally the parsing should be done directly in the Filebeat module. We believe there is data within the field that can be used to populate other, more relevant, ECS fields.

Note: we are running Filebeat version 8.1.3, but have noticed that none of the newer releases solves our issues.

Additionally, we believe the ECS specification should be improved with the introduction of a new field within the Related fields section. Certain third-party data sources, the O365 module included, send events where multiple URLs are present. An optimal solution would be to add this data to a related.domain or related.url field, neither of which currently exists.

This is a copy of https://discuss.elastic.co/t/office-365-filebeat-module-improve-ecs-utilization/315126, as I was recommended to post this as a GitHub issue instead.

botelastic[bot] commented 2 years ago

This issue doesn't have a Team:<team> label.

WildDogOne commented 2 years ago

I so much agree with you on this! I can open a pull request and try to work on this issue if nobody else has time

However it would help a lot if you could add an example event for each of the problems. Of course I can dig through my own O365, but it's not proving easy ;) but for example, the first issue you mention, does not affect me, because my userids come in the user@domain.tld format

elasticmachine commented 1 year ago

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

khalavak commented 1 year ago

I second this fully. The 365 audit data to ECS field extractions should really be improved as currently it is very hard to work with and customisations have to be made in order for the Elastic Alerts and data to be usable by Security Analysts working in the SIEM.

jamiehynds commented 1 year ago

Another issue to focus on as we work through O365 improvements:

https://github.com/elastic/integrations/issues/5013

jamiehynds commented 1 year ago

Additional feedback:

Using the standard o365 integration audit logs, the field o365.audit.Data contains JSON data that is pertinent to the event. The issue is that this field is mapped as a keyword and is not further processed. This field needs to be flattened and the JSON object should also be ingested into individual fields. This will allow for the better alert analysis required by humans.

Suggested mappings:

IP
o365.audit.Data.sip - ip

Date
o365.audit.Data.ts - date
o365.audit.Data.te - date
o365.audit.Data.at - date
o365.audit.Data.ttdt - date
o365.audit.Data.md - date

Keyword
o365.audit.Data.tid - keyword
o365.audit.Data.lon - keyword
o365.audit.Data.op - keyword
o365.audit.Data.an - keyword
o365.audit.Data.ad - keyword
o365.audit.Data.sev - keyword
o365.audit.Data.rid - keyword
o365.audit.Data.reid - keyword
o365.audit.Data.cid - keyword
o365.audit.Data.tht - keyword
o365.audit.Data.etype - keyword
o365.audit.Data.eid - keyword
o365.audit.Data.f3u - keyword
o365.audit.Data.als - keyword
o365.audit.Data.wl - keyword
o365.audit.Data.ut - keyword
o365.audit.Data.suid - keyword
o365.audit.Data.ail - keyword
o365.audit.Data.von - keyword
o365.audit.Data.sitmi - keyword
o365.audit.Data.dpn - keyword
o365.audit.Data.trc - keyword
o365.audit.Data.aii - keyword
o365.audit.Data.tsd - keyword
o365.audit.Data.ms - keyword
o365.audit.Data.dm - keyword
o365.audit.Data.ttr - keyword
o365.audit.Data.tpt - keyword
o365.audit.Data.tpid - keyword
o365.audit.Data.thn - keyword
o365.audit.Data.imsgid - keyword
o365.audit.Data.fvs - keyword
o365.audit.Data.zu - keyword
o365.audit.Data.pud - keyword
o365.audit.Data.sict - keyword
o365.audit.Data.plk - keyword
o365.audit.Data.mat - keyword
o365.audit.Data.alk - keyword
o365.audit.Data.zmfn - keyword
o365.audit.Data.zmfh - keyword
o365.audit.Data.zfn - keyword
o365.audit.Data.zfh - keyword
o365.audit.Data.sid - keyword
o365.audit.Data.etps - keyword
o365.audit.Data.upfv - keyword
o365.audit.Data.upfc - keyword
o365.audit.Data.ot - keyword
o365.audit.Data.od - keyword

Keyword - this had no analytical value in my instances, but could be helpful for other customers
o365.audit.Data.tdc - keyword
o365.audit.Data.af - keyword
o365.audit.Data.ssic - keyword
o365.audit.Data.cpid - keyword
o365.audit.Data.srt - keyword
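
As a rough illustration of the flattening suggested above (a sketch only, not the integration's actual ingest pipeline), the JSON string in o365.audit.Data could be parsed and a few of the suggested types applied. The function name, the subset of keys and the timestamp format used here are assumptions made for the example.

```python
import ipaddress
import json
from datetime import datetime

# Hypothetical subset of the suggested mappings: Data key -> target type.
# Keys not listed here are treated as keyword (plain string).
SUGGESTED_TYPES = {
    "sip": "ip",
    "ts": "date",
    "te": "date",
    "tid": "keyword",
    "op": "keyword",
    "f3u": "keyword",
}


def parse_data_field(raw: str) -> dict:
    """Parse the JSON string held in o365.audit.Data and coerce known keys."""
    parsed = json.loads(raw)
    result = {}
    for key, value in parsed.items():
        kind = SUGGESTED_TYPES.get(key, "keyword")
        if kind == "ip":
            # ip_address raises ValueError if the value is not a valid IP.
            result[key] = str(ipaddress.ip_address(value))
        elif kind == "date":
            # Assumes an ISO 8601 timestamp without a trailing "Z".
            result[key] = datetime.fromisoformat(value)
        else:
            result[key] = str(value)
    return result
```

Calling parse_data_field on a string such as '{"sip": "10.0.0.1", "op": "Protect"}' returns a dictionary with one entry per key, which could then be indexed as individual fields instead of a single keyword.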

chrisberkhout commented 11 months ago

@defendable-forfot, @WildDogOne & @khalavak,

If you can provide example data for the Data.*, Parameters.User or Parameters.DomainName fields, that would be very helpful.

For example, the data I've seen shows Data.f3u and Data.suid fields having values like user@domain.tld, rather than SecurityComplianceEvent, SecurityComplianceInsights or SecurityComplianceEvent as mentioned in the issue description.

I haven't been able to find documentation of the various Data.* fields, except for the Office 365 Management Activity API schema documentation describing Data as being one of:

  1. The detailed data blob of the alert or alert entity. (here)
  2. Data string which contains more details about investigation entities, and information about alerts related to the investigation. Entities are available in a separate node within the data blob. (here)

If documentation of these individual alert or investigation fields does exist, any tips would be much appreciated.

chrisberkhout commented 10 months ago

Below I have attempted to restate and respond to each of @defendable-forfot's suggestions.

Many of the suggestions relate to undocumented fields or values that may vary between environments and for which sample data is not currently available.

The relevant upstream documentation is Office 365 Management Activity API schema. For the o365.audit.Data field we only have a small amount of example data, which is listed under "Known example values for the Data parameter" in https://github.com/elastic/integrations/pull/8571.

In this round of improvements I intend to:

The original suggestions refer to Filebeat's Office 365 module but I will attempt to apply them to the preferred, Agent-based Microsoft 365 Elastic Integration. Wherever the suggestions don't seem to apply, the change to the Agent-based implementation may explain the mismatch.

Responses to each suggestion are inline in > bold.

Miscellaneous suggestions

A suggestion about o365.audit.Data fields

The most interesting data related to the events seem to be all placed within the o365.audit.Data field. This makes search and extraction of data from the log source difficult. Ideally the parsing should be done directly in the Filebeat module.

> There is a PR to parse and index this data in the Microsoft 365 integration, here: https://github.com/elastic/integrations/pull/8571

A suggestion about new related.* ECS fields

ECS could be improved by adding related.domain or related.url fields, to be used by data sources, including the o365 module, that send events with multiple URLs.

> The closest existing field is related.hosts, which is for "All hostnames or other host identifiers seen on your event. Example identifiers include FQDNs, domain names, workstation names, or aliases."

> I've added an ECS issue, Add related.url field, to discuss this proposal further.

A suggestion about o365.audit.Name

When o365.audit.Name exists, its value populates rule.name.

In such cases the message field could also take that value, instead of New alert.

> The ECS message field value description says "For structured logs without an original message field, other fields can be concatenated to form a human-readable summary of the event.".

> Currently, message is set to the value of the incoming field Comments for SecurityComplianceAlerts events (an example value is "New alert"), or the incoming field ExchangeMetadata.Subject for ComplianceDLPExchange events (the value being an email subject line).

> The Comments and Name values could be concatenated into message for a richer description, but this cosmetic improvement would come at the cost of having the Comments value unavailable in its unmodified form. I think it's best not to change this for now.

User data suggestions

Unless otherwise indicated, these suggestions relate to the population of the ECS fields user.domain, user.email, user.id, user.name, and related.user.

Parsing for user.name and user.email values

In some cases, including some involving Exchange, user.name and user.email values have a domain prefix (domainname\) which should be removed and used to populate user.domain.

> Note: this suggestion was given in connection with the o365.audit.UserKey field and the O365 Exchange Suspicious Mailbox Right Delegation detection rule.

> The current logic for populating user.email, user.name, and user.domain will map an incoming value of username@inetdomain.com to "user.email": "username@inetdomain.com", "user.name": "username", and "user.domain": "inetdomain.com".

> Although user.domain is an appropriate field for storing both Windows networking domains and Internet domains, before attempting to extract Windows networking domains from user.name and user.email values I would like to 1) have example data (none of our current examples have the Windows networking domain prefix), and 2) be able to clearly distinguish between a Windows networking domain prefix separated by a backslash and other uses of a backslash (valid email addresses may contain backslashes in the user name).
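
The mapping described above can be sketched in a few lines. This is an illustration under assumptions, not the integration's pipeline code: the function name is hypothetical, and values containing a backslash are deliberately left untouched because of the ambiguity just mentioned.

```python
def split_user_principal(value: str) -> dict:
    """Split a user@domain value into ECS-style user fields (illustrative)."""
    fields = {}
    # rpartition splits at the last "@", so an "@" inside a quoted
    # local part does not end up in the domain.
    name, sep, domain = value.rpartition("@")
    if sep and name and domain:
        fields["user.email"] = value
        fields["user.name"] = name
        fields["user.domain"] = domain
    else:
        # No usable "@": keep the raw value and do not guess whether a
        # backslash marks a Windows networking domain prefix.
        fields["user.name"] = value
    return fields
```

With this sketch, a value such as domainname\username would be stored unchanged in user.name until example data shows how such prefixes actually appear.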

o365.audit.UserId and o365.audit.UserKey non-user value

Where UserId or UserKey matches /^SecurityCompliance.*/, that value should not be set in user.id.

The actual user data may be available in o365.audit.Data.

> Note: The UserId point was noted as being the case for "record types that are not related to Microsoft Exchange, Azure and SecurityComplianceCenterCommand". There is a large number of such record types.

> Currently, there is no reference to UserKey in the pipeline configuration. Its incoming value is retained as o365.audit.UserKey. The UserId field is renamed to user.id.

> The Management Activity API schema: Common schema documentation describes UserId as "The UPN (User Principal Name) of the user who performed the action (specified in the Operation property) that resulted in the record being logged; for example, my_name@my_domain_name. Note that records for activity performed by system accounts (such as SHAREPOINT\system or NT AUTHORITY\SYSTEM) are also included."

> Although values such as SecurityComplianceAlerts seem to refer to a service or function rather than a user or even a system account, I think the choice of this value for UserId in upstream API logic should not be overridden in the pipeline logic.
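
Since the pipeline keeps these values in user.id, anyone who wants to treat them differently would have to do so downstream. A minimal sketch of the check from the original suggestion follows; the function name is hypothetical and the pattern is taken directly from the suggestion text.

```python
import re

# Pattern from the suggestion: values beginning with "SecurityCompliance".
SERVICE_ID = re.compile(r"^SecurityCompliance.*")


def looks_like_service_identifier(user_id: str) -> bool:
    """Return True when a UserId/UserKey value matches the suggested pattern."""
    return SERVICE_ID.match(user_id) is not None
```

A consumer could use such a check to exclude these values from user-based aggregations without changing what the integration stores.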

o365.audit.Parameters.User has user data

A value in o365.audit.Parameters.User can be put in related.user.

In cases where o365.audit.Workload="Exchange" that value will relate to the user on which the action is being performed.

> Available example data includes values for this field such as:

EURPR01A002.prod.outlook.com/Microsoft Exchange Hosted Organizations/testsiem.onmicrosoft.com/Discovery Management

> I will open a PR for this change. PR: https://github.com/elastic/integrations/pull/8803

o365.audit.Data.* user data

The following fields are suggested to contain user data, in particular when o365.audit.Workload="SecurityComplianceCenter" and o365.audit.RecordType!="24":

> Note: The RecordType=24 corresponds to member name "Discover", described as "Events for eDiscovery activities performed by running content searches and managing eDiscovery cases in the Security & Compliance Center."

> The Data.isda field is not in the list of known fields used for https://github.com/elastic/integrations/pull/8571, but that PR will make its value available in o365.audit.Data.flattened.isda. Before indexing that field directly under o365.audit.Data, it would be good to receive confirmation of its use, and example data.

> Although the presence of an incoming ClientIP suggests there is an initiator of a network connection related to this event, the client.user field set seems redundant when not used to distinguish between an initiator (client) and a responder (server).

> Available example data shows f3u, suid, tsd and trc as having values that match the format of an email address. The user.email and user.id fields could potentially be populated with these values, but given their undocumented and uncertain meaning, I think a better choice is to add values that appear to be email addresses into related.user to aid discovery and allow integration users to do any further interpretation of these values themselves.

> I will open a PR to add f3u, suid, tsd and trc values to related.user when they are in email address format. PR: https://github.com/elastic/integrations/pull/8803
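
The approach described above, copying only email-like values into related.user, could look roughly like the following. This is a sketch under assumptions rather than the code in the PR: the event is assumed to be a nested dictionary, the function name is hypothetical, and the email pattern is deliberately loose.

```python
import re

# Loose check: exactly one "@", no whitespace, and a dot in the domain part.
EMAIL_LIKE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
CANDIDATE_KEYS = ("f3u", "suid", "tsd", "trc")


def add_related_users(event: dict) -> dict:
    """Append email-like Data values to related.user, skipping duplicates."""
    data = event.get("o365", {}).get("audit", {}).get("Data", {})
    related = event.setdefault("related", {}).setdefault("user", [])
    for key in CANDIDATE_KEYS:
        value = data.get(key)
        if isinstance(value, str) and EMAIL_LIKE.match(value) and value not in related:
            related.append(value)
    return event
```

Values that do not look like email addresses, such as SecurityComplianceAlerts, are simply ignored, which leaves their interpretation to the integration user.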

URL data suggestions

Unless otherwise indicated, these suggestions relate to the population of the ECS fields url.domain, url.extension, url.original, url.path, url.scheme, and url.subdomain.

o365.audit.Parameters.DomainName has domain data

When present, use it to populate the relevant ECS fields.

> The Parameters field contains the "name and value for all parameters that were used with the cmdlet". For the Exchange Admin schema this is a cmdlet that is identified in the Operation property. For the Security and Compliance Center schema it is noted this will not include PII.

> There is no example data available for this field. It's unclear whether a DomainName value would refer to the domain of a URL and be suitable for url.domain, or to a Windows Networking domain which would not. I would want to confirm the meaning of this field and have example data before populating ECS fields with its value.

o365.audit.Data.* URL data

The following fields are suggested to contain URL data:

> In example data, we have reid values of "cannot be shared" (from a public blog post, likely not the value delivered by the API) and "23a5e271-e297-4f35-ff57-08d7b17f5bf2" (from test data). If reid can contain a concatenation of different types of data, it may be difficult to dependably extract a URL from it. For zu and alk we have no example data.

> A URL value from an undocumented field may be easier to use than other values because a URL is data in a specific, strictly defined format. However, before attempting to extract URLs from zu, alk or other fields I would want to have some example data that confirms their presence.

jamiehynds commented 9 months ago

Hey @chrisberkhout - do you think we can close this issue on the back of the v2.1.0 update to O365, or are there still some outstanding items to address?

chrisberkhout commented 9 months ago

I think this is done for now. We can revisit it in the future if we get more feedback and data. The changes made were: