elastic / ecs

Elastic Common Schema
https://www.elastic.co/what-is/ecs
Apache License 2.0

Introduce display_name for threat.indicator #1998

Open maxcold opened 2 years ago

maxcold commented 2 years ago

Summary

Introduce Display Name for an Indicator of Compromise (IoC) in the Threat Intelligence part of ECS

Motivation:

In the Threat Intelligence capabilities of the Security Solution, our team is working on a data grid for IoCs (Indicators of Compromise) that has an "Indicator" column, which serves as a "display name" for an indicator. The value of this column currently depends on the indicator type, and every type has its own logic. The best example is the File indicator, which can have different hashes, e.g. sha256, md5, etc. We take sha256 for the display name when available, if not then md5, and so on. For other types there is not much logic involved: we take the value from a different threat.indicator.* attribute per type.

The problem with this approach is that the display name is not available as an attribute on Elasticsearch documents coming from Threat Intelligence integrations, so users can't filter by it or perform the other standard operations they can do on existing ECS attributes. This can be partly solved with runtime fields, but we think adding a display name to the schema might make sense, and we want to kick off a discussion about it.

Features dependent on having a display_name field:

  1. Filter in/out the Indicator of Compromise view by display_name
  2. Add Indicator display_name to a Timeline
  3. Create an Indicator of Compromise event renderer in Timelines
  4. Create a pre-built Timeline template for investigating IoCs

Detailed Design:

Introduce threat.indicator.display_name with the following logic per threat.indicator.type:

file:
threat.indicator.file.hash.*, first available in this order (sha256 | md5 | sha1 | sha224 | sha3-224 | sha3-256 | sha384 | sha3-384 | sha512 | sha3-512 | sha512/224 | sha512/256 | ssdeep | tlsh | impfuzzy | imphash | pehash | vhash)

url: 
threat.indicator.url.original

email-addr: 
threat.indicator.email.address

email (subject?):
No suitable field exists; map to _id?

email-message:
No suitable field exists; map to _id?

domain-name:
threat.indicator.url.domain

domain:
threat.indicator.url.domain

ipv4-addr: 
threat.indicator.ip

ipv6-addr: 
threat.indicator.ip

x509-certificate:
threat.indicator.x509.serial_number

x509 Serial:
threat.indicator.x509.serial_number

windows-registry-key:
threat.indicator.registry.key

autonomous-system:
threat.indicator.as.number

mac-addr:
threat.indicator.mac

unknown:
map to _id
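The per-type mapping above can be sketched as a single TypeScript function. This is a hypothetical illustration, not code from Kibana or any integration: the `Indicator` interface and `resolveIndicatorName` are invented names, the hash keys are copied verbatim from the proposal's list (several of them are not current ECS hash fields), and falling back to the document `_id` whenever the mapped field is missing is an assumption that extends the "map to _id" entries.

```typescript
// Hypothetical shape of threat.indicator, limited to the fields the mapping reads.
interface Indicator {
  type?: string;
  ip?: string;
  mac?: string;
  email?: { address?: string };
  url?: { original?: string; domain?: string };
  x509?: { serial_number?: string };
  registry?: { key?: string };
  as?: { number?: number };
  file?: { hash?: Record<string, string | undefined> };
}

// Hash keys in the order of preference proposed above (names taken from the proposal).
const HASH_PRIORITY = [
  "sha256", "md5", "sha1", "sha224", "sha3-224", "sha3-256", "sha384",
  "sha3-384", "sha512", "sha3-512", "sha512/224", "sha512/256",
  "ssdeep", "tlsh", "impfuzzy", "imphash", "pehash", "vhash",
];

// Returns the display name for an indicator; docId is the document _id fallback (assumption).
function resolveIndicatorName(indicator: Indicator, docId: string): string {
  switch (indicator.type) {
    case "file": {
      const hashes: Record<string, string | undefined> = indicator.file?.hash ?? {};
      // Pick the first hash present, following HASH_PRIORITY.
      const key = HASH_PRIORITY.find((k) => hashes[k]);
      return (key !== undefined ? hashes[key] : undefined) ?? docId;
    }
    case "url":
      return indicator.url?.original ?? docId;
    case "email-addr":
      return indicator.email?.address ?? docId;
    case "domain-name":
    case "domain":
      return indicator.url?.domain ?? docId;
    case "ipv4-addr":
    case "ipv6-addr":
      return indicator.ip ?? docId;
    case "x509-certificate":
    case "x509 Serial":
      return indicator.x509?.serial_number ?? docId;
    case "windows-registry-key":
      return indicator.registry?.key ?? docId;
    case "autonomous-system": {
      const asn = indicator.as?.number;
      return asn !== undefined ? String(asn) : docId;
    }
    case "mac-addr":
      return indicator.mac ?? docId;
    default:
      // "email", "email-message", "unknown" and unrecognized types fall back to _id.
      return docId;
  }
}
```

For example, a 'file' indicator carrying both md5 and sha256 hashes resolves to the sha256 value, while an 'email-message' indicator resolves to whatever _id is passed in.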

Data examples can be found in the AbuseCH, Anomali, Cybersixgill, MISP, OTX, Recorded Future and ThreatQ integrations.

maxcold commented 2 years ago

@jamiehynds fyi, following up on our discussion

peasead commented 2 years ago

I think that threat.indicator.name would align with other ECS fieldsets (host.name, user.name, etc.).

Also, here's the email ECS fieldset https://github.com/elastic/ecs/blob/main/schemas/email.yml, which could be nested under threat.indicator.

maxcold commented 2 years ago

Thanks for the comments. I agree that threat.indicator.name is probably more consistent with the rest of the schema. I also agree that it probably makes sense to nest the email fieldset: to my understanding, only threat.indicator.email.address currently exists, and it doesn't match https://github.com/elastic/ecs/blob/main/schemas/email.yml (if I understand correctly, following the existing email schema, threat.indicator.email.address should be threat.indicator.email.from.address).

peasead commented 2 years ago

Yep, you're following the schema properly!

Back story

We made some decisions early on regarding the directionality of some types of indicators when writing the threat.* ECS fieldset.

We felt that ECS fieldsets that incorporate directionality (source.ip, destination.domain, email.from.address, etc.) would lead to confusion when trying to get indicators into the right fields. For example, if the indicator is 1.2.3.4, is it command & control infrastructure, or is it the source of a password-spraying campaign? Should it be threat.indicator.source.ip : 1.2.3.4 or threat.indicator.destination.ip : 1.2.3.4? In the email example, is this the source of a phishing email or the reply-to address?

Contextually, it could be both, so if threat provider A marked it as a source and threat provider B marked it as a destination, you could end up with duplicate threat indicator matches or, worse, contextually incorrect assumptions based on how someone viewed the directionality.

We opted to avoid the confusion of directionality by leaving it out, using threat.indicator.ip : 1.2.3.4, threat.indicator.domain, threat.indicator.email.address, etc., and allowing an analyst to determine what happened using other fields populated during enrichment, like network.direction : ingress|egress.
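To illustrate the directionless design, here is a hypothetical TypeScript sketch; it is not Elastic's indicator-match implementation, and `NetworkEvent`, `IndicatorMatch` and `matchIpIndicator` are invented names. A single threat.indicator.ip value is compared against both source.ip and destination.ip, and the direction is read from network.direction instead of being encoded in the indicator field.

```typescript
// Hypothetical, simplified event shape; only the fields used below are modelled.
interface NetworkEvent {
  source?: { ip?: string };
  destination?: { ip?: string };
  network?: { direction?: string };
}

interface IndicatorMatch {
  matchedField: "source.ip" | "destination.ip";
  // Copied from network.direction (e.g. "ingress" or "egress"), if present.
  direction: string | undefined;
}

// Matches one directionless indicator IP against both sides of the connection.
function matchIpIndicator(event: NetworkEvent, indicatorIp: string): IndicatorMatch[] {
  const matches: IndicatorMatch[] = [];
  const direction = event.network?.direction;
  if (event.source?.ip === indicatorIp) {
    matches.push({ matchedField: "source.ip", direction });
  }
  if (event.destination?.ip === indicatorIp) {
    matches.push({ matchedField: "destination.ip", direction });
  }
  return matches;
}
```

Because the indicator itself carries no direction, the same threat.indicator.ip document can match both an outbound connection to 1.2.3.4 and an inbound connection from it; the analyst tells the two apart from network.direction.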

Commentary

I think if we wanted to revisit directionality, we could, but having looked at the feed data over time, I think directionality would be difficult.

maxcold commented 2 years ago

@peasead Thanks a lot for the back story; it answers a lot of questions I had, as I wasn't really thinking about the directionality of the indicators. It makes total sense! So I think it should stay the way it is now, meaning the email-addr IoC type maps to just threat.indicator.email.address. But now I'm not sure which part of the description you were commenting on with this note:

Also, here's the email ECS fieldset https://github.com/elastic/ecs/blob/main/schemas/email.yml, which could be nested under threat.indicator.

Can you clarify so we are on the same page?

peasead commented 2 years ago

I think you can disregard that. I didn't fully grasp what you were asking until I started the larger response.

Sorry!

maxcold commented 2 years ago

@jamiehynds can you help me find the right people to ping on this issue so we move forward with it?

jamiehynds commented 2 years ago

@maxcold based on the discussions above with @peasead, do you think threat.indicator.name is a more suitable fit than the original threat.indicator.display_name proposal?

Could you also provide a proposal for the field description and allowed values, which we'd include in the ECS documentation? As an example, here's the description and allowed values for an upcoming addition to event.category - https://github.com/elastic/ecs/issues/2028#issuecomment-1216805476

@ebeahan given that this proposal is a relatively minor change, I'm assuming an RFC isn't warranted, but I'm pinging you just in case you feel otherwise.

djptek commented 2 years ago

@maxcold is it likely that e.g. threat.indicator.name could have multiple values for distinct indicators within a single event?

If so, we might want to consider adding

threat.enrichments.indicator.name

where enrichments is an array containing multiple indicators as well as

threat.indicator.name

maxcold commented 2 years ago

@jamiehynds yes, I think it makes sense to have threat.indicator.name for consistency with the rest of ECS.

Description: "The display name of the Indicator of Compromise in a UI-friendly format"

Allowed values: there is a mapping between the type of IoC and which field from the document should be used for the name field. How should I approach allowed values in this case?

You mentioned that the change is minor, but we want not only to introduce this new field but also to have it populated by TI integrations, based on the mapping logic I added to the description. Is that in the scope of this issue, or will I need to open a new issue to implement the logic in the integrations?

@djptek good point, I think it makes sense to add threat.enrichments.indicator.name as well, in addition to threat.indicator.name. It would follow the same logic, share the same description and allowed values, and serve as the IoC name when the indicator is added as an enrichment, to alerts for example.
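The point about multiple values can be made concrete with a small TypeScript sketch. It assumes, as described above, that threat.enrichments is an array in which each element carries its own indicator; the interfaces and the `enrichmentNames` helper are hypothetical and model only the fields needed here. A single alert can therefore yield several threat.enrichments.indicator.name values, whereas threat.indicator.name describes one indicator per document.

```typescript
// Hypothetical, minimal shapes for an alert carrying several enrichments.
interface EnrichmentIndicator {
  type?: string;
  name?: string;
}

interface Enrichment {
  indicator: EnrichmentIndicator;
}

interface Alert {
  threat?: { enrichments?: Enrichment[] };
}

// Collects every threat.enrichments.indicator.name present on the alert,
// skipping enrichments whose indicator has no name yet.
function enrichmentNames(alert: Alert): string[] {
  return (alert.threat?.enrichments ?? [])
    .map((enrichment) => enrichment.indicator.name)
    .filter((name): name is string => name !== undefined);
}
```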

djptek commented 2 years ago

You can specify expected values for a field in the schema yml; see e.g. event.category.

maxcold commented 2 years ago

@djptek thanks for providing an example. One question: you linked to the expected_event_types attribute; did you mean to link to allowed_values? I'm just not sure how expected_event_types is relevant here. As for allowed_values, the problem is that the proposed threat.indicator.name is not an enum: the value, and the type of that value, depend on threat.indicator.type. Here are some examples. For an 'ipv4-addr' indicator:

{
  threat:
  {
    indicator: {
      _id: '123',
      type: 'ipv4-addr',
      ip: '1.1.1.1'
    }
  }
}

The "to be" state, with the new field populated:

{
  threat:
  {
    indicator: {
      _id: '123',
      name: '1.1.1.1',
      type: 'ipv4-addr',
      ip: '1.1.1.1'
    }
  }
}

For a file-type indicator:

{
  threat:
  {
    indicator: {
      _id: '123',
      type: 'file',
      file: { hash: { md5: 'md5_hash', sha256: 'sha256_hash' } }
    }
  }
}

The "to be" state, with the new field populated:

{
  threat:
  {
    indicator: {
      _id: '123',
      name: 'sha256_hash',
      type: 'file',
      file: { hash: { md5: 'md5_hash', sha256: 'sha256_hash' } }
    }
  }
}

djptek commented 2 years ago

Hi @maxcold, sorry, I gave completely the wrong example there. I intended to give an example of expected_values:

expected_values (optional): An array of expected values for the field. Schema consumers can validate integrations and mapped data against the listed values. These values are the recommended convention, but users may also use other values.

From your examples:

I think expected_values might be a good fit

maxcold commented 2 years ago

@djptek no worries, I didn't know about expected_values, thanks for bringing it up! We can definitely provide some example values in expected_values, but after looking at how expected_values is currently used (https://github.com/elastic/ecs/search?q=expected_values), I'm not sure it is a good fit. It seems to describe the exact values that can appear in a field, so that consumers can build validation for it. In our case we can only provide example values, which might confuse schema consumers. But I might be completely wrong about expected_values; happy to provide more example values if needed.

djptek commented 2 years ago

@maxcold expected_values would work if you can provide examples; they don't need to be exhaustive and are not intended for validation.

This is not the same as allowed_values, which should always be validated.
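The distinction can be sketched in TypeScript as a hypothetical schema-consumer check; it is not part of the ECS tooling, and `FieldDefinition`, `CheckResult` and `checkValue` are invented names. A value outside allowed_values is treated as an error, while a value outside expected_values only produces an informational warning, since the ECS description quoted above says users may also use other values.

```typescript
// Hypothetical field definition carrying the two optional value lists discussed above.
interface FieldDefinition {
  name: string;
  allowed_values?: string[];
  expected_values?: string[];
}

interface CheckResult {
  errors: string[];   // allowed_values violations: the value should be rejected
  warnings: string[]; // outside expected_values: informational only
}

function checkValue(field: FieldDefinition, value: string): CheckResult {
  const result: CheckResult = { errors: [], warnings: [] };
  if (field.allowed_values && !field.allowed_values.includes(value)) {
    result.errors.push(`${field.name}: "${value}" is not an allowed value`);
  }
  if (field.expected_values && !field.expected_values.includes(value)) {
    result.warnings.push(`${field.name}: "${value}" is not among the listed examples`);
  }
  return result;
}
```

Under this reading, a threat.indicator.name value that is not in the example list never fails validation.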

maxcold commented 2 years ago

@djptek got it, thanks! Here is the list of example values for the threat.indicator.name field:

5.2.75.227
2a02:cf40:add:4002:91f2:a9b2:e09a:6fc6
https://example.com/some/path
example.com
373d34874d7bc89fd4cefa6272ee80bf
b0e914d1bbe19433cc9df64ea1ca07fe77f7b150b511b786e46e007941a62bd7
email@example.com
HKLM\\SOFTWARE\\Microsoft\\Active
13335
00:00:5e:00:53:af
8008

@jamiehynds is there anything else I can do to help move this forward?

maxcold commented 2 years ago

hi @jamiehynds, as we are coming closer to the 8.6 release cycle, is there anything we can help with to get this on the roadmap for the release?

maxcold commented 1 year ago

@ebeahan @jamiehynds @djptek hey folks, what can our team do to help move this forward?

ebeahan commented 1 year ago

@maxcold sorry for missing the ping before.

Is this summary still the direction we're taking: https://github.com/elastic/ecs/issues/1998#issuecomment-1225737413?

If so, the next steps would be for someone on your team to open a PR with the requested changes, and we'll review and discuss further as needed.

maxcold commented 1 year ago

@ebeahan yes, that's still the idea. I'll add an issue to our backlog for creating a PR to the ECS schema, then. What about changes in the ti_* integrations? It would be good if these integrations populated the field automatically; how do we make that happen?

maxcold commented 1 year ago

@ebeahan btw, what is the usual approach for backfilling data when a new ECS field is introduced (or when there is another change in the schema)? Is there a common way to handle such cases? Specifically for threat.indicator.name, it would be good to provide a way for users to add this field to already existing data. Is that a good idea in general? How should we handle it?

ebeahan commented 1 year ago

@maxcold there's not a common approach I'm familiar with.

Typically when a field is introduced in ECS, the change is made to the data source or integration to start populating that field. New events and indices will have the field, but existing data will not.

maxcold commented 1 year ago

got it, thanks!