elastic / ecs

Elastic Common Schema
https://www.elastic.co/what-is/ecs
Apache License 2.0

Standardize Domain representations #728

Open dainperkins opened 4 years ago

dainperkins commented 4 years ago

I need to double-check the scope of the top-level fields, but I think it makes sense to standardize the domain representation across locations (minimally dns and url, likely client, server, etc.), both for the sake of consistent representation and for potentially pulling standard features from all domain-trackable information.

I think we should also consider adding a hostname field within the domain representation (it's getting large, but breaking it up seems worth the effort).

Will add a PR shortly.

dainperkins commented 4 years ago

Went through the places where domain names are used and looked at the overall FQDN dissection:

Current ECS DNS Reference:

I think we should consider adding:

and then implementing these for each of the following field sets that reference names in some way:

The "name" field makes sense from the DNS perspective, but deviates from e.g. the URL spec (which, iirc, refers to the FQDN portion of a URL as the host). Still, I think within ECS the "name" designation would work well if used across all parent instances.
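To make the idea of one shared dissection concrete, here is a minimal Python sketch, not the proposal itself. It assumes the sub-field names `name`, `registered_domain`, `subdomain`, and `top_level_domain` (modeled on the `dns.question` fields), adds a hypothetical `hostname` sub-field, and uses a hypothetical list of parents. It also naively treats the last label as the top-level domain, which is wrong for suffixes like `co.uk`; a real implementation would need the Public Suffix List.

```python
def dissect_fqdn(fqdn: str) -> dict:
    """Naively split an FQDN into name parts.

    Assumption: the top-level domain is the last label only. Multi-label
    suffixes such as "co.uk" are NOT handled by this sketch.
    """
    # Normalize: trim whitespace, drop the trailing root dot, lowercase.
    name = fqdn.strip().rstrip(".").lower()
    labels = name.split(".")
    parts = {"name": name}
    if len(labels) >= 2:
        parts["top_level_domain"] = labels[-1]
        parts["registered_domain"] = ".".join(labels[-2:])
    if len(labels) >= 3:
        parts["subdomain"] = ".".join(labels[:-2])
        # Hypothetical sub-field, following the hostname suggestion above.
        parts["hostname"] = labels[0]
    return parts


def apply_to_parents(fqdn: str,
                     parents=("dns.question", "url", "client", "server")) -> dict:
    """Emit the same sub-fields under every parent (parent list is illustrative)."""
    parts = dissect_fqdn(fqdn)
    return {f"{parent}.{key}": value
            for parent in parents
            for key, value in parts.items()}
```

With this shape, `apply_to_parents("www.example.com")` would yield `url.registered_domain` and `client.registered_domain` with identical values, which is the consistency the issue is asking for.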

Which then would also suggest deprecating, or changing:

** / *** easy enough to carry forward, but I'm wondering about the usefulness & cost of enrichment in an actual ingest/visualization sense...

(x) should probably just be deprecated

@MikePaquette @webmat wdyt?

nemhods commented 4 years ago

@dainperkins thanks for leading me here! Some thoughts on the matter:

And finally a rather wild idea, but maybe it leads to something. What about creating an Elasticsearch data type that holds network identities (like the IP data type on steroids), maybe based on a URI/URL standard? It could accept all variants of identities (IPs, hostnames, FQDNs, sub- and top-level domains, ...) and allow for a normalized representation and flexible search within this data type (similar to how I can already search the IP data type using CIDR notation).

Another problem is that identities can have aliases, or be represented differently in different contexts, often in a way that can't be normalized and that lacks unique identifiers. I'm thinking about an Identity Engine that can hook into the aforementioned "Identity" data type and find an identity even if it is searched for by a different alias. This would be a great help especially in security use cases, where User and Entity Behavior Analytics (UEBA) is a powerful concept. The Identity Engine could either learn by itself or be informed of relations between identities through an API.

That's way beyond the scope of this issue, and I'm not aware of an even remotely comparable system in the Elastic Stack right now. Just some thoughts while we're at it.
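As a rough illustration of what such a type might do, here is a Python sketch. No such Elasticsearch data type exists, and the class name `NetworkIdentity` and its `within` method are hypothetical. It normalizes the input, classifies it as an IP or a name, and offers one scoped lookup: CIDR containment for IPs and label-wise suffix matching for names.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkIdentity:
    """Hypothetical normalized identity: either an IP or a DNS-style name."""
    kind: str   # "ip" or "name"
    value: str  # normalized form

    @classmethod
    def parse(cls, raw: str) -> "NetworkIdentity":
        text = raw.strip()
        try:
            # ip_address() canonicalizes, e.g. lowercases IPv6 hex digits.
            return cls("ip", str(ipaddress.ip_address(text)))
        except ValueError:
            # Anything that is not an IP is treated as a name in this sketch.
            return cls("name", text.rstrip(".").lower())

    def within(self, scope: str) -> bool:
        """CIDR match for IPs, label-wise suffix match for names."""
        if self.kind == "ip":
            try:
                network = ipaddress.ip_network(scope, strict=False)
            except ValueError:
                return False  # scope is not a network, so an IP cannot match
            return ipaddress.ip_address(self.value) in network
        scope_name = scope.strip().strip(".").lower()
        return self.value == scope_name or self.value.endswith("." + scope_name)
```

For example, `NetworkIdentity.parse("10.1.2.3").within("10.0.0.0/8")` and `NetworkIdentity.parse("www.example.com").within("example.com")` both answer the question "is this identity inside that scope?" through the same call.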
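One simple way to picture the "informed through an API" variant is a union-find (disjoint-set) structure over aliases. The Python sketch below is only an illustration under that assumption; the class `IdentityAliases` and its methods are hypothetical, and it does not cover the self-learning variant at all.

```python
class IdentityAliases:
    """Hypothetical alias store: aliases related to each other share one root."""

    def __init__(self):
        self._parent = {}

    def _find(self, alias):
        # Unknown aliases start out as their own identity.
        self._parent.setdefault(alias, alias)
        root = alias
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited alias directly at the root.
        while self._parent[alias] != root:
            self._parent[alias], alias = root, self._parent[alias]
        return root

    def relate(self, a, b):
        """Record that a and b refer to the same identity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_identity(self, a, b):
        return self._find(a) == self._find(b)

    def aliases_of(self, alias):
        """Return every known alias of the identity that alias belongs to."""
        root = self._find(alias)
        return {a for a in list(self._parent) if self._find(a) == root}
```

After `relate("10.1.2.3", "web01")` and `relate("web01", "web01.example.com")`, a search for the IP would also find events recorded under the FQDN. Note that this sketch can only merge identities; un-relating aliases (for example after a DHCP lease changes) would need a different structure.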

dainperkins commented 4 years ago

So 3 would probably need to go under some core Elastic area for a PR, though it's definitely not a bad idea.

1) I am of two minds. I'd like to use fqdn or full_name (as we are supposed to avoid abbreviations), but that would mean changing the dns.* fields. I'd also like to use [x].name to match dns.question.name, but that may cause issues with host.name being used by SIEM (unsure whether using fqdn would be an issue or would work out ok).

2) As a catch-all, I think [entity].address could certainly stick around as the place we populate while programmatically figuring out what to do with the value, e.g. in an ingest pipeline or similar, so that's probably not an issue.

Thanks for taking a look and contributing!
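A minimal Python sketch of that "populate first, decide later" step follows. It stands in for what an ingest pipeline might do and is not an actual pipeline definition. The function name `route_address`, the flat dotted keys, and the target fields `[entity].ip` and `[entity].domain` are assumptions made for illustration.

```python
import ipaddress


def route_address(event: dict, entity: str = "client") -> dict:
    """Copy [entity].address into [entity].ip or [entity].domain.

    Assumption: events use flat dotted keys, and anything that does not
    parse as an IP address is treated as a domain name.
    """
    enriched = dict(event)  # leave the caller's event untouched
    address = enriched.get(f"{entity}.address")
    if not address:
        return enriched
    text = address.strip()
    try:
        enriched[f"{entity}.ip"] = str(ipaddress.ip_address(text))
    except ValueError:
        enriched[f"{entity}.domain"] = text.rstrip(".").lower()
    return enriched
```

The original `[entity].address` value is kept as-is, so nothing is lost if the classification later turns out to be wrong.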