Standardize Domain representations

dainperkins commented 4 years ago

Need to double-check the scope of the top-level fields, but I think it makes sense to standardize the domain representation between locations (minimally e.g. dns & url, likely client, server etc.), both for the sake of consistent representation and for potentially pulling standard features from all domain-trackable information.

I think we should consider adding a hostname field as well, within the domain representation (it's getting large, but breaking it up seems to be worth the effort).

Will add a PR shortly.

dainperkins commented 4 years ago

Went thru the places where domain names are used and looked at the overall fqdn dissection:

Current ECS DNS Reference:

dns.question.name
dns.question.registered_domain
dns.question.subdomain
dns.question.top_level_domain

I think we should consider adding:

dns.question.hostname (optional as e.g. google.com can be a valid host identifier as well)
dns.question.country_code (optional as, well, it's optional)

and then implementing these for each of the following field sets that reference names in some way:

source
destination
client
server
host (explicitly ignoring host.hostname as being the locally configured name as opposed to DNS)
url
dns.answer (for reverse lookups, etc)
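
To make the proposed breakdown concrete, here is a minimal Python sketch of how a single name could be split into the fields listed above. It is only an illustration under stated assumptions: `split_fqdn` and `KNOWN_SUFFIXES` are hypothetical names, the tiny hardcoded suffix set stands in for a real Public Suffix List lookup, "hostname" is assumed to mean the leftmost label when a subdomain is present, and "country_code" is assumed to be a two-letter final label.

```python
# Hypothetical, tiny sample of suffixes. A real implementation would
# consult the Public Suffix List instead of a hardcoded set.
KNOWN_SUFFIXES = {"com", "org", "net", "co.uk"}


def split_fqdn(name, suffixes=KNOWN_SUFFIXES):
    """Split a DNS name into the parts discussed in this issue (sketch)."""
    labels = name.rstrip(".").lower().split(".")

    # Find the longest known suffix matching the end of the name;
    # fall back to treating the last label as the top-level domain.
    suffix_len = 1
    for n in range(len(labels) - 1, 0, -1):
        if ".".join(labels[-n:]) in suffixes:
            suffix_len = n
            break

    top_level_domain = ".".join(labels[-suffix_len:])
    rest = labels[:-suffix_len]

    registered_domain = None
    subdomain = None
    hostname = None
    if rest:
        registered_domain = rest[-1] + "." + top_level_domain
        sub_labels = rest[:-1]
        if sub_labels:
            subdomain = ".".join(sub_labels)
            # Assumption: the leftmost label identifies the host.
            hostname = sub_labels[0]

    # Assumption: a two-letter final label is treated as a country code.
    last = labels[-1]
    country_code = last if len(last) == 2 and last.isalpha() else None

    return {
        "name": ".".join(labels),
        "registered_domain": registered_domain,
        "subdomain": subdomain,
        "top_level_domain": top_level_domain,
        "hostname": hostname,
        "country_code": country_code,
    }
```

With this sketch, "google.com" yields no hostname or subdomain, which matches the reason the hostname field is marked optional above.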

The "name" field makes sense in the DNS perspective, but deviates from e.g. URL spec (which references, iirc, the fqdn portion of a URL as HOST) but I think w/in ECS the "NAME" designation would work well, if used across all parent instances.

Which then would also suggest deprecating, or changing:

host.domain*
destination.domain**
server.domain**
source.domain***
client.domain***
url.domain (x)
* could be modified to represent the domain as configured on the system, e.g. the Windows domain, or whatever the appropriate way is to locally set a *nix system's fqdn

** / *** easy enough to carry forward, but I'm wondering about the usefulness & cost of enrichment in an actual ingest/visualization sense...

(x) should probably just be deprecated

@MikePaquette @webmat wdyt?

nemhods commented 4 years ago

@dainperkins thanks for leading me here! Some thoughts on the matter:

1) Where would the FQDN as a whole go in your scheme? In *.name?
2) Not to make matters more complicated, but I also liked the idea of source.address as a catch-all field to put hostnames, FQDNs, IPs in if you don't know in advance what your data source provides per event. Maybe that's also a thing to standardize between all the fieldsets.

And finally a rather wild idea, but maybe it leads to something. What about creating an Elasticsearch data type that holds network identities (like the IP datatype on steroids), maybe based on a URI/URL standard? It could accept all variants of identities (IPs, hostnames, FQDNs, sub- and top-level domains, ...) and allow for a normalized representation and flexible search in this data type (similar to how I can already search the IP datatype using CIDR representations).

Another problem is that identities can have aliases, or be represented differently in different contexts, often in a way that can't be normalized and that lacks unique identifiers. I'm thinking about an Identity Engine that can hook into the aforementioned "Identity" data type and find an identity even if it is searched for by a different alias. This would be a great help especially in security use cases, where User and Entity Behavior Analytics (UEBA) is a powerful concept. The Identity Engine could either learn by itself, or be informed of relations between identities through an API.

That's way beyond the scope of this issue, and I'm not aware of an even remotely comparable system in the Elastic Stack right now. Just some thoughts while we're at it.

dainperkins commented 4 years ago

So 3 would probably need to go under some core Elastic area for a PR, tho definitely not a bad idea.

1) I am of two minds - I'd like to use fqdn, or full_name (as we are supposed to avoid abbreviations), but that would mean changing the dns.* fields. I'd like to use [x].name to match with dns.question.name, but that may cause issues with host.name being used by SIEM (unsure if using fqdn would be an issue, or would work out ok)

2) As a catch-all I think [entity].address could certainly stick around as the spot we populate something while programmatically figuring out what to do with it in e.g. an ingest pipeline or similar, so that's probably not an issue...
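
As a rough illustration of that "populate the address, then figure it out" step, here is a small Python sketch using the standard library's ipaddress module. The function name `route_address` and the output keys are hypothetical; "name" follows the naming floated in this thread and is only a placeholder, not a settled field.

```python
import ipaddress


def route_address(value):
    """Keep the raw value in "address", then copy it to "ip" or "name" (sketch)."""
    fields = {"address": value}
    try:
        # ip_address() raises ValueError for anything that is not a
        # valid IPv4 or IPv6 literal.
        fields["ip"] = str(ipaddress.ip_address(value))
    except ValueError:
        # Assumption: anything that is not an IP is treated as a DNS name,
        # normalized to lowercase without a trailing dot.
        fields["name"] = value.rstrip(".").lower()
    return fields
```

The raw value is always preserved, so a later pipeline stage can still re-evaluate it if the classification turns out to be wrong.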

thanks for taking a look and contributing!

elastic / ecs

Standardize Domain representations #728