Closed yoda-sec closed 4 years ago
@yoda-sec Great topic! Currently, indexing strategies, including naming, are beyond the scope of the ECS specification. I still think this is a good discussion since, as you say, re-use and sharing of analysis content will be affected by index pattern selection. Let's use this issue for ideas and discussions.
I would reserve index name tweaks for more straightforward situations. For example, Beats does use the beat name and version in index names, so people can work around breaking upgrades when they happen, or grab everything, when all versions in use are aligned.
For ECS, however, the potential number of indices that follow ECS is too high. Your Beats indices will (soon) be ECS, your Logstash pipelines for various things may follow ECS as well, and finally some partner / third party event streams may also follow it eventually. Getting everyone to align correctly on index naming would not really be possible.
So depending on the environment complexity, I see a few ways we can grab ECS data broadly:
_exists_:ecs.version
ecs.version:x.* when you need, for example, something you know is only in ECS version X.Y and later.
What does "Your Beats indices will (soon) be ECS, your Logstash pipelines for various things may follow ECS as well" refer to? Is that referring to field names or something in the index name tied to ECS?
What if some type of standard were built around index aliases to provide flexibility and options for how folks like to manage their indices? Looking at process events (since it's fresh in my mind from the other issue :) ), what if ECS said the standard index for sharing content related to endpoint processes was called "sampleindex"? All dashboards, watchers, and hopefully other open source tools could then create content around querying this "sampleindex" only (which pushes reusability and sharing in the community).
Anyone who wishes to use shared content would be expected to create an index alias called "sampleindex" that matches whatever their custom Winlogbeat/Sysmon/Carbon Black/etc index naming pattern is for their process data (you could potentially even suggest adding a filtered alias that requires ecs versions to exist or process.* to exist). Thoughts?
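To make the filtered-alias idea above concrete, here is a minimal sketch that builds the request body for the Elasticsearch `_aliases` API. The alias name "sampleindex" comes from the comment above; the `winlogbeat-*` index pattern and the helper name `build_alias_request` are assumptions for illustration only, and nothing here is part of ECS.

```python
import json


def build_alias_request(index_pattern, alias, required_field="ecs.version"):
    """Build a body for POST /_aliases that adds a filtered alias.

    The alias only exposes documents in which `required_field` exists.
    `index_pattern` and `alias` are illustrative values, not ECS conventions.
    """
    return {
        "actions": [
            {
                "add": {
                    "index": index_pattern,
                    "alias": alias,
                    # Filtered alias: only documents carrying the field match.
                    "filter": {"exists": {"field": required_field}},
                }
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical local naming pattern mapped onto the shared alias name.
    body = build_alias_request("winlogbeat-*", "sampleindex")
    print(json.dumps(body, indent=2))
```

Shared dashboards would then query "sampleindex" only, while each site keeps its own underlying index names. Passing `required_field="process.pid"` (or another `process.*` field) would give the process-oriented variant of the filter.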
Hey @yoda-sec, sorry for the delay here. Let me address a few of your points. Please hit me back if you have more questions.
Closing as stale. There's no plan to define guidance on index naming.
I spent some time coming up with an index naming schema that we have been using for a while now. The details are specified in https://github.com/geberit/elastic-helpers/blob/master/Naming%20conventions.md#version-2
Any input is welcome.
"There's no plan to define guidance on index naming."
Seems it is still happening.
People subscribed to this closed issue might be interested in #980 and https://www.elastic.co/blog/an-introduction-to-the-elastic-data-stream-naming-scheme.
Thanks for the follow-up, @ypid-geberit.
I wanted to clarify that ECS will note these naming guidelines and restrictions for the data_stream.* fields to align with the new indexing strategy. Still, ECS continues not to have any naming guidance itself. Indexing strategies, including naming, remain out-of-scope for ECS.
The data_stream.* fields and their naming scheme work in tandem with data streams as part of the new indexing strategy for time series data. Sources adopting this new strategy (such as the Elastic Agent) need to follow the data stream's naming guidelines and restrictions.
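As a rough illustration of how the data_stream.* fields relate to a data stream name, the sketch below assumes the `{type}-{dataset}-{namespace}` pattern described in the blog post linked above, and assumes that dataset and namespace must not contain a hyphen because the hyphen is the separator. The function names and example values are hypothetical; the blog post and the ECS field reference remain the authoritative sources for the full list of restrictions.

```python
def data_stream_name(type_, dataset, namespace="default"):
    """Compose a data stream name as {type}-{dataset}-{namespace}.

    Assumption: dataset and namespace may not contain '-', since the
    hyphen separates the three parts of the name.
    """
    for label, value in (("dataset", dataset), ("namespace", namespace)):
        if not value:
            raise ValueError(f"{label} must not be empty")
        if "-" in value:
            raise ValueError(f"{label} must not contain '-': {value!r}")
    return f"{type_}-{dataset}-{namespace}"


def data_stream_fields(type_, dataset, namespace="default"):
    """Return the data_stream.* fields an event in that stream would carry."""
    data_stream_name(type_, dataset, namespace)  # validate the parts first
    return {
        "data_stream.type": type_,
        "data_stream.dataset": dataset,
        "data_stream.namespace": namespace,
    }


if __name__ == "__main__":
    print(data_stream_name("logs", "nginx.access", "prod"))
    print(data_stream_fields("logs", "nginx.access", "prod"))
```

The point of the sketch is only that the name is derived from the field values, which is why the naming restrictions surface in ECS as field-level notes rather than as index naming guidance.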
I looked through the ECS repo and other open issues and wasn't able to find anything related to index names. Does the ECS standard have any plans to define index naming conventions to make it easier to correlate similar types of data from different data sources? For example, if I am researching user authentication events for "jsmith", I may want to review audit logs from Windows, Linux, VPN, MFA, O365, etc. and would typically want to start with one Kibana query or one dashboard that gives me information from all those data sources.
Is there any plan to "map" these types of events to a standard "audit" index, or at least to a standard device type index, to make it easier to share alerting and visualization resources across the Elastic user base?