elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Elasticsearch should have the ECS component templates in the correct version preinstalled #85146

Open philippkahr opened 2 years ago

philippkahr commented 2 years ago

Description

If you spin up a fresh Elasticsearch cluster and want to use your own data while sticking to ECS, you need to do the following:

  1. git clone github.com/elastic/ecs
  2. git checkout the ES version.
  3. Then check the README.md inside generated/elasticsearch
  4. Then run this weird command to install the component templates (see the sketch below).

You have to repeat this every time you update Elasticsearch or ECS.
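
For reference, that install step boils down to pushing each generated JSON file to the _component_template API. Below is a minimal sketch using the Python Elasticsearch client; the generated/elasticsearch/composable/component path, the file structure, and the ecs_ naming are assumptions based on the ECS repo layout, so check the README of the version you checked out.

```python
# Sketch: install the ECS component templates from a local clone of elastic/ecs.
# Assumes the repo layout generated/elasticsearch/composable/component/*.json and
# that each file is a component template body ({"template": {"mappings": ...}});
# verify both against the README of the ECS version you checked out.
import json
from pathlib import Path

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust URL/auth for your cluster

for path in Path("ecs/generated/elasticsearch/composable/component").glob("*.json"):
    name = f"ecs_{path.stem.lower()}"  # e.g. ecs_base, ecs_http (naming is up to you)
    body = json.loads(path.read_text())
    es.cluster.put_component_template(name=name, template=body["template"])
    print(f"installed component template {name}")
```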

I believe that Elasticsearch should already ship with the latest version of ECS preinstalled, and only if I need an older version because of existing index mappings would I have to take the route described above.

mark-vieira commented 2 years ago

cc @elastic/ecs

Any thoughts on this? As I understand it those component templates are just meant to be a starting point or example for folks. Would there be any benefit in us "bundling" them into Elasticsearch somehow?

elasticmachine commented 2 years ago

Pinging @elastic/es-data-management (Team:Data Management)

philippkahr commented 2 years ago

Hmm @mark-vieira, what I see is that, for example, Beats creates one huge template containing all of the ECS mappings, instead of composing component templates for everything and keeping only the Beats-specific parts in its own template.
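
To make that concrete, the composable approach being described would look roughly like the sketch below with the Python client; ecs_base, ecs_http, and my-beats-custom are placeholder template names, not templates that Beats actually ships.

```python
# Sketch: compose an index template from shared ECS component templates plus a
# small product-specific one, instead of one monolithic template.
# All template names here are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Product-specific settings live in their own small component template.
es.cluster.put_component_template(
    name="my-beats-custom",
    template={"settings": {"index": {"number_of_shards": 1}}},
)

# The index template only references the shared ECS component templates.
es.indices.put_index_template(
    name="my-beats",
    index_patterns=["my-beats-*"],
    composed_of=["ecs_base", "ecs_http", "my-beats-custom"],
    priority=200,
)
```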

I think ECS becomes a core feature of Elasticsearch as soon as you add custom data integrations, because it lets you leverage your data inside any Kibana app automatically. Your data shows up in the Security and Observability apps, and built-in rules and machine learning jobs only work on ECS data.

ebeahan commented 2 years ago

Any thoughts on this? As I understand it those component templates are just meant to be a starting point or example for folks. Would there be any benefit in us "bundling" them into Elasticsearch somehow?

Right, ECS provides the component templates in the repo as examples (not that users can't use the tools in the repo to manage their index templates if they'd like).

Preinstalling may be convenient for advanced users who build out custom index templates. Beyond experienced users, though, it's unclear how useful the templates would be. Others may see the templates and believe the ECS mapping is already taken care of, without taking the other necessary steps.

The templates also need updating with each release, which creates repetitive work for the teams, and a component template per ECS field set means dozens of additional component templates installed by default in a fresh cluster.

what I see is that, for example, Beats creates one huge template containing all of the ECS mappings, instead of composing component templates for everything and keeping only the Beats-specific parts in its own template.

Yes, this is true for Beats. I don't see Beats refactoring to use built-in component templates. Agent solves this by having each integration manage its own mappings.

I think ECS becomes a core feature of Elasticsearch as soon as you add custom data integrations, because it lets you leverage your data inside any Kibana app automatically. Your data shows up in the Security and Observability apps, and built-in rules and machine learning jobs only work on ECS data.

Agreed, and there's some early, ongoing work to improve the custom log onboarding user experience. Helping users map their custom data sources without extensive knowledge of mappings or ingest pipelines is a goal of that effort.

Then run this weird command to install the component templates.

Yes, ECS could provide a script that uses an Elasticsearch client library instead of the bash example.

philippkahr commented 2 years ago

@ebeahan thanks for the insights and comments.

Yes, ECS could provide a script that uses an Elasticsearch client library instead of the bash example.

Could we take this one step further and have it integrated into Elasticsearch / Kibana? So that I, as a user, can click "install ECS-compatible component templates", select the templates I want, and have them installed?

calve commented 7 months ago

:wave: Our fresh install does come with a managed ecs@mappings component template, but it defines only some specific fields, not all of them. It is the one described as "dynamic mappings based on ECS", installed by x-pack.

This caused confusion, since we thought all fields documented at https://www.elastic.co/guide/en/ecs/8.12/ecs-field-reference.html would have been mapped by this upstream template.

As I understand it, not providing the component templates mentioned in this issue requires every team to reimplement the same work to conform to the ECS specification and to keep it up to date.

I believe advanced users who configure their index templates to use managed mappings know what they are trying to do.

As an end user, I would like to have these components managed upstream and available by default in my Elastic distribution, so that I don't have to define that http.response.status_code is a long in each and every custom pipeline I write. Otherwise it gets mapped to keyword, resulting in conflicts across indices that I want to be ECS-compliant. Maybe I'm missing something?
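
For what it's worth, one way to avoid repeating this in every pipeline today is to declare the few fields you care about once in your own component template and reference it from your index templates. A minimal sketch with the Python client follows; my-ecs-overrides and my-logs are made-up names, and an explicit long mapping will also coerce numeric strings like "200".

```python
# Sketch: map http.response.status_code as long once, in a reusable component
# template, instead of converting it in every ingest pipeline.
# "my-ecs-overrides" and "my-logs" are hypothetical names.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.cluster.put_component_template(
    name="my-ecs-overrides",
    template={
        "mappings": {
            "properties": {
                "http": {
                    "properties": {
                        "response": {
                            "properties": {"status_code": {"type": "long"}}
                        }
                    }
                }
            }
        }
    },
)

es.indices.put_index_template(
    name="my-logs",
    index_patterns=["my-logs-*"],
    composed_of=["my-ecs-overrides"],
    priority=200,
)
```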

felixbarny commented 6 months ago

There are two different approaches to mapping ECS, and each has its tradeoffs.

When exhaustively listing every field that's defined in ECS, there's a lot of maintenance overhead as the template needs to be updated every time a new field gets added. When eagerly adding all ECS fields to an index, this creates lots of unnecessary fields, which results in problems with the field limit and memory overhead in Elasticsearch. An alternative would be to create a dynamic template for each individual ECS field. However, that will impact the performance of dynamic mapping as for each new field, Elasticsearch needs to linearly traverse all dynamic templates to see if one matches.

The approach we took with ecs@mappings is to have a minimal set of dynamic templates, based on naming conventions. It also only contains mappings for fields that differ from the default mappings for the corresponding JSON type (like string -> keyword, number -> long). The tradeoff is that this assumes that fields are sent with the correct type. So if http.response.status_code is supposed to be a long but is sent as a string, ecs@mappings doesn't map http.response.status_code to long and doesn't coerce strings in the source document to a long. It assumes the shippers are using field types that adhere to ECS. Depending on the use case, this may or may not be a reasonable assumption.
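
To make the naming-convention idea concrete, a dynamic template of that style looks roughly like the sketch below. This is illustrative only, written in the spirit of ecs@mappings rather than reproducing its actual contents.

```python
# Sketch: naming-convention-based dynamic templates, illustrative only --
# not the actual contents of the managed ecs@mappings template.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.cluster.put_component_template(
    name="my-ecs-style-dynamic",  # hypothetical name
    template={
        "mappings": {
            "dynamic_templates": [
                {
                    # Fields named *.ip that arrive as JSON strings become `ip`
                    # instead of the default text + keyword mapping.
                    "ip_fields": {
                        "path_match": "*.ip",
                        "match_mapping_type": "string",
                        "mapping": {"type": "ip"},
                    }
                },
                {
                    # Any other string becomes a plain keyword (ECS-style)
                    # rather than the default text + keyword multi-field.
                    "strings_as_keyword": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword", "ignore_above": 1024},
                    }
                },
            ]
        }
    },
)
```

In a sketch like this, a numeric http.response.status_code dynamically maps to long anyway and needs no rule, while a string value falls through to the keyword rule, which is exactly the tradeoff described above.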

@calve do you know why http.response.status_code is sent as a string instead of a number? Are you using an Elastic-provided integration that might need to be fixed? If not, which shipper are you using?

nicpenning commented 6 months ago

I found that destination.port had the same behavior with my custom integration. I thought that since I was sending destination.port (even as a string), it would be mapped appropriately by ecs@mappings.

So am I correct in reading here that I need to convert it to a long at the source, or via an ingest pipeline, to properly get the ECS mapping from the template?

felixbarny commented 6 months ago

That's correct.
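
For anyone landing on this later, the ingest-pipeline route is a single convert processor; here is a minimal sketch with the Python client, where my-custom-integration is a made-up pipeline name.

```python
# Sketch: coerce destination.port to a number before indexing, so the
# ECS-style dynamic mappings see the expected JSON type.
# "my-custom-integration" is a hypothetical pipeline name.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.ingest.put_pipeline(
    id="my-custom-integration",
    processors=[
        {
            "convert": {
                "field": "destination.port",
                "type": "long",
                "ignore_missing": True,
            }
        }
    ],
)
```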

nicpenning commented 6 months ago

That is great information, thank you. I will give that a go!

nicpenning commented 6 months ago

Confirmed, sending my data as a UInt correctly mapped destination.port as a long. Thank you!