Open MakoWish opened 1 year ago
I had the same impression when I started planning the migration from normal indices and custom pipelines to the Elastic Agent integrations.
The integrations help you add new data without having to create pipelines to parse the messages and everything else, but they also make administering and managing that data much harder and more confusing.
For example, there is only one lifecycle policy for all the integrations. If you have a low-volume integration and a very high-volume integration, they will use the same lifecycle policy. Of course you can change that, but you would need to edit tens of templates to do so.
This was one of the many reasons that made us drop the adoption of Elastic Agent and use it just for some simple things.
Yeah, after about three months of playing around with it, we came to the conclusion just yesterday that Elastic Agent and Data Streams are just not ready for mainstream yet. We will be reverting our Control Group devices back to Beats agents. This is unfortunate, as I was really excited about being able to centrally manage agent deployments. Beats and indices with Component Templates are easy. Elastic Agent with convoluted Data Streams, along with the impossible-to-manage Index Templates and Ingest Pipelines, are not something I want to deal with.
Eric
Part of the reason component templates were developed is the data stream naming scheme. It was not possible to combine templates the way we wanted without having "accidental inheritance". Elastic Agent and Fleet fully rely on component templates, and these are not going away; rather, we are starting to use them more and more.
We need to split up 2 things:
Fleet has a specific way of managing component templates, and for the integrations it installs it follows its internal conventions. Most of these component templates are not meant for users to modify, but unfortunately Elasticsearch has no way to mark templates as "managed". These templates have an extension point, which is defined as @custom. This is all Fleet specific.
Why was this decision made, and can we make a hard push to get back to the original idea behind Component Templates?
@MakoWish You can use the way component templates work best for you, it is not that we are moving away from component templates in any way. There is, for example, a discussion currently ongoing around ECS about offering it as a component template. Should it be one ECS component template, one per prefix, or one for core and one for extended? There are pros and cons for each.
For example, there is only one lifecycle policy for all the integrations. If you have a low-volume integration and a very high-volume integration, they will use the same lifecycle policy. Of course you can change that, but you would need to edit tens of templates to do so.
@leandrojmp Unfortunately you are right. The way it is set up is not ideal, and we are working on improving/fixing it. What has changed is that you now have multiple data streams for the different datasets, so you can at least have different ILM policies for different data. Component templates will help us to solve this problem. Previously in beats, you had one big index with all the data inside if you didn't specify your own indexing strategy.
@ruflin
You can use the way component templates work best for you
No, not really. The Index Templates for Elastic Agent Integrations are managed, so if I were to change them to use the previous Component Templates we have relied on for a while now, they will just be overwritten on the next Integration update. I already tried.
it is not that we are moving away from component templates in any way.
But you have. Instead of an Index Template using shared Component Templates as I understand they were meant to be used like this:
...
"composed_of": [
"host",
"client",
"source",
"destination",
...
]
You no longer use those shared Component Templates and have created completely disparate Component Templates like this:
...
"composed_of": [
"integration@package",
"integration@custom"
]
IMHO, this completely breaks the entire idea behind shared Component Templates.
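To make the difference concrete, here is a minimal Python sketch of how shared components feed several Index Templates. It only models the merge in a simplified way (components applied in `composed_of` order, later ones winning); it is not Elasticsearch's implementation, and the `cisco` and `iis` template names and the field types are assumptions made up for illustration.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Recursively merge override into a copy of base; later values win."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_mappings(index_template, component_templates):
    """Merge the mappings of every component listed in composed_of, in order."""
    resolved = {}
    for name in index_template["composed_of"]:
        mappings = component_templates[name]["template"].get("mappings", {})
        resolved = deep_merge(resolved, mappings)
    return resolved


# Hypothetical shared component templates, one per field set.
component_templates = {
    "host": {"template": {"mappings": {"properties": {
        "host": {"properties": {"name": {"type": "keyword"}}}}}}},
    "source": {"template": {"mappings": {"properties": {
        "source": {"properties": {"ip": {"type": "ip"}}}}}}},
}

# Two hypothetical index templates that reuse the same components.
index_templates = {
    "cisco": {"index_patterns": ["cisco-*"], "composed_of": ["host", "source"]},
    "iis": {"index_patterns": ["iis-*"], "composed_of": ["host", "source"]},
}

# One edit to the shared "source" component...
source_props = component_templates["source"]["template"]["mappings"]["properties"]
source_props["source"]["properties"]["port"] = {"type": "long"}

# ...shows up in every index template that lists it.
resolved = {name: resolve_mappings(template, component_templates)
            for name, template in index_templates.items()}
```

With the `integration@package` / `integration@custom` layout, the same change would have to be repeated once per integration data stream, since no component is shared between them.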
Previously in beats, you had one big index with all the data inside if you didn't specify your own indexing strategy.
We are using our own indexing strategy, but Elastic Agent would force us away from that, and we would have to re-architect pretty much everything we have done over the past four years. For instance, we broke each Filebeat and Metricbeat module out into its own indices, so instead of everything going to filebeat-*, we have Filebeat's Cisco module writing to our cisco-* indices; Metricbeat's IIS module writes to iis-*; Filebeat's Threat Intel module writes to threatintel-*; and so on. Now Elastic Agent wants to write everything to logs-* without any apparent way to change that. Every dashboard, visualization, saved search, security detection rule, Watcher... everything would need to be rebuilt to use Elastic Agent.
No, not really. The Index Templates for Elastic Agent Integrations are managed, so if I were to change them to use the previous Component Templates we have relied on for a while now, they will just be overwritten on the next Integration update. I already tried.
If you are using the integration packages, yes you are forced to use the template structure we have put in place and use @custom. This is by design so we can upgrade packages without breaking your setup. If you are using your own data streams, you have the complete freedom to put together index templates and component templates the way you want. What do you mean by "previous component templates" in this context?
Instead of an Index Template using shared Component Templates as I understand they were meant to be used
What you describe is a valid use case for component templates, but that doesn't mean the way we use component templates in Fleet / Elastic Agent is invalid. I can't remember that we ever used component templates for Fleet / Agent the way you describe above. There are reasons for the way we use them: they allow users to override settings or mappings up to a certain point, which would not be possible without component templates. As described before, there is a good chance that in some parts of the integrations we will also start to use component templates as reusable parts for ECS, which I hope you will like, since the top-level objects you described sound a lot like ECS.
The data stream naming scheme is inspired exactly by what you and many others did with your indices. Elastic Agent forces you to use the data stream naming scheme, but it does not force you in any way to use component templates the way we do for integrations. You can specify your own component templates, index templates, etc. as long as you use logs-*-*. So in your scenario, this would likely be logs-cisco-default, logs-threatintel-default, etc. Happy to chat more about migrating from Beats to Elastic Agent, but I don't think this is the right place, as it is not related to component templates.
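As a rough illustration of the naming scheme, the sketch below builds and parses names of the form type-dataset-namespace. The helper names are hypothetical, and the rule that no part may contain a hyphen is my understanding of the scheme rather than something stated in this thread.

```python
from typing import NamedTuple


class DataStreamName(NamedTuple):
    type: str
    dataset: str
    namespace: str


def build_name(type_, dataset, namespace="default"):
    """Join the three parts with hyphens, rejecting parts that would make parsing ambiguous."""
    for part in (type_, dataset, namespace):
        if not part or "-" in part:
            raise ValueError(f"invalid data stream name part: {part!r}")
    return f"{type_}-{dataset}-{namespace}"


def parse_name(name):
    """Split a data stream name back into type, dataset and namespace."""
    parts = name.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected <type>-<dataset>-<namespace>, got {name!r}")
    return DataStreamName(*parts)


# Names from the scenario above: per-module indices map onto per-dataset data streams.
cisco = build_name("logs", "cisco")
threatintel = build_name("logs", "threatintel")
security = parse_name("logs-system.security-default")
```

An index template with a pattern such as logs-cisco-* would then cover every namespace of that dataset.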
If you are using the integration packages, yes you are forced to use the template structure we have put in place and use @custom.
That is the main issue, right there. We previously had a single component template for each ECS (or custom) set of fields. If there was a change to one of these Component Templates, it was automatically applied to every Index Template that uses it. Now if we make a change to the user set of fields, for instance, we will have hundreds of @custom Component Templates to update.
Just as an arbitrary example, we enrich quite a lot of our events, regardless of the source, to add more information about users related to the events. It does not matter if the event comes from our firewall data, anti-virus, Beats agents, or any other source, we enrich them all with custom ECS-compliant fields such as user.department, user.description, and user.manager. We can no longer just update the single shared user Component Template. We would now need to update every single @custom template to include those custom ECS-compliant fields.
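A small Python sketch of what that fan-out looks like: the same user mappings have to be written into one @custom Component Template per Index Template. The keyword type for the three fields is an assumption, the list of template names is just a sample, and the printed lines are only a description of the requests, not a client call.

```python
import json

# Custom user fields named above; the "keyword" type is an assumption.
USER_MAPPINGS = {"properties": {"user": {"properties": {
    "department": {"type": "keyword"},
    "description": {"type": "keyword"},
    "manager": {"type": "keyword"},
}}}}


def custom_template_bodies(template_names, mappings):
    """Build one component template body per <name>@custom."""
    return {f"{name}@custom": {"template": {"mappings": mappings}}
            for name in template_names}


# A sample of integration index templates; a real deployment may have hundreds.
template_names = [
    "logs-system.security",
    "logs-system.auth",
    "logs-windows.sysmon_operational",
]

bodies = custom_template_bodies(template_names, USER_MAPPINGS)
for name in sorted(bodies):
    print(f"PUT _component_template/{name}")
print(json.dumps(bodies["logs-system.security@custom"], indent=2))
```

Note that writing a body like this replaces whatever the @custom template already contained, so existing customizations would need to be merged in first.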
What do you mean by "previous component templates" in this context?
I only mean the Component Templates that are shared by other Index Templates such as host, user, source, destination, etc. as opposed to this new @custom idea.
The main issue in my opinion is that if you need to make any custom change to any integration, be it a custom mapping for a custom field or a custom ingest pipeline, you will have so much work that in the end you will avoid using the integrations at all.
For example, I had a recent issue on Discuss while trying to add a custom ingest pipeline to an integration to add a custom field, source.ip, an ECS one.
Following the documentation, I saw that I just needed to create an ingest pipeline named logs-integration.dataset@custom. This worked, but then the field source.ip was not present in the mapping for this dataset and I got a conflict message from Kibana, because another dataset in the same integration had a different mapping. So to add a simple custom field to one dataset I need to edit a custom ingest pipeline and a custom template, and if I want to add a custom field to the whole integration I would need to edit at least 5 custom ingest pipelines and 5 custom component templates.
It would be better if the integrations had a simple way to add component templates and ingest pipelines to all their datasets, without the need to edit so many files.
But I agree that this is not the right place to chat about it as this is not an issue with component templates, but with the integrations.
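To show how the count of 5 pipelines plus 5 templates comes about, here is a short Python sketch that lists the @custom objects to edit for an integration. The dataset names are placeholders, and the assumption is that both the ingest pipeline and the component template follow the same type-dataset@custom naming.

```python
def custom_artifacts(data_type, datasets):
    """List the @custom ingest pipeline and component template for each dataset."""
    artifacts = []
    for dataset in datasets:
        name = f"{data_type}-{dataset}@custom"
        artifacts.append(("ingest_pipeline", name))
        artifacts.append(("component_template", name))
    return artifacts


# Placeholder dataset names for an integration with five datasets.
datasets = [f"integration.dataset{i}" for i in range(1, 6)]
artifacts = custom_artifacts("logs", datasets)

for kind, name in artifacts:
    print(f"{kind}: {name}")
print(f"{len(artifacts)} objects to edit for one custom field")
```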
Here is a perfect example of the issue at hand. I went to create a new Data View for the Indices and Data Streams winlogbeat*,logs-system*,logs-windows*, and there is a conflict across the Component Templates with the Data Streams.
All my winlogbeat-* indices have source.geo.location mapped as geo_point, as they are using our shared source Component Template. The Data Streams logs-system.auth-default and logs-system.security-default also have source.geo.location properly set as geo_point. Unfortunately, the @package Component Template for logs-windows.sysmon_operational only has mappings for the source.port, source.domain, and source.ip fields, so source.geo.location was incorrectly created as object. This could have been avoided if the Index Templates used shared Component Templates.
To my understanding, there is no way to reindex a Data Stream's backing indices, so even if I add the source mappings to @custom, there is no way for me to reindex the affected indices to correct the conflict.
EDIT:
Also just found metrics-linux.socket is missing the source.geo.location and source.ip mappings as well.
And system.process.cpu.system.time.ms should be long, but it is mapped differently across indices:
Type: date
.ds-metrics-elastic_agent.filebeat-default-2022.11.07-000001, .ds-metrics-elastic_agent.fleet_server-default-2022.11.17-000001, .ds-metrics-elastic_agent.metricbeat-default-2022.11.07-000001
Type: long
.ds-metrics-elastic_agent.elastic_agent-default-2022.11.07-000001
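Conflicts like these can be spotted by grouping each field's type by index. Below is a minimal Python sketch of that check; the input dictionary is typed in by hand from the examples above (not fetched from a cluster), and the keys for the winlogbeat and sysmon entries are patterns or data stream names rather than real backing index names.

```python
from collections import defaultdict


def find_mapping_conflicts(field_types_by_index):
    """Return {field: {type: [indices]}} for fields mapped with more than one type."""
    seen = defaultdict(lambda: defaultdict(list))
    for index, fields in field_types_by_index.items():
        for field, field_type in fields.items():
            seen[field][field_type].append(index)
    return {
        field: {field_type: sorted(indices) for field_type, indices in types.items()}
        for field, types in seen.items()
        if len(types) > 1
    }


CPU_FIELD = "system.process.cpu.system.time.ms"
GEO_FIELD = "source.geo.location"

field_types_by_index = {
    ".ds-metrics-elastic_agent.filebeat-default-2022.11.07-000001": {CPU_FIELD: "date"},
    ".ds-metrics-elastic_agent.fleet_server-default-2022.11.17-000001": {CPU_FIELD: "date"},
    ".ds-metrics-elastic_agent.metricbeat-default-2022.11.07-000001": {CPU_FIELD: "date"},
    ".ds-metrics-elastic_agent.elastic_agent-default-2022.11.07-000001": {CPU_FIELD: "long"},
    "winlogbeat-*": {GEO_FIELD: "geo_point"},
    "logs-system.auth-default": {GEO_FIELD: "geo_point"},
    "logs-windows.sysmon_operational": {GEO_FIELD: "object"},
}

conflicts = find_mapping_conflicts(field_types_by_index)
for field, types in conflicts.items():
    for field_type, indices in types.items():
        print(f"{field}: {field_type} in {len(indices)} index(es)")
```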
@leandrojmp @MakoWish The points you are bringing up are valid and we should find ways to accommodate this kind of usage. I think we initially got distracted by the title "Moving Away from Shared Component Templates Already?" but the way I read it now, it is much more about how integrations templates / ingest pipelines can be extended with your own component templates / ingest pipelines.
To continue the conversation, I suggest we take this to the Kibana repository, as Fleet is part of Kibana and is the tool that installs the templates. @leandrojmp Any chance you could put the details you have in https://github.com/elastic/elasticsearch/issues/91370#issuecomment-1331445558 into a GitHub issue under https://github.com/elastic/kibana and reference it here? Then @MakoWish can join in with his details.
@MakoWish For the "wrong" mappings, this ECS discussion should help with it eventually: https://github.com/elastic/elasticsearch/issues/85692 The discussion was stale for a bit but it will continue shortly.
I think we initially got distracted by the title "Moving Away from Shared Component Templates Already?" but the way I read it now, it is much more about how integrations templates / ingest pipelines can be extended with your own component templates / ingest pipelines.
No, it is about the use of shared component templates throughout to ensure all data sources have the same mappings. If you want to retain the @custom Component Template idea to allow users to add their own mappings, that is fine, but the @package Component Template idea is where the flaw lies.
@ruflin I will create a new issue in the Kibana repository with a feature request to make it easier to use custom templates and ingest pipelines with integrations.
I thought the idea of Component Templates was fantastic. Managing dozens (or hundreds) of disparate _template definitions in the past was a royal pain. If there was a change to, let's say, the host fields, I would have to go through and modify every single _template that used the host fields. Having each _index_template composed of _component_template entries really simplified that. All I would have to do is update the host component template, and all Index Templates using it would also be updated automatically.
I am now trying in our DEV environment to migrate away from indices to start using Elastic Agent and Data Streams, but Elastic has once again gone back to defining fields distinctly in every single Index Template. For instance, if you look at the Index Template logs-system.security, instead of being composed of host, process, source, destination, etc. Component Templates, it is composed of logs-system.security@package and logs-system.security@custom, where each one of those explicitly defines all the components. This is completely counter-intuitive to the idea of using Component Templates to begin with.
Why was this decision made, and can we make a hard push to get back to the original idea behind Component Templates? We use several custom fields throughout many data sources, like host.bios.*, user.target.*, and many others, and this sudden move away from the new Component Templates is going to make my life a nightmare.
Eric