elastic / package-spec

EPR package specifications

[Discuss] Elastic Packages: Introduce schema for data providers #199

Open mtojek opened 3 years ago

mtojek commented 3 years ago

In general I like the deterministic approach we are following here. My concern is the number of fields this will add to each dataset, which increases the size of all templates and mappings. This multiplies quickly with many datasets, and adding k8s fields to all integrations even when someone does not use them is not great.

There might be a partial way out here: dynamic mappings. Instead of configuring all k8s fields in the referenced fields, a dynamic mapping makes sure the fields are mapped correctly. Most are keywords anyway, so they likely do not even need the mapping to be set, as this is the default.
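To make the dynamic mapping idea concrete, here is a minimal sketch of what such a mapping could look like, assuming one dynamic template per data provider prefix. The helper names, the template names, and the `ignore_above` value are illustrative assumptions, not something proposed in this thread; only `dynamic_templates`, `path_match`, and `match_mapping_type` are existing Elasticsearch mapping options.

```python
import json
from typing import Iterable


def provider_dynamic_template(provider: str) -> dict:
    """Return one dynamic template that maps every string field under
    `<provider>.*` to keyword (hypothetical template name)."""
    return {
        f"{provider}_strings_as_keyword": {
            "path_match": f"{provider}.*",
            "match_mapping_type": "string",
            # ignore_above is an assumed value, chosen for illustration only
            "mapping": {"type": "keyword", "ignore_above": 1024},
        }
    }


def build_mappings(providers: Iterable[str]) -> dict:
    """Build a mappings body with one dynamic template per provider,
    instead of listing every provider field explicitly."""
    return {"dynamic_templates": [provider_dynamic_template(p) for p in providers]}


if __name__ == "__main__":
    print(json.dumps(build_mappings(["kubernetes", "cloud"]), indent=2))
```

With this shape, the template size grows with the number of providers rather than with the number of provider fields, and a field is only added to the index mapping once a document actually contains it.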

One completely different alternative is to use a more recent Elasticsearch feature that allows the mapping to be sent as part of the request. This way, the creation of these mappings would be delegated to Beats as part of the pull request. It would have to be investigated whether this causes issues with permissions.
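As a rough illustration, and assuming the feature meant here is the `dynamic_templates` parameter of the bulk API (an assumption, the comment does not name it), the sender could attach a field-to-template map to each bulk action. The index name, the document, and the template name `keyword_dimension` below are hypothetical, and the named template would still have to exist in the index mapping.

```python
import json


def bulk_index_lines(index: str, doc: dict, field_templates: dict) -> str:
    """Build the two NDJSON lines of one bulk `index` action.

    `field_templates` maps full field names to the names of dynamic
    templates that are assumed to be defined in the index mapping.
    """
    action = {"index": {"_index": index, "dynamic_templates": field_templates}}
    # The bulk API expects newline-delimited JSON with a trailing newline.
    return json.dumps(action) + "\n" + json.dumps(doc) + "\n"


if __name__ == "__main__":
    body = bulk_index_lines(
        "my-index",  # hypothetical index name
        {"kubernetes": {"pod": {"uid": "abc-123"}}, "message": "hello"},
        {"kubernetes.pod.uid": "keyword_dimension"},  # hypothetical template name
    )
    print(body, end="")
```

Note that in this sketch only the template name travels with the request; the mapping definition itself stays in the index template, which is one of the points that would need investigating together with the permission question.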

PS: I don't like that we keep mixing two discussions into a single issue. It keeps creating confusion. We should close this issue and have a separate one for the "current" discussion.

Originally posted by @ruflin in https://github.com/elastic/package-spec/issues/63#issuecomment-876970648

mtojek commented 3 years ago

@ruflin I think the same idea about dynamic templates was brought up by @jsoriano in https://github.com/elastic/package-spec/issues/63#issuecomment-867435522.

I'm afraid we don't have the power to decide whether we can go that way. It probably isn't easy to introduce in a single iteration. Maybe you can suggest somebody from the agent team who can explore the idea (if it's doable)?

> In general I like the deterministic approach we are following here. My concern is the number of fields this will add to each dataset, which increases the size of all templates and mappings. This multiplies quickly with many datasets, and adding k8s fields to all integrations even when someone does not use them is not great.

Do you think we should pair with the Fleet team to consider a more flexible solution (user can select enabled data providers, e.g. kubernetes runtime)?

Side note:

It seems to be a never-ending thread and there are many objectives. It would be great to decide whether we want to solve this problem now or should focus on something else.

ruflin commented 3 years ago

The part I would like to understand is: What breaks if we don't handle it right now? The reason I ask this is because I think the default mappings will cover most providers pretty well.

I'm good with getting a short-term solution, but this short-term solution must ensure we don't get ourselves into a position where we map just everything and are back to the problem of too many fields.

mtojek commented 3 years ago

Frankly speaking I like the approach with default mappings more if it's on the roadmap. This way we won't pollute indices with useless mappings (e.g. cloud fields in non-cloud environments).

We have to remember that it impacts developer experience, developer/devops:

If there are technical difficulties or gaps on the agent/fleet side, I'm good with postponing this until later (rather than never), but we need to take this decision cautiously.

cc @masci @andresrc @jsoriano

exekias commented 3 years ago

One thing that is not covered by default mappings is defining the meta for some of these fields. I'm specifically thinking about dimensions, that we need to flag in some cases. We could put these fields in the ES default mappings but I'm wondering if that's a good practice, as it ties the fields to the ES version.
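To illustrate the gap, here is a small sketch of an explicit mapping that flags a field as a dimension, assuming the flag ends up as the Elasticsearch `time_series_dimension` mapping parameter on a keyword field. The helper name and the example field `kubernetes.pod.uid` are illustrative assumptions. A field that is only picked up by a plain keyword default would not carry this flag.

```python
import json


def explicit_dimension_mapping(field_path: str) -> dict:
    """Build a nested explicit mapping that marks `field_path`
    as a keyword dimension (hypothetical helper)."""
    mapping = {"type": "keyword", "time_series_dimension": True}
    # Wrap the leaf in one `properties` level per path segment, innermost first.
    for part in reversed(field_path.split(".")):
        mapping = {"properties": {part: mapping}}
    return mapping


if __name__ == "__main__":
    print(json.dumps(explicit_dimension_mapping("kubernetes.pod.uid"), indent=2))
```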

jsoriano commented 3 years ago

> default mappings

When talking about default mappings, which mappings do you refer to?

> What breaks if we don't handle it right now?

Regarding this question, the main part we are lacking now is the mapping for fields added by processors or autodiscover. In principle, I think this currently only affects the standalone agent, as it is the only way to include dynamic inputs or additional processors at the moment. With the current focus on the Fleet experience this may be a lower priority, but it is still something we need to improve for important use cases.

Also, some mappings don't break now only because some modules already include the mapping of many "data provider" fields, or other common fields. But this situation is not ideal: it requires bulk changes when something changes in these fields, and many fields are included in mappings even if they are never used in most deployments.

mtojek commented 3 years ago

> When talking about default mappings, which mappings do you refer to?

I was thinking about dynamic mappings using the ES feature you mentioned. By "default" I meant hardcoded (a mapping with a default type) somewhere in Kibana, Agent, etc., but not in the integration.