elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana
Other
19.61k stars 8.22k forks source link

[Fleet] Automatically disable inputs with required variables during non-UI policy upgrades #190790

Open jamiehynds opened 1 year ago

jamiehynds commented 1 year ago

Per @andrewkroh below:

I noticed this problem in Fleet while working directly with the API. Any time a package adds a new data stream that is enabled by default AND it contains required variables, then upgrading effectively becomes a breaking change. Users would need to modify their code that uses the API to explicitly disable the new data streams with required variables.

For example, someone practicing IaC might create a package policy request that specifies only the data stream they are interested in using. When the upgrade to a new minor release they don't expect to need to modify their request to disable new data streams.

This is almost exactly the same issue we had with Filebeat modules at https://github.com/elastic/beats/issues/17256.

When upgrading a package policy through the Fleet API, if the new version of a given package introduces new inputs/data streams with required variables, the operation cannot be performed without the user modifying their request. Fleet should support API requests where the "previous" version of the package policy is acceptable, and the new breaking data streams will be automatically disabled to prevent the need for the user to modify their request/package policy.

Fleet should return some kind of warning or metadata to inform the user that this happened.

Previous description Our integrations currently have all inputs and data streams within the integration enabled by default. Users have to manually disable any unwanted inputs/data streams, and in most cases we've seen, users don't disable them. We've seen this result in some support cases and SDH's where too many data streams and enabled, or an integration is listening for syslog on both UDP and TCP, when only one is generally needed. One example is Cloudflare, where we support events from HTTP Endpoint, AWS S3 and Google Cloud Storage. In the vast majority of cases, users only ingest data from one of those inputs, yet we have them all enabled by default. Within each of those inputs, there are 7 data streams (with more on the way) which users have to manually disable any unwanted datasets. Screenshot 2023-05-05 at 10 36 23 **Proposal** - Fleet adding the ability for integration developers to disable all of these toggles by default, allowing users to opt-in to which inputs and data streams they could like to use within each integration. - Include an override for cases where we'd like to have a data stream enabled by default - Ensure we don't break any integrations already enabled by users and upgrades won't disable any enabled data streams.
elasticmachine commented 1 year ago

Pinging @elastic/fleet (Team:Fleet)

jamiehynds commented 1 year ago

@jsoriano does it make sense to create a similar issue in package-spec too or is it ok to discuss within this issue for the time being? It's something @P1llus have discussed for awhile and have seen several customers impacted by all data streams being enabled by default.

jsoriano commented 1 year ago

Hey @jamiehynds,

There is already a toggle to control what data streams are enabled by default, but it works the opposite. All data streams are enabled by default, and package developers can set enabled: false to disable them.

For example for the Kubernetes package, the controllermanager and scheduler data streams are disabled with enabled: false on their manifests: https://github.com/elastic/integrations/blob/3e0894f421c49e1a79e30f8b95d64dbbd6330af4/packages/kubernetes/data_stream/controllermanager/manifest.yml#L5 https://github.com/elastic/integrations/blob/3e0894f421c49e1a79e30f8b95d64dbbd6330af4/packages/kubernetes/data_stream/scheduler/manifest.yml#L5

And they appear as disabled by default:

imagen

Would this be enough for your use cases?

jamiehynds commented 1 year ago

Thanks @jsoriano - didn't realise that toggle exists. For integrations that users have deployed, if we set enabled:false, I'm guessing that will disable any data streams currently in use and a user would have to manually intervene to re-enable the data stream? Or is there a way to avoid impact for integrations currently in use?

andrewkroh commented 1 year ago

I noticed this problem in Fleet while working directly with the API. Any time a package adds a new data stream that is enabled by default AND it contains required variables, then upgrading effectively becomes a breaking change. Users would need to modify their code that uses the API to explicitly disable the new data streams with required variables.

For example, someone practicing IaC might create a package policy request that specifies only the data stream they are interested in using. When the upgrade to a new minor release they don't expect to need to modify their request to disable new data streams.

This is almost exactly the same issue we had with Filebeat modules at https://github.com/elastic/beats/issues/17256.

jsoriano commented 1 year ago

Any time a package adds a new data stream that is enabled by default AND it contains required variables, then upgrading effectively becomes a breaking change.

I guess this could be handled by Fleet, when it upgrades a package, it could check if there are new data streams and disable them on unattended installations, or show a dialog to configure them on upgrades through UI. @kpollich @juliaElastic wdyt?

An alternative could be to disallow adding data streams enabled by default to GA packages, but this can be cumbersome for package developers.

botelastic[bot] commented 6 months ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

jsoriano commented 2 months ago

Bumping-up this issue.

@kpollich as it looks like the pending problem here is with package upgrades, wdyt about moving it to Kibana repo?

cc @krol3

kpollich commented 2 months ago

I updated the title + description and transferred this to Kibana to capture the new ask around upgrades and breaking changes. I'm putting this into an 8.17 sprint for the time being.

nchaulet commented 2 months ago

For example, someone practicing IaC might create a package policy request that specifies only the data stream they are interested in using. When the upgrade to a new minor release they don't expect to need to modify their request to disable new data streams.

Should the behavior be the same when creating a package? and should we add flag when creating/upgrading packages to not enable streams by default? and only enable those defined in the request.

For IAC tools it will be weird otherwise, you will have a different behaviour when upgrading a package, then when you recreate your infrastructure from scratch.

lucabelluccini commented 1 month ago

+1

Ideally when upgrading version:

In general (even for new policies):

nbarnes22 commented 1 month ago

This also occurs in the UI. When I upgrade the Elastic cloud infrastructure, I also update all of my integrations and policies. The integrations should not be modified after I have configured it. In my experience, I went to edit an integration with only S3 access logging enabled but instead there were an additional two parameters enabled which I did not explicitly enable. The fix was that I had to go through all of my integrations and manually disable the parameters that were automatically enabled by some upgrade. I think when a new feature is added, it is automatically enabled by default across all integrations. This seems like a bad idea, and the manual fix is troublesome. Now I have to manually check every integration after every upgrade to ensure the settings stay how I need them.

leandrojmp commented 3 weeks ago

Is there any update on this?

I have some cloudtrail and cloudwatch integrations in multiple policies and after updating the AWS integration I had to manually edit all the integrations to disable the AWS Health metrics (experimental) that was added, otherwise the agents would be unhealthy.

Also, the title mention non-UI, but this happens when you update an integration from the UI as well.