[Fleet] Updating a package when a cluster have a large list of pending tasks is problematic

nchaulet commented 2 years ago

Description

Updating a package when a cluster has a large list of pending tasks can be really problematic.

During Fleet setup, we may trigger a rollover for existing data streams if we are not able to update them in place (because of incompatible mappings, for example).

In this case, Fleet makes things worse by queuing additional cluster state updates on an already overloaded cluster, and it is not able to install the package correctly.

How should we handle that?

elasticmachine commented 2 years ago

Pinging @elastic/fleet (Team:Fleet)

joshdover commented 2 years ago

I think we need to be smarter about which types of exceptions we catch before deciding to attempt a rollover. Today we attempt a rollover whenever any exception is thrown during the putMappings call: https://github.com/elastic/kibana/blob/314ae9a9617d40263345b8e2ee3b3c99cb6c2a2d/x-pack/plugins/fleet/server/services/epm/elasticsearch/template/template.ts/#L528-L540

Instead, I think we should only attempt a rollover on an incompatible mappings exception and otherwise we should re-throw the exception, failing the package installation. This should cause Fleet to reinstall the previous package version, which shouldn't fail for the same reason because the put mappings call should be a no-op that wouldn't require a cluster state update.
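
To make the proposal concrete, here is a rough sketch of what that could look like. It is not the actual Fleet code: `DataStreamClient`, `EsLikeError`, `isIncompatibleMappingError` and `updateOrRollover` are hypothetical names, and the assumption that an incompatible mapping shows up as a 400 with type `illegal_argument_exception` or `mapper_exception` would need to be checked against what Elasticsearch really returns.

```typescript
// Hypothetical shape of an Elasticsearch error; field names are assumptions
// made for this sketch, not the real client types.
interface EsLikeError {
  statusCode?: number;
  body?: { error?: { type?: string; reason?: string } };
}

// Minimal client surface needed by the sketch (hypothetical).
interface DataStreamClient {
  putMapping(dataStream: string, mappings: object): Promise<void>;
  rollover(dataStream: string): Promise<void>;
}

// Assumption: incompatible mappings surface as a 400 with one of these types.
const ROLLOVER_ERROR_TYPES = new Set<string>([
  'illegal_argument_exception',
  'mapper_exception',
]);

// Returns true only for errors that look like an incompatible mapping update.
function isIncompatibleMappingError(err: unknown): boolean {
  const e = err as EsLikeError | null | undefined;
  const type = e?.body?.error?.type;
  return e?.statusCode === 400 && type !== undefined && ROLLOVER_ERROR_TYPES.has(type);
}

// Try an in-place mapping update; roll over only on an incompatible mapping.
// Any other failure (timeouts, overloaded master, ...) is re-thrown so the
// package installation fails instead of adding more cluster state updates.
async function updateOrRollover(
  client: DataStreamClient,
  dataStream: string,
  mappings: object
): Promise<'updated' | 'rolled_over'> {
  try {
    await client.putMapping(dataStream, mappings);
    return 'updated';
  } catch (err) {
    if (!isIncompatibleMappingError(err)) {
      throw err;
    }
    await client.rollover(dataStream);
    return 'rolled_over';
  }
}
```

With something like this, a timeout caused by a long pending tasks queue would propagate up and fail the install, rather than triggering a rollover on top of it.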

elastic / kibana

[Fleet] Updating a package when a cluster have a large list of pending tasks is problematic #122700
