elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Fleet] Automatic Agent Upgrade option in Agent Policy #120735

Open aarju opened 2 years ago

aarju commented 2 years ago

Describe the feature: Within an Agent Policy, it would be nice to have an 'automatically upgrade' option that upgrades any agents that connect with that policy to the most recent version.

Describe a specific use case for the feature: When deploying agents, users rely on administrative scripts and tools such as Intune, Jamf, Active Directory, etc. After upgrading to a new stack version, users need to update all of their deployment scripts to use the newest agent version. In larger organizations this update process could take several weeks to complete. During this time the Fleet administrator needs to check in regularly, select all agents that need to be updated, and update them manually. With this option set, older agents would connect and immediately be upgraded to the newest version without requiring user interaction.

[Image: Agent Policy change mock-up]

elasticmachine commented 2 years ago

Pinging @elastic/fleet (Team:Fleet)

jen-huang commented 2 years ago

cc @mostlyjason for awareness

kpollich commented 6 months ago

Assigning to @nimarezainia for product refinement and brainstorming

nimarezainia commented 5 months ago

> Assigning to @nimarezainia for product refinement and brainstorming

@kpollich added a mock-up to the description. I believe the check-in payload includes the agent version, so Fleet could initiate the upgrade action.
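A minimal sketch of the check-in-driven option, assuming the check-in payload reports the agent version as a `major.minor.patch` string and that the policy carries a hypothetical `auto_upgrade` flag plus a target version. `AgentPolicy`, `CheckIn`, `handle_checkin` and the returned action dict are illustrative names only, not Fleet's actual data model or API.

```python
from dataclasses import dataclass
from typing import Optional


def parse_version(v: str) -> tuple[int, int, int]:
    """Parse a 'major.minor.patch' string into a comparable tuple.

    Simplification (assumption): pre-release/build suffixes such as
    '-SNAPSHOT' are stripped and ignored.
    """
    core = v.split("-", 1)[0].split("+", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


@dataclass
class AgentPolicy:
    # Hypothetical fields: the proposed opt-in flag and the version
    # agents in this policy should be upgraded to.
    auto_upgrade: bool
    target_version: str


@dataclass
class CheckIn:
    # Hypothetical check-in payload, reduced to the fields this sketch needs.
    agent_id: str
    agent_version: str


def handle_checkin(checkin: CheckIn, policy: AgentPolicy) -> Optional[dict]:
    """Return an upgrade action for an out-of-date agent, otherwise None."""
    if not policy.auto_upgrade:
        return None
    if parse_version(checkin.agent_version) >= parse_version(policy.target_version):
        return None  # already current (or newer): never downgrade
    # Illustrative action shape only.
    return {
        "type": "UPGRADE",
        "agent_id": checkin.agent_id,
        "version": policy.target_version,
    }
```

For example, with `AgentPolicy(auto_upgrade=True, target_version="8.12.0")`, an agent checking in at `8.10.4` gets an upgrade action, while one already at `8.12.0` gets `None`. Comparing parsed tuples rather than raw strings matters here: as plain strings, `"8.9.1" > "8.10.0"`.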

The other option would be to modify the agent policy to specify the version agents ought to be at, which would trigger each agent to upgrade if needed. It's a bit more declarative. However, this approach may need a larger uplift to our upgrade process, which may introduce risk at this stage. (cc: @cmacknz)
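For comparison, a sketch of the declarative option from the agent's side, assuming a hypothetical policy field that holds the desired version and that the agent remembers any upgrade it already has in flight. `reconcile` and `Decision` are made-up names for illustration, not Elastic Agent code.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    NOOP = "noop"
    UPGRADE = "upgrade"
    WAIT = "wait"


def parse_version(v: str) -> tuple[int, ...]:
    # Simplified (assumption): ignores pre-release/build suffixes like "-SNAPSHOT".
    return tuple(int(p) for p in v.split("-", 1)[0].split("+", 1)[0].split("."))


def reconcile(running: str, desired: Optional[str], upgrading_to: Optional[str]) -> Decision:
    """Decide what the agent should do each time it receives its policy.

    running      -- version the agent is currently running
    desired      -- hypothetical policy field with the target version (None = unset)
    upgrading_to -- target of an upgrade already in progress, if any
    """
    if desired is None:
        return Decision.NOOP      # policy does not pin a version
    if upgrading_to == desired:
        return Decision.WAIT      # same target already in flight: don't start another
    if parse_version(running) >= parse_version(desired):
        return Decision.NOOP      # desired state already met; never downgrade
    return Decision.UPGRADE
```

Because the decision depends only on current and desired state, re-delivering the same policy is harmless: an agent already upgrading to the target gets `WAIT` rather than starting a second upgrade.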

nimarezainia commented 5 months ago

https://github.com/elastic/ingest-dev/issues/2878

cmacknz commented 4 months ago

When we do this, it needs to be done as a gradual upgrade, with checks that alert the user and then pause or abort the upgrade if problems are encountered (agents rolling back or going offline). I've moved this back to "needs tech def" so we can define exactly how this should work.

This should not mass update every agent in the policy immediately by default.
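A rough sketch of what such a gradual rollout could look like, assuming upgrades are issued in batches that start with a small canary group and grow geometrically, and that a batch counts as unhealthy when more than a set fraction of its agents roll back or go offline. The batch sizes, the 10% threshold, and the `upgrade_batch` callback are hypothetical placeholders, not existing Fleet settings.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class RolloutResult:
    upgraded: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)
    halted: bool = False
    reason: str = ""


def batches(agent_ids: list[str], first: int, growth: int) -> Iterator[list[str]]:
    """Yield a small canary batch first, then batches `growth` times larger each step."""
    size, start = first, 0
    while start < len(agent_ids):
        yield agent_ids[start:start + size]
        start += size
        size *= growth


def gradual_upgrade(
    agent_ids: list[str],
    upgrade_batch: Callable[[list[str]], dict[str, str]],
    first_batch: int = 5,
    growth: int = 2,
    max_failure_rate: float = 0.1,
) -> RolloutResult:
    """Upgrade agents batch by batch, halting as soon as a batch looks unhealthy.

    `upgrade_batch` is a hypothetical callback that upgrades the given agents,
    waits for them to check in again, and reports each agent's outcome as
    "healthy", "rolled_back" or "offline". Agents missing from the report
    are treated as failures.
    """
    if first_batch < 1 or growth < 1:
        raise ValueError("first_batch and growth must both be >= 1")
    result = RolloutResult()
    for batch in batches(agent_ids, first_batch, growth):
        outcomes = upgrade_batch(batch)
        bad = [a for a in batch if outcomes.get(a) != "healthy"]
        result.failed += bad
        result.upgraded += [a for a in batch if outcomes.get(a) == "healthy"]
        if len(bad) / len(batch) > max_failure_rate:
            # Stop before touching the remaining agents and surface the reason,
            # so the user can decide whether to resume or roll back.
            result.halted = True
            result.reason = f"{len(bad)}/{len(batch)} agents rolled back or went offline"
            break
    return result
```

With the defaults, 20 agents are upgraded in batches of 5, 10 and 5. If 2 of the 5 canary agents go offline, the rollout halts with `halted=True` before the remaining 15 agents are touched, so no policy ever mass-updates every agent at once.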

nimarezainia commented 4 months ago

@cmacknz I agree with that approach.

aarju commented 4 months ago

> This should not mass update every agent in the policy immediately by default.

I think some built-in 'risk level' settings, configurable at the policy level for automatic upgrades, would be nice to have. For example:

nimarezainia commented 3 months ago

We will address this use case by developing the suggestions in the linked issue, which moves us toward a native canary deployment model.

richlv commented 2 months ago

Heya, the linked issue returns a 404. How could that be fixed?

nimarezainia commented 2 months ago

> Heya, the linked issue results in 404 - how could that be fixed?

Sorry, I was linking to a private repo. I am reopening this until the private issue is resolved.