Open WiegerElastic opened 7 months ago
Pinging @elastic/fleet (Team:Fleet)
@WiegerElastic Various upgrade information is reported to the upgrade_details field in agent documents, which can be queried against.
The upgrade badge in the UI is derived from the upgrade_details.state field in particular, which can be one of:
UPG_REQUESTED
UPG_SCHEDULED
UPG_DOWNLOADING
UPG_EXTRACTING
UPG_REPLACING
UPG_RESTARTING
UPG_WATCHING
UPG_ROLLBACK
UPG_FAILED
For finding agents that failed to upgrade, filtering against upgrade_details.metadata.error_msg or upgrade_details.state: 'UPG_FAILED' may be more fruitful.
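As a rough sketch of what such a filter could look like, the snippet below assembles KQL strings from the field names and states mentioned above. The helper names (upgrade_state_kql, failed_upgrade_kql) are hypothetical and not part of Fleet or Kibana; the snippet only builds text to paste into the search bar. It assumes the fields can be referenced without a prefix, which may differ between versions.

```python
# Hypothetical helpers that build KQL text for the agents search bar.
# Field names and states come from the comment above; nothing here calls Fleet.

UPGRADE_STATES = (
    "UPG_REQUESTED",
    "UPG_SCHEDULED",
    "UPG_DOWNLOADING",
    "UPG_EXTRACTING",
    "UPG_REPLACING",
    "UPG_RESTARTING",
    "UPG_WATCHING",
    "UPG_ROLLBACK",
    "UPG_FAILED",
)


def upgrade_state_kql(*states: str) -> str:
    """Return a KQL clause matching agents in any of the given upgrade states."""
    if not states:
        raise ValueError("at least one state is required")
    unknown = [s for s in states if s not in UPGRADE_STATES]
    if unknown:
        raise ValueError(f"unknown upgrade state(s): {unknown}")
    if len(states) == 1:
        return f'upgrade_details.state: "{states[0]}"'
    # KQL groups alternative values for one field in parentheses.
    alternatives = " or ".join(f'"{s}"' for s in states)
    return f"upgrade_details.state: ({alternatives})"


def failed_upgrade_kql() -> str:
    """Match agents marked as failed, or that carry any upgrade error message."""
    # "field: *" is the KQL form for "field exists".
    return (
        f'{upgrade_state_kql("UPG_FAILED")} '
        "or upgrade_details.metadata.error_msg: *"
    )


if __name__ == "__main__":
    print(failed_upgrade_kql())
    print(upgrade_state_kql("UPG_ROLLBACK", "UPG_FAILED"))
```

Validating against the known state list catches typos such as UPG_FAIL before they turn into a query that silently matches nothing.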
In addition, triggering upgrades logs to agent activity, and that UX path surfaces agents with issues (screenshots borrowed from https://github.com/elastic/kibana/issues/183243#issuecomment-2111927572).
@kpollich @nimarezainia I'm hesitant to add another filter to the already-busy filter bar on the agents table just for upgrade status. WDYT, given the guided UX and queryable fields that we already have?
I agree. The dropdown filters are for very common activities, and KQL is available there for custom filters. @WiegerElastic would that work for you?
I'm going to return this to the backlog since it sounds like we'd prefer to deprioritize or rethink this.
Describe the feature: Recently, we have started adding the status of upgrades to the Fleet UI (e.g. upgrade monitoring, upgrade failed, upgrade stalled, etc.). It would be really helpful if I could filter on these fields so I can find out more quickly which machines aren't upgrading successfully.
The health field isn't always useful here, since an Agent might be happy and healthy (because it's connected to Fleet, receiving policy updates, etc.) but still fail to upgrade (because a download was malformed, the upgrade stalled, etc.).
Describe a specific use case for the feature:
Being able to filter on these values helps operators identify problematic machines more quickly. I've highlighted some of the fields that I would like to filter on. I couldn't find an example of a healthy Agent that failed to upgrade.