Open harshitgupta-qasource opened 7 months ago
Pinging @elastic/fleet (Team:Fleet)
@amolnater-qasource Kindly review
Secondary Review for this ticket is Done.
@amolnater-qasource @harshitgupta-qasource is this a new bug you found in 8.12 or something that was already existing?
Hi @jlind23
While testing the https://github.com/elastic/kibana/issues/168502 feature on 8.12.1 and then attempting to upgrade the agent to 8.12.2 via a scheduled upgrade, we discovered this issue.
Should this issue be moved to the elastic-agent repo? I didn't find any logic in fleet-server for the cancel action.
Though I found that Kibana itself updates agent docs to clear the Updating state here, so we can probably clear upgrade_details here too.
Can we confirm that the cancelled action is not executed at the scheduled time? (To confirm whether the bug impacts the agent.)
Tested this, and I can confirm that the agent cancelled the action, so the bug is that the agent is stuck in the upgrade scheduled state but doesn't upgrade.
14:58:03.876 elastic_agent [elastic_agent][info] Cancel action id: 551f9e9c-9de0-498f-81ae-da36046569ab target id: f701dd74-49c5-4a6c-a241-ea403ddb1589 removed 1 action(s) from queue.
cc @cmacknz @kpollich
I tried to fix this locally by clearing the upgrade_details of the agent doc in Kibana when the cancel API is called. It doesn't seem to work, because the upgrade_details with the scheduled state keeps coming back on every checkin. So I think this has to be fixed on the agent side.
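To illustrate why the Kibana-side fix doesn't stick, here is a minimal sketch. It assumes the stored agent doc simply takes whatever upgrade details the agent reports on each checkin, which is what the behaviour above suggests. All type and function names are hypothetical and are not the real Kibana or fleet-server code, and the state string is only illustrative.

```go
package main

// UpgradeDetails is a hypothetical, trimmed-down version of the
// upgrade_details object stored on the agent doc.
type UpgradeDetails struct {
	State    string
	ActionID string
}

// AgentDoc models the server-side agent document (assumption).
type AgentDoc struct {
	UpgradeDetails *UpgradeDetails
}

// AgentState models what the agent itself still believes.
type AgentState struct {
	UpgradeDetails *UpgradeDetails
}

// ClearOnCancel models the attempted fix: clear the details on the
// stored doc when the cancel API is called.
func ClearOnCancel(doc *AgentDoc) {
	doc.UpgradeDetails = nil
}

// Checkin models a checkin: the doc is overwritten with whatever the
// agent reports, so a server-side clear is undone if the agent never
// cleared its own state.
func Checkin(doc *AgentDoc, agent AgentState) {
	doc.UpgradeDetails = agent.UpgradeDetails
}
```

In this model, calling ClearOnCancel and then Checkin with an agent that still holds scheduled details puts the scheduled state straight back on the doc.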
The Cancel action id: log lines seen above come from https://github.com/elastic/elastic-agent/blob/bdd885c9df3eb0c81624f852c74c64fc250a1b17/internal/pkg/agent/application/actions/handlers/handler_action_cancel.go#L34-L44
The action queue is registered as the canceller in https://github.com/elastic/elastic-agent/blob/bdd885c9df3eb0c81624f852c74c64fc250a1b17/internal/pkg/agent/application/managed_mode.go#L380-L387
The actual cancel implementation is in https://github.com/elastic/elastic-agent/blob/bdd885c9df3eb0c81624f852c74c64fc250a1b17/internal/pkg/queue/actionqueue.go#L136-L149
Nothing signals that an upgrade was cancelled when it is removed from the queue like this, so I think the agent stays in the upgrade scheduled state until a separate action eventually completes an upgrade. We likely need to update this cancel implementation with special handling for upgrade actions: specifically, it needs to clear the upgrade details.
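A rough sketch of what that special handling could look like, not the actual elastic-agent code. The Action and Queue types, the "UPGRADE" type string, and the onUpgradeCancelled hook are all hypothetical stand-ins. The real queue is more involved, and the real fix may need to wire the notification differently.

```go
package main

// ActionTypeUpgrade is an illustrative type tag for upgrade actions.
const ActionTypeUpgrade = "UPGRADE"

// Action is a hypothetical, simplified queued action.
type Action struct {
	ID   string
	Type string
}

// Queue is a toy action queue with an optional hook that is called
// whenever a cancelled action turns out to be an upgrade.
type Queue struct {
	actions            []Action
	onUpgradeCancelled func(Action)
}

// NewQueue builds a queue; the hook may be nil.
func NewQueue(onUpgradeCancelled func(Action)) *Queue {
	return &Queue{onUpgradeCancelled: onUpgradeCancelled}
}

// Add appends an action to the queue.
func (q *Queue) Add(a Action) {
	q.actions = append(q.actions, a)
}

// Len reports how many actions are still queued.
func (q *Queue) Len() int {
	return len(q.actions)
}

// Cancel removes every queued action whose ID matches targetID and
// returns how many were removed. Unlike a plain removal, it notifies
// the hook for each removed upgrade action so the caller can clear
// the scheduled upgrade details.
func (q *Queue) Cancel(targetID string) int {
	kept := q.actions[:0] // filter in place, reusing the backing array
	removed := 0
	for _, a := range q.actions {
		if a.ID != targetID {
			kept = append(kept, a)
			continue
		}
		removed++
		if a.Type == ActionTypeUpgrade && q.onUpgradeCancelled != nil {
			q.onUpgradeCancelled(a)
		}
	}
	q.actions = kept
	return removed
}
```

The hook keeps the queue unaware of how upgrade details are stored: whoever registers the queue as the canceller would pass a callback that resets the details, so the next checkin no longer reports the scheduled state.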
The upgrade is marked as scheduled in https://github.com/elastic/elastic-agent/blob/bdd885c9df3eb0c81624f852c74c64fc250a1b17/internal/pkg/agent/application/dispatcher/dispatcher.go#L358-L361
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
Kibana Build details:
Host OS and Browser version: All, All
Preconditions:
Steps to reproduce:
Expected: After cancelling the scheduled upgrade request, the Upgrade scheduled label should be removed from the Agents tab.
Screencast:
https://github.com/elastic/kibana/assets/101545338/da8f9b8e-64d7-4d57-9618-01c0bda2d80b