elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

[Fleet]: After cancelling the Request for Schedule Upgrade, Upgrade scheduled label is not removed from the Agents tab. #4293

Open harshitgupta-qasource opened 7 months ago

harshitgupta-qasource commented 7 months ago

Kibana Build details:

VERSION: 8.12.1
BUILD: 70228
COMMIT: 3457f326b763887d154c9da00bd4e489221a2ff3

Host OS and Browser version: All, All

Preconditions:

  1. 8.12.1 Kibana Cloud environment should be available.
  2. Policy should be created.
  3. 8.12.1 agent should be deployed.
  4. Endpoint security should be added to policy.

Steps to reproduce:

  1. Navigate to the Fleet tab and select the agent.
  2. Schedule an upgrade of the 8.12.1 agent to 8.12.2 using the API.
  3. Cancel the scheduled upgrade from Agent activity.
  4. Observe that the "Upgrade scheduled" label is still visible.

Expected: After cancelling the scheduled upgrade request, the "Upgrade scheduled" label should be removed from the Agents tab.

Screencast:

https://github.com/elastic/kibana/assets/101545338/da8f9b8e-64d7-4d57-9618-01c0bda2d80b

elasticmachine commented 7 months ago

Pinging @elastic/fleet (Team:Fleet)

harshitgupta-qasource commented 7 months ago

@amolnater-qasource Kindly review

amolnater-qasource commented 7 months ago

Secondary Review for this ticket is Done.

jlind23 commented 7 months ago

@amolnater-qasource @harshitgupta-qasource is this a new bug you found in 8.12 or something that was already existing?

harshitgupta-qasource commented 7 months ago

Hi @jlind23

We discovered this issue while testing the https://github.com/elastic/kibana/issues/168502 feature on 8.12.1 and then attempting to upgrade the agent to 8.12.2 via a scheduled upgrade.

juliaElastic commented 7 months ago

Should this issue be moved to the elastic-agent repo? I didn't find any logic in fleet-server regarding the cancel action. I did find that Kibana itself updates agent docs to clear the Updating state here, so we can probably clear upgrade_details here too. Can we confirm that the cancelled action is not executed at the scheduled time (to confirm whether the bug impacts the agent)?

Tested this, and I can confirm that the agent cancelled the action, so the bug is that the agent is stuck in the upgrade scheduled state but doesn't upgrade.

14:58:03.876
elastic_agent
[elastic_agent][info] Cancel action id: 551f9e9c-9de0-498f-81ae-da36046569ab target id: f701dd74-49c5-4a6c-a241-ea403ddb1589 removed 1 action(s) from queue.

cc @cmacknz @kpollich

juliaElastic commented 7 months ago

I tried to fix this locally by clearing the upgrade_details of the agent doc in Kibana when the cancel API is called. It doesn't work, because the upgrade_details with the scheduled state comes back on every check-in. So I think this has to be fixed on the agent side.

cmacknz commented 7 months ago

The "Cancel action id:" log lines seen above come from https://github.com/elastic/elastic-agent/blob/bdd885c9df3eb0c81624f852c74c64fc250a1b17/internal/pkg/agent/application/actions/handlers/handler_action_cancel.go#L34-L44

The action queue is registered as the canceller in https://github.com/elastic/elastic-agent/blob/bdd885c9df3eb0c81624f852c74c64fc250a1b17/internal/pkg/agent/application/managed_mode.go#L380-L387

The actual cancel implementation is in https://github.com/elastic/elastic-agent/blob/bdd885c9df3eb0c81624f852c74c64fc250a1b17/internal/pkg/queue/actionqueue.go#L136-L149

There is no notification that an upgrade is cancelled when it is removed from the queue like this, so I think we will stay in the upgrade scheduled state until an upgrade is eventually completed by a separate action. We likely need to update this cancel implementation to add special handling for upgrade actions. Specifically, it needs to clear the upgrade details.

The upgrade is marked as scheduled in https://github.com/elastic/elastic-agent/blob/bdd885c9df3eb0c81624f852c74c64fc250a1b17/internal/pkg/agent/application/dispatcher/dispatcher.go#L358-L361

elasticmachine commented 4 months ago

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)