elastic / fleet-server

The Fleet server allows managing a fleet of Elastic Agents.

[Self-Managed]: Secondary agent remains Healthy even after unenrolling existing fleet-server. #2370

Open amolnater-qasource opened 1 year ago

amolnater-qasource commented 1 year ago

Kibana version: 8.7 BC3 Kibana self-managed environment

Host OS: Windows Fleet-Server and Ubuntu Secondary agent.

Build details:
VERSION: 8.7 BC3
BUILD: 60783
COMMIT: 4ed6814a5dd5532a30bde9b34354a949d9166ae9
Artifact Link: https://staging.elastic.co/8.7.0-6a78ce9a/summary-8.7.0.html

Preconditions:

  1. 8.7 BC3 Kibana self-managed environment should be available.
  2. Fleet Server should be installed.
  3. Secondary agent should be installed and running Healthy.

Steps to reproduce:

  1. Navigate to Fleet>Agents tab.
  2. Force Unenroll the existing fleet-server from Fleet>Agents UI.
  3. Observe that the installed secondary agent remains Healthy.
  4. Observe that new data and logs continue to be generated.
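
For scripted reproduction, step 2 could presumably be triggered through the Kibana Fleet API instead of the UI. The sketch below only builds the request and does not send it. It assumes that `POST /api/fleet/agents/{agentId}/unenroll` with `"revoke": true` is what the UI's Force Unenroll action calls; the helper name, Kibana URL, agent ID, and API key are hypothetical placeholders, not values from this report.

```python
import json
import urllib.request


def build_force_unenroll_request(kibana_url: str, agent_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a Fleet API request that force-unenrolls one agent.

    Assumption: "revoke": true corresponds to the UI's Force Unenroll action.
    """
    body = json.dumps({"revoke": True}).encode("utf-8")
    return urllib.request.Request(
        url=f"{kibana_url.rstrip('/')}/api/fleet/agents/{agent_id}/unenroll",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "kbn-xsrf": "true",  # Kibana rejects state-changing API calls without this header
            "Authorization": f"ApiKey {api_key}",
        },
    )


if __name__ == "__main__":
    # Placeholder values; replace with a real Kibana URL, agent ID, and API key.
    req = build_force_unenroll_request("https://kibana.example:5601", "<agent-id>", "<api-key>")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```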

Note:

Logs:

Uninstalled Fleet Server logs: [Uninstalled Fleet-Server]elastic-agent-diagnostics-2023-02-21T04-36-10Z-00.zip

Secondary Agent running Healthy without fleet-server logs: [Secondary]elastic-agent-diagnostics-2023-02-21T04-41-51Z-00.zip

Expected Result: As per our understanding, the secondary agent should go offline and no new data should be generated after unenrolling the existing fleet-server.

Screen Recording:

https://user-images.githubusercontent.com/77374876/220252337-83fa0db3-657c-4ef2-b327-9a9fba121f0b.mp4

https://user-images.githubusercontent.com/77374876/220252364-d8a5db4b-f4fb-4c10-9b7c-90b56ab968a8.mp4

https://user-images.githubusercontent.com/77374876/220252405-e22bb1c2-8edc-4697-a8d9-78c4ff70455a.mp4

On running Uninstall command for fleet-server:

https://user-images.githubusercontent.com/77374876/220254916-8543d35f-2e09-4b89-adb4-0159f9047c4e.mp4

amolnater-qasource commented 1 year ago

@manishgupta-qasource Please review.

manishgupta-qasource commented 1 year ago

Secondary review for this ticket is Done

dikshachauhan-qasource commented 1 year ago

Bug Conversion

Thanks!

juliaElastic commented 1 year ago

This seems like a case of Fleet Server keeps running after Elastic Agent uninstall. @michel-laterman @cmacknz Do you know if there is any logic in agent that is supposed to stop Fleet Server on uninstall?

cmacknz commented 1 year ago

> This seems like a case of Fleet Server keeps running after Elastic Agent uninstall

Doing a force unenroll only revokes the Fleet Server's Elasticsearch API key; it doesn't stop it from running. A non-forced unenroll would leave the Elastic Agent running but with an empty policy, which removes the Fleet Server. This is the same behavior as when a regular agent is force unenrolled: the Fleet and ES API keys are revoked.
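
One way to confirm that the key really was revoked is to look at the `invalidated` flag that Elasticsearch's Get API key endpoint (`GET /_security/api_key?id=<key-id>`) returns for each key. The following is a minimal sketch that only inspects an already-fetched response body; the helper name and the sample response are illustrative assumptions, with the key ID borrowed from the error log in this thread.

```python
from typing import List


def invalidated_key_ids(get_api_key_response: dict) -> List[str]:
    """Return the IDs of all keys marked as invalidated in a Get API key response."""
    return [
        key["id"]
        for key in get_api_key_response.get("api_keys", [])
        if key.get("invalidated")
    ]


if __name__ == "__main__":
    # Illustrative response shape only; real responses carry more fields.
    sample = {
        "api_keys": [
            {"id": "_QEscIYBmhrIBYSI9LmJ", "invalidated": True},
            {"id": "some-other-key", "invalidated": False},
        ]
    }
    print(invalidated_key_ids(sample))
```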

The secondary agent here seems to think it is still connected to Fleet.

The Fleet Server that was force unenrolled here is just infinitely retrying to connect to ES with an unauthorized error as expected.

{"log.level":"error","@timestamp":"2023-02-21T04:30:17.004Z","message":"Failed to connect to backoff(elasticsearch(https://172.31.25.137:9200)): 401 Unauthorized: {\"error\":{\"root_cause\":[{\"type\":\"security_exception\",\"reason\":\"unable to authenticate with provided credentials and anonymous access is not allowed for this request\",\"additional_unsuccessful_credentials\":\"API key: api key [_QEscIYBmhrIBYSI9LmJ] has been invalidated\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}}],\"type\":\"security_exception\",\"reason\":\"unable to authenticate with provided credentials and anonymous access is not allowed for this request\",\"additional_unsuccessful_credentials\":\"API key: api key [_QEscIYBmhrIBYSI9LmJ] has been invalidated\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}},\"status\":401}","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"beat/metrics-monitoring","type":"beat/metrics"},"log":{"source":"beat/metrics-monitoring"},"log.logger":"publisher_pipeline_output","log.origin":{"file.line":150,"file.name":"pipeline/client_worker.go"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-02-21T04:30:17.004Z","message":"Attempting to reconnect to backoff(elasticsearch(https://172.31.25.137:9200)) with 727 reconnect attempt(s)","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"beat/metrics-monitoring","type":"beat/metrics"},"log":{"source":"beat/metrics-monitoring"},"log.logger":"publisher_pipeline_output","log.origin":{"file.line":141,"file.name":"pipeline/client_worker.go"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
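
When skimming a diagnostics bundle for this pattern, a small script can pull the relevant fields out of each JSON log line. This is a rough sketch, assuming the message formats shown in the two lines above; the function name and regular expressions are my own, and the sample lines are abbreviated copies of those log entries rather than full ones.

```python
import json
import re

# Patterns based on the message text in the two log lines above.
INVALIDATED_KEY = re.compile(r"api key \[([^\]]+)\] has been invalidated")
RECONNECT_ATTEMPTS = re.compile(r"with (\d+) reconnect attempt\(s\)")


def summarize(line: str) -> dict:
    """Extract level, emitting binary, and auth-failure details from one JSON log line."""
    entry = json.loads(line)
    message = entry.get("message", "")
    key = INVALIDATED_KEY.search(message)
    attempts = RECONNECT_ATTEMPTS.search(message)
    return {
        "level": entry.get("log.level"),
        "binary": entry.get("component", {}).get("binary"),
        "unauthorized": "401 Unauthorized" in message,
        "invalidated_key": key.group(1) if key else None,
        "reconnect_attempts": int(attempts.group(1)) if attempts else None,
    }


# Abbreviated versions of the two log lines (most fields and the nested error body omitted).
ERROR_LINE = (
    '{"log.level":"error","message":"Failed to connect to '
    'backoff(elasticsearch(https://172.31.25.137:9200)): 401 Unauthorized: '
    'API key: api key [_QEscIYBmhrIBYSI9LmJ] has been invalidated",'
    '"component":{"binary":"metricbeat"}}'
)
INFO_LINE = (
    '{"log.level":"info","message":"Attempting to reconnect to '
    'backoff(elasticsearch(https://172.31.25.137:9200)) with 727 reconnect attempt(s)",'
    '"component":{"binary":"metricbeat"}}'
)

if __name__ == "__main__":
    for sample in (ERROR_LINE, INFO_LINE):
        print(summarize(sample))
```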

Fleet Server must not be returning an error to the connected secondary agent in this case because there are no Fleet related messages in its logs.

The agent status document would still have to be getting updated, wouldn't it? With revoked credentials that doesn't seem like it should be possible either.

jlind23 commented 3 months ago

@amolnater-qasource is this still relevant, or can I close it out?

amolnater-qasource commented 3 months ago

Hi @jlind23

We have revalidated this issue on the latest 8.14.0 BC6 Kibana self-managed environment and found it is still reproducible.

Observations:

Secondary Agent Logs: elastic-agent-diagnostics-2024-05-29T07-10-01Z-00.zip

Screenshot: image image

On running Agent uninstall command for the unenrolled fleet server: image

Build details:
VERSION: 8.14.0 BC6
BUILD: 73966
COMMIT: ed7758ff72688babbffbc95a3f047354dedb7add

Please let us know if anything else is required from our end. Thanks!