elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Fleet]: No appropriate "agent upgrade failed" message is shown if agents fail to upgrade to the latest version. #140936

Open ghost opened 2 years ago

ghost commented 2 years ago

Kibana version: 8.5 Kibana Staging environment

Host OS and Browser version: All, All

Build Details:

Version: 8.5.0 SNAPSHOT
Build: 56399
Commit: 943675d4fc9807b4589266fcfed36016eea4317c

Preconditions:

Steps to reproduce:

  1. Navigate to the Fleet > Agents tab.
  2. Select a few agents, say 3.
  3. Click the 'Actions' dropdown.
  4. Select 'Upgrade 3 agents'.
  5. The 'Upgrade 3 agents' pop-up is shown.
  6. Click the Agent activity link.
  7. The Agent activity flyout opens.
  8. Observe that '3 agents upgraded' is shown on the flyout.

Actual Result:

image

Expected Result:

Mock UI from Figma:

image

Screen Recording:

https://user-images.githubusercontent.com/97870262/191010887-2e11ce7f-ac08-473a-ac5b-ba35af01b766.mp4

elasticmachine commented 2 years ago

Pinging @elastic/fleet (Team:Fleet)

dikshachauhan-qasource commented 2 years ago

Secondary Review is done.

juliaElastic commented 2 years ago

Can you check the Kibana logs to see whether the error reason was that the agent is not upgradeable? I added a fix for that use case today, but if the error comes from the backend (agent or Fleet Server), then it should already be reported correctly in the activity.

ghost commented 2 years ago

Hi @juliaElastic,

Thank you for looking into this.

However, this agent upgrade failure is caused by issue #139174.

Further, please find the Kibana logs for the above issue: Kibana_logs.txt

Please let us know if we are missing anything.

Thanks!

juliaElastic commented 2 years ago

I see. This should be fixed in the latest Kibana version; previously, the error action results were not reported correctly. This is how it looks now with the latest changes:

image

ghost commented 2 years ago

Hi @juliaElastic,

Thank you for looking into this.

We will re-validate this issue on the latest Kibana version.

Thanks!

ghost commented 1 year ago

Hi @juliaElastic,

We have re-validated this issue on the latest 8.5.0 BC2 Kibana Staging environment and found that the issue is still reproducible.

Build details:

Version: 8.5.0 BC2
Build: 56806
Commit: dc769f45a5a6dafb0a8c8f0c0cabcced4df45e11

Below are the observations:

Screen Recording and Screenshot:

https://user-images.githubusercontent.com/97870262/193230722-4c772eea-4c62-43bc-a371-04b5ee2283f1.mp4

image

Screenshot:

image

Hence, we are re-opening this issue.

Please let us know if we are missing anything.

Thanks!

juliaElastic commented 1 year ago

@prachigupta-qasource I can't reproduce this locally, could you share the logs from agent, fleet server and kibana?

ghost commented 1 year ago

Hi @juliaElastic,

Please find the steps to reproduce the above issue:

  1. Enroll lower-version agents.
  2. Enter the incorrect URL https://test.elastic.co/downloads/ in Agent Binary under Fleet > Settings.
  3. Upgrade one or more agents.
  4. Click the Agent activity link.
  5. Observe the 'X agent(s) upgraded' text on the Agent activity flyout.

Agent Logs:

elastic-agent-diagnostics-2022-10-04T09-54-34Z-00.zip

Fleet Server Logs:

We are unable to fetch the Fleet Server logs because this is a hosted cloud environment.

Kibana Logs:

Kibana Logs.txt

Please let us know if we are missing anything.

Thanks!

juliaElastic commented 1 year ago

@prachigupta-qasource Please share the cloud link, so I can look at the instance in cloud admin to check the logs.

At step 2, did you update the Elastic Artifacts Host or did you add a new entry? If a new one, did you set it to default? I am asking because I don't see any matches on https://test.elastic.co/downloads/ in elastic agent logs.

image

I still can't reproduce this; when I follow the steps, I see an error result.

image

I saw this error in the logs that you shared:

[elastic_agent][error] 2022-10-04T05:23:28-04:00 - message: Application: [16a94f9c-4165-477c-a210-64b8da0174a4]: State changed to FAILED: failed upgrade of agent binary: 2 errors occurred:
    * package '/opt/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/elastic-agent-8.5.0-linux-x86_64.tar.gz' not found: open /opt/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/elastic-agent-8.5.0-linux-x86_64.tar.gz: no such file or directory
    * call to 'https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.5.0-linux-x86_64.tar.gz' returned unsuccessful status code: 404

 - type: 'ERROR' - sub_type: 'FAILED'

juliaElastic commented 1 year ago

@michel-laterman Could you have a look at this issue? There seems to be an error on the elastic agent side during upgrade that does not appear to be reported correctly in the agent action results.

I found another issue that has similar logging errors, can we verify on BC3 if the issue is still reproducible?

michel-laterman commented 1 year ago

@juliaElastic, just so I understand: the error message appears in the logs and is expected to appear in the UI, correct? IIRC, at the moment the elastic-agent sends a generic ack for most actions it receives, and that ack does not indicate a result (the application action that osquery uses is an exception to this).

juliaElastic commented 1 year ago

@michel-laterman There is an Error field in ActionResult that indicates whether something went wrong in the action; we use that field in the UI to decide whether the action failed. I suspect that the error field is not set, which is why the action looks successful in the UI. However, I can't reproduce the issue, so I can't verify this theory.
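
To illustrate the theory above, here is a minimal TypeScript sketch of how a UI could classify action results by an error field. It is not the actual Kibana or Fleet code: the `ActionResultSketch` interface, its field names, and the `summarizeActionResults` function are hypothetical and only mirror the "Error field in ActionResult" described in this comment.

```typescript
// Hypothetical shape of an action result (an assumption for illustration,
// not the real Fleet document schema).
interface ActionResultSketch {
  agentId: string;
  actionId: string;
  // Assumed to be set only when the agent reports a failure.
  error?: string;
}

interface ActionSummary {
  succeeded: number;
  failed: number;
  // Error message per agent id, for display in an activity view.
  errors: Record<string, string>;
}

// Counts a result as failed only when its error field is present and non-empty.
function summarizeActionResults(results: ActionResultSketch[]): ActionSummary {
  const summary: ActionSummary = { succeeded: 0, failed: 0, errors: {} };
  for (const result of results) {
    if (result.error !== undefined && result.error.trim() !== '') {
      summary.failed += 1;
      summary.errors[result.agentId] = result.error;
    } else {
      // A generic ack without an error is indistinguishable from success here.
      summary.succeeded += 1;
    }
  }
  return summary;
}

// Usage: one agent acks without an error, one reports a failure.
const example = summarizeActionResults([
  { agentId: 'agent-1', actionId: 'upgrade-1' },
  { agentId: 'agent-2', actionId: 'upgrade-1', error: 'failed upgrade of agent binary' },
]);
console.log(example.succeeded, example.failed);
```

Under this assumption, an agent that fails the upgrade but sends a generic ack with no error set would be counted as succeeded, which would match the "X agents upgraded" text seen in the flyout.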