elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

Elastic agent version 8.11.2 upgrade with agent binary download through proxy failed #4151

Closed DysonHo closed 7 months ago

DysonHo commented 7 months ago

Kibana Build details:

VERSION: 8.11.3

note:

Describe the bug: Following the official doc (https://www.elastic.co/guide/en/fleet/current/fleet-agent-proxy-managed.html), I simulated an environment in which the Elastic Agent can only communicate with Fleet and Elasticsearch, and upgrade, through a proxy. Control and data traffic are fine: I can manage the Elastic Agent with Fleet and receive data as well. But when I try to upgrade the Elastic Agent (v8.11.2), it fails.

I set the Elastic Agent log level to "debug" and retried the upgrade to find out what's going on:


I understand this functionality is in beta, but I'm still trying to figure out whether this is my problem or a bug. If anyone wants more detail, please let me know. Sincerely,

cmacknz commented 7 months ago
12:10:50.431 elastic_agent [elastic_agent][error] upgrade to version 8.11.3 failed: failed verification of agent binary: 2 errors occurred:
  * fetching asc file from 'C:\Program Files\Elastic\Agent\data\elastic-agent-1c21b0\downloads\elastic-agent-8.11.3-windows-x86_64.zip.asc': open C:\Program Files\Elastic\Agent\data\elastic-agent-1c21b0\downloads\elastic-agent-8.11.3-windows-x86_64.zip.asc: The system cannot find the file specified.
  * fetching asc file from https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.11.3-windows-x86_64.zip.asc: failed loading public key: Get "https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.11.3-windows-x86_64.zip.asc": dial tcp: lookup artifacts.elastic.co: getaddrinfow: The requested name is valid, but no data of the requested type was found.

"getaddrinfow: The requested name is valid, but no data of the requested type was found" is a DNS resolution error. Since https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.11.3-windows-x86_64.zip.asc exists and can be downloaded, this is likely being introduced by your proxy. I can't tell why though.
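As a rough triage aid for errors like the ones quoted in this thread, the sketch below sorts a fetch error message into a coarse category. The function name and the category labels are hypothetical and not part of Elastic Agent; the matched substrings are taken from the messages in this issue, plus "no such host", which is assumed here to be the usual wording of a failed lookup on Linux. A "dns" result for the artifact host can hint that the request tried to resolve the name locally rather than handing it to the proxy, but that is an inference, not a confirmed diagnosis.

```python
def classify_fetch_error(message: str) -> str:
    """Sort an error message into a coarse category (hypothetical labels)."""
    lowered = message.lower()
    # Local name resolution was attempted and failed.
    if "getaddrinfo" in lowered or "no such host" in lowered:
        return "dns"
    # The request started but did not finish in time.
    if "context deadline exceeded" in lowered or "timeout" in lowered:
        return "timeout"
    # The local .asc file was never written to the downloads directory.
    if "cannot find the file" in lowered or "no such file or directory" in lowered:
        return "missing-local-file"
    return "other"
```

The DNS check runs first on purpose: a message that wraps both a lookup failure and some other text should still be reported as a lookup failure.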

This doesn't immediately look like a bug in the agent itself. I'd suggest you continue troubleshooting this in https://discuss.elastic.co/c/elastic-stack/elastic-agent/91?page=1 which has a much wider audience.

For now I'll close this, please re-open if you confirm this isn't working properly.

FlorianHeigl commented 2 months ago

For the record, I had the opportunity to run into what appears to be the same issue.

{"log.level":"warn","@timestamp":"2024-06-30T18:32:20.635+0200","log.origin":{"file.name":"composed/verifier.go","file.line":53},"message":"Verifier failed!","log":{"source":"elastic-agent"},"verifier":"http.verifier","error":{"message":"fetching asc file from https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.12.2-linux-x86_64.tar.gz.asc: failed loading public key: Get \"https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.12.2-linux-x86_64.tar.gz.asc\": context deadline exceeded"},"ecs.version":"1.6.0"}

The commonality seems to be the 8.11.2 version. I'm sure I had previously upgraded those VMs' agents from a lower version, but then they just didn't want to make the jump anymore. I wonder if there's a misleading element in this: the downloads work, it's just the GPG-related stuff that seems not to be seeing the internetz... I dimly remember that GPG needs to be properly told how to get to a keyserver through a proxy.

I don't know if it needed to access a GPG keyserver or import a new key. It says something like using the default key, but it seems very likely.

The error handling must fall through to the GPG failure in too many cases (download failure of the actual agent as well as the GPG signature check). I would also argue that it is overkill to wipe the tar.gz if you can't validate due to a validation process failure vs. a FAILED validation. It's 500 meg after all; that could burn through a network on a large scale.
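To make the distinction above concrete, here is a minimal sketch that keeps "could not verify" separate from "verification failed" and only discards the archive in the second case. This is not how Elastic Agent is implemented; `Outcome`, `verify_artifact`, `should_delete_artifact`, and the two callables passed in are all hypothetical names used for illustration.

```python
from enum import Enum


class Outcome(Enum):
    VERIFIED = "verified"
    SIGNATURE_MISMATCH = "signature_mismatch"  # a FAILED validation
    COULD_NOT_VERIFY = "could_not_verify"      # the validation process itself failed


def verify_artifact(artifact: bytes, fetch_signature, check_signature) -> Outcome:
    """Run a verification and report which of the three outcomes occurred.

    fetch_signature: hypothetical callable returning signature bytes; assumed
        to raise OSError (which includes TimeoutError) when the .asc file
        cannot be obtained.
    check_signature: hypothetical callable(artifact, signature) -> bool.
    """
    try:
        signature = fetch_signature()
    except OSError:
        return Outcome.COULD_NOT_VERIFY
    if check_signature(artifact, signature):
        return Outcome.VERIFIED
    return Outcome.SIGNATURE_MISMATCH


def should_delete_artifact(outcome: Outcome) -> bool:
    # Only a real mismatch says the download is bad; an unreachable
    # signature says nothing about the archive, so it can be kept for a retry.
    return outcome is Outcome.SIGNATURE_MISMATCH
```

With this split, a proxy or DNS problem while fetching the signature would leave the already-downloaded archive on disk, so a retry only needs the small .asc file.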

FTR: The same system can download via proxy using wget, and the file + hash are fetched without problem, which matches the messages in the logs. HTH

Here I ran

wget https://artifacts.elastic.co/GPG-KEY-elasticsearch
apt-key add ./GPG-KEY-elasticsearch

That got me two installed keys and a further failed verification:

t":{"cgroup":{"memory":{"mem":{"usage":{"bytes":1782841344}}}},"cpu":{"system":{"ticks":741410},"total":{"ticks":2061060,"time":{"ms":40},"value":2061060},"user":{"ticks":1319650,"time":{"ms":40}}},"handles":{"limit":{"hard":524288,"soft":524288},"open":14},"info":{"ephemeral_id":"c0548b24-1e6d-45a4-a95e-0fa98978f4c0","uptime":{"ms":1748552215},"version":"8.11.2"},"memstats":{"gc_next":56998920,"memory_alloc":28386896,"memory_total":103513526776,"rss":124989440},"runtime":{"goroutines":67}},"filebeat":{"events":{"active":0,"added":24,"done":56},"harvester":{"open_files":2,"running":2}},"libbeat":{"config":{"module":{"running":3}},"output":{"events":{"acked":24,"active":197,"batches":3,"total":24},"read":{"bytes":1018},"write":{"bytes":5338}},"pipeline":{"clients":3,"events":{"active":0,"published":24,"total":24},"queue":{"acked":24}}},"processor":{"syslog":{"1":{"success":1}}},"registrar":{"states":{"current":4,"update":24},"writes":{"success":4,"total":4}},"system":{"load":{"1":0.15,"15":0.1,"5":0.13,"norm":{"1":0.0375,"15":0.025,"5":0.0325}}}}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-06-30T18:55:00.520+0200","log.origin":{"file.name":"http/downloader.go","file.line":319},"message":"download from https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.12.2-linux-x86_64.tar.gz completed in 2 minutes @ 4.228MBps","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-06-30T18:55:00.538+0200","log.origin":{"file.name":"http/downloader.go","file.line":319},"message":"download from https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.12.2-linux-x86_64.tar.gz.sha512 completed in Less than a second @ +InfYBps","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-06-30T18:55:03.024+0200","log.origin":{"file.name":"fs/verifier.go","file.line":119},"message":"Default PGP being appended","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-06-30T18:55:03.141+0200","log.origin":{"file.name":"fs/verifier.go","file.line":144},"message":"Using 2 PGP keys","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-06-30T18:55:03.141+0200","log.origin":{"file.name":"composed/verifier.go","file.line":53},"message":"Verifier failed!","log":{"source":"elastic-agent"},"verifier":"fs.verifier","error":{"message":"fetching asc file from '/opt/Elastic/Agent/data/elastic-agent-1c21b0/downloads/elastic-agent-8.12.2-linux-x86_64.tar.gz.asc': open /opt/Elastic/Agent/data/elastic-agent-1c21b0/downloads/elastic-agent-8.12.2-linux-x86_64.tar.gz.asc: no such file or directory"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-06-30T18:55:04.669+0200","message":"Non-zero metrics in the last 30s","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"log.logger":"monitoring","log.origin":{"file.line":187,"file.name":"log/log.go"},"service.name":"filebeat","monitoring":{"ecs.version":"1.6.0","metrics":{"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":1877934080}}}},"cpu":{"system":{"ticks":3220,"time":{"ms":10}},"total":{"ticks":22570,"time":{"ms":20},"value":22570},"user":{"ticks":19350,"time":{"ms":10}}},"handles":{"limit":{"hard":524288,"soft":524288},"open":14},"info":{"ephemeral_id":"32050ee9-6f11-4fc7-af95-339b0eb1a054","uptime":{"ms":5010111},"version":"8.11.2"},"memstats":{"gc_next":89443376,"memory_alloc":47502696,"memory_total":1883514984,"rss":182353920},"runtime":{"goroutines":53}},"filebeat":{"events":{"active":0,"added":6,"done":6},"harvester":{"open_files":2,"running":2}},"libbeat":{"config":{"module":{"running":2}},"output":{"events":{"acked":5,"active":0,"batches":2,"total":5},"read":{"bytes":393},"write":{"bytes":4508}},"pipeline":{"clients":2,"events":{"active":0,"filtered":1,"published":5,"total":6},"queue":{"acked":5}}},"registrar":{"states":{"current":0}},"system":{"load":{"1":0.11,"15":0.09,"5":0.12,"norm":{"1":0.0275,"15":0.0225,"5":0.03}}}}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-06-30T18:55:05.633+0200","log.origin":{"file.name":"http/verifier.go","file.line":120},"message":"Default PGP being appended","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-06-30T18:55:05.684+0200","log.origin":{"file.name":"http/verifier.go","file.line":145},"message":"Using 2 PGP keys","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}

It seems quite illogical.
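For anyone digging through a similar log dump, a small sketch like the following can pull out just the verifier failures. It assumes one JSON object per line and relies only on the field names visible in the log lines in this thread ("message", "verifier", "error.message"); the function name is hypothetical. Truncated or wrapped lines that do not parse as JSON are skipped.

```python
import json


def verifier_failures(lines):
    """Return (verifier, error message) pairs for 'Verifier failed!' records."""
    failures = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or wrapped line, not a complete JSON object
        if not isinstance(record, dict):
            continue
        if record.get("message") != "Verifier failed!":
            continue
        error = record.get("error") or {}
        failures.append((record.get("verifier"), error.get("message", "")))
    return failures
```

Run over the excerpts in this thread, it would be expected to list one failure for fs.verifier (the .asc file missing locally) and one for http.verifier (the .asc download timing out), which makes it easier to see that the archive and .sha512 downloads themselves succeeded.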

Edit: found https://www.elastic.co/guide/en/fleet/current/fleet-troubleshooting.html#pgp-key-download-fail

The workaround seems horrendous though. Plural, actually. My brain is still hurting even after I decided to just uninstall and reinstall and hope for the best.

In case someone from Elastic sees this: If you offer an option to set a proxy for downloading the artifacts, you need to make this work properly. Do not use the proxy for calls to Fleet. Do use the proxy to get the GPG key. Do not get the GPG key if you already have it. Unless there's a reason. Then get it. Do handle the individual issues and fail accordingly. You can't go delete the signature file that you just downloaded, and then try to verify with a key you already had but try to download again using the wrong method.