cllasyx opened this issue 3 months ago
12:29:52.114 elastic_agent [elastic_agent][error] upgrade to version 8.14.3 failed: failed verification of agent binary: 2 errors occurred:
* could not get .asc file: fetching asc file from '/opt/Elastic/Agent/data/elastic-agent-8.14.2-173817/downloads/elastic-agent-8.14.3-linux-x86_64.tar.gz.asc': open /opt/Elastic/Agent/data/elastic-agent-8.14.2-173817/downloads/elastic-agent-8.14.3-linux-x86_64.tar.gz.asc: no such file or directory
* fetching asc file from https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.14.3-linux-x86_64.tar.gz.asc: failed loading public key: Get "https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.14.3-linux-x86_64.tar.gz.asc": context deadline exceeded
This "context deadline exceeded" error for https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.14.3-linux-x86_64.tar.gz.asc is the source of the failure. It appears to be a timeout while downloading the .asc file.
Since you could download it manually later, my first thought is that this was a transient network error or a problem with our artifacts CDN.
Is this still happening to your agents? Were you able to download the file while the agent was failing? If so, that may indicate the real problem is that our download timeout for this file is too short.
I'm getting the same error when upgrading from 8.14.1 to 8.14.3. All I did was apply the upgrade again through the Fleet UI.
upgrade to version 8.14.3 failed: failed verification of agent binary: 2 errors occurred:
* could not get .asc file: fetching asc file from '/opt/Elastic/Agent/data/elastic-agent-8.14.1-1348b9/downloads/elastic-agent-8.14.3-linux-x86_64.tar.gz.asc': open /opt/Elastic/Agent/data/elastic-agent-8.14.1-1348b9/downloads/elastic-agent-8.14.3-linux-x86_64.tar.gz.asc: no such file or directory
* fetching asc file from https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.14.3-linux-x86_64.tar.gz.asc: failed loading public key: Get "https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.14.3-linux-x86_64.tar.gz.asc": context deadline exceeded
Yes, I could download the .asc file manually later, while the upgrade was in progress. I tried upgrading twice in a row, right after the first failure. The result was the same, so my only option was to download the file manually after starting the upgrade, to work around the timeout.
You're most likely right that the timeout period is too short.
As for the agents, I don't currently have an outdated agent on which I could test this again.
@cllasyx Would you mind timing your curl command from the same host as before, so we can get a sense of how long it's taking?
time curl -L -O https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.14.3-linux-x86_64.tar.gz.asc
Thanks.
I don't think the timing on this system matters. I can see in our code that the .asc download does not share a context timeout with the agent package download and has no retries: https://github.com/elastic/elastic-agent/blob/ca726a219e7289ca1278653003c8dc299d302093/internal/pkg/agent/application/upgrade/step_download.go#L103-L121
In the case of the HTTP verifier, we make a single attempt to fetch it with a 30s timeout and no retries, which is definitely wrong. A 30s timeout is fine for an individual request, but we should keep retrying as long as the overall upgrade download timeout has not expired.
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
Do we agree the problem here was the download of the .asc file?
Isn't the PGP key download optional at the moment, since the agent will try the key embedded in its own binary anyway?
Yes, the problem was definitely the download of the .asc file used for PGP verification.
The .asc file is not the PGP key.
What I meant by the question is that the PGP warning is a red herring: downloading the .asc file was the problem.
I didn't say the .asc file is the PGP key; I said it's used for verification, which is true.
And my response stated that "the problem was definitely the download of the .asc file", which answers your question.
Hello, I deployed Elastic Agent with Fleet Server at version 8.14.2 and tried to upgrade it to 8.14.3 a few days later.
While watching the logs in Observability -> Logs -> Stream, I noticed some error messages from the elastic_agent dataset. The logs are provided below, along with a temporary fix.
Steps to reproduce:
Log output:
Bug fix (manual):
While the upgrade is in progress, go to the folder /opt/Elastic/Agent/data/elastic-agent-8.14.2-173817/downloads and run the command below.
On the Fleet Server, run the command to download the .asc file manually:
If the upgrade fails the first time, start it again from the Kibana UI.
Wait until the upgrade completes successfully.
Notes
My Fleet Server host listens on socket *:8220 under the domain name https://myfleet.example.com:8220. The host also has a socket open on 127.0.0.1:8221, which is used for internal API operations. My firewall's OUTPUT chain accepts all traffic, and the INPUT chain accepts all connections on the loopback adapter via the rule iptables -A INPUT -i lo -j ACCEPT.