Closed mdelapenya closed 3 years ago
I'm currently bisecting the test execution across four Kibana commits.
I've built and pushed Kibana images for those commits for AMD only, since the APM-CI job only builds the AMD image. As a result, the CI builds below will show errors for all ARM stages. That's acceptable here: we want to verify whether the image breaks the tests, regardless of whether it is AMD or ARM.
TAGS="fleet_mode_agent && install && centos" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE KIBANA_VERSION=pr101828 DEVELOPER_MODE=true make -C e2e/_suites/fleet functional-test
UPDATE: this image contains the failed Revoke token scenario, verified locally with:
TAGS="fleet_mode_agent && revoke-token && centos" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE DEVELOPER_MODE=true KIBANA_VERSION=pr101828 ELASTIC_APM_ACTIVE=false make -C e2e/_suites/fleet functional-test
TAGS="fleet_mode_agent && install && centos" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE KIBANA_VERSION=pr102219 DEVELOPER_MODE=true make -C e2e/_suites/fleet functional-test
UPDATE: this image contains the failed Revoke token scenario, verified locally with:
TAGS="fleet_mode_agent && revoke-token && centos" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE DEVELOPER_MODE=true KIBANA_VERSION=pr102219 ELASTIC_APM_ACTIVE=false make -C e2e/_suites/fleet functional-test
TAGS="fleet_mode_agent && install && centos" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE KIBANA_VERSION=pr101835 DEVELOPER_MODE=true make -C e2e/_suites/fleet functional-test
TAGS="fleet_mode_agent && install && centos" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE KIBANA_VERSION=pr101752 DEVELOPER_MODE=true make -C e2e/_suites/fleet functional-test
Will post results here.
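The four install runs above differ only in KIBANA_VERSION, so they could be driven from one loop. This is a minimal sketch, not part of the test suite: the `DRY_RUN` switch and the function name are made up for illustration, and by default the script only prints what it would run.

```shell
#!/usr/bin/env bash
# Kibana image tags copied from the four install commands above.
KIBANA_VERSIONS="pr101828 pr102219 pr101835 pr101752"

# DRY_RUN is a hypothetical switch for this sketch: 1 (default) only prints
# the command; set DRY_RUN=0 to actually invoke make.
DRY_RUN="${DRY_RUN:-1}"

# Print or run the install scenario for a single Kibana tag.
run_install_scenario() {
  kibana_version="$1"
  if [ "$DRY_RUN" = "1" ]; then
    echo "KIBANA_VERSION=${kibana_version} make -C e2e/_suites/fleet functional-test"
    return 0
  fi
  TAGS="fleet_mode_agent && install && centos" \
  TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE DEVELOPER_MODE=true \
  KIBANA_VERSION="$kibana_version" \
    make -C e2e/_suites/fleet functional-test
}

# Try every tag and record pass/fail instead of stopping at the first failure.
for tag in $KIBANA_VERSIONS; do
  if run_install_scenario "$tag"; then
    echo "${tag}: ok"
  else
    echo "${tag}: FAILED"
  fi
done
```

Recording a result per tag, rather than aborting on the first failure, makes it easier to see where the behaviour changes across the four commits.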
It's weird: I've tested with an old Kibana image (pr101655, from June 8th 2021) and the test fails. I think the error is not in Kibana but in the other pieces: fleet-server or the agent. I cannot run the tests for incremental commits of the agent because the artifacts are not generated in the GCP bucket.
I'm going to locally bisect the elastic-agent image, updating the fleet-server agent, to see whether the problem comes from there:
docker pull docker.elastic.co/observability-ci/elastic-agent:pr-26260-amd64
docker tag docker.elastic.co/observability-ci/elastic-agent:pr-26260-amd64 docker.elastic.co/observability-ci/elastic-agent:8.0.0-SNAPSHOT
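When bisecting over several agent builds, the pull-and-retag pair above can be wrapped in a small helper. This is a sketch under the assumption that every PR image follows the same `pr-<number>-amd64` tag pattern as the one shown; the function names are hypothetical.

```shell
#!/usr/bin/env bash
REGISTRY="docker.elastic.co/observability-ci"

# Print the full image reference for a given PR build (amd64 only, as in the thread).
pr_image() {
  echo "${REGISTRY}/elastic-agent:pr-$1-amd64"
}

# Pull a PR image and tag it as the 8.0.0-SNAPSHOT image, as done manually above.
retag_as_snapshot() {
  src="$(pr_image "$1")"
  dst="${REGISTRY}/elastic-agent:8.0.0-SNAPSHOT"
  docker pull "$src" && docker tag "$src" "$dst"
}

# Example: retag_as_snapshot 26260
```

Nothing runs until `retag_as_snapshot` is called, so the file can be sourced safely before each bisection step.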
UPDATE: It is difficult to bisect, because the agent is trying to verify binaries that it installs locally, and the verification fails. This is the log output of the fleet-server:
Performing setup of Fleet in Kibana
Policy selected for enrollment:
The Elastic Agent is currently in BETA and should not be used in production
2021-06-21T16:50:31.427Z INFO cmd/enroll_cmd.go:469 Spawning Elastic Agent daemon as a subprocess to complete bootstrap process.
2021-06-21T16:50:31.570Z INFO warn/warn.go:18 The Elastic Agent is currently in BETA and should not be used in production
2021-06-21T16:50:31.570Z INFO application/application.go:68 Detecting execution mode
2021-06-21T16:50:31.571Z INFO application/application.go:89 Agent is in Fleet Server bootstrap mode
2021-06-21T16:50:31.914Z INFO [api] api/server.go:62 Starting stats endpoint
2021-06-21T16:50:31.914Z INFO application/fleet_server_bootstrap.go:124 Agent is starting
2021-06-21T16:50:31.914Z INFO [api] api/server.go:64 Metrics endpoint listening on: /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock (configured: unix:///usr/share/elastic-agent/state/data/tmp/elastic-agent.sock)
2021-06-21T16:50:31.916Z INFO application/fleet_server_bootstrap.go:134 Agent is stopped
2021-06-21T16:50:32.431Z INFO cmd/enroll_cmd.go:611 Waiting for Elastic Agent to start Fleet Server
2021-06-21T16:50:32.633Z INFO stateresolver/stateresolver.go:48 New State ID is V87_qo-m
2021-06-21T16:50:32.633Z INFO stateresolver/stateresolver.go:49 Converging state requires execution of 1 step(s)
2021-06-21T16:50:35.243Z INFO operation/operation_fetch.go:75 downloaded binary 'fleet-server.8.0.0-SNAPSHOT' into '/usr/share/elastic-agent/state/data/downloads/fleet-server-8.0.0-SNAPSHOT-linux-x86_64.tar.gz' as part of operation 'operation-fetch'
2021-06-21T16:50:36.415Z INFO log/reporter.go:40 2021-06-21T16:50:36Z - message: Application: fleet-server--8.0.0-SNAPSHOT[]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2021-06-21T16:50:36.417Z INFO stateresolver/stateresolver.go:66 Updating internal state
2021-06-21T16:50:36.443Z INFO cmd/enroll_cmd.go:644 Fleet Server - Starting
2021-06-21T16:50:37.950Z WARN status/reporter.go:236 Elastic Agent status changed to: 'degraded'
2021-06-21T16:50:37.950Z INFO log/reporter.go:40 2021-06-21T16:50:37Z - message: Application: fleet-server--8.0.0-SNAPSHOT[]: State changed to DEGRADED: Running on default policy with Fleet Server integration; missing config fleet.agent.id (expected during bootstrap process) - type: 'STATE' - sub_type: 'RUNNING'
2021-06-21T16:50:38.448Z INFO cmd/enroll_cmd.go:625 Fleet Server - Running on default policy with Fleet Server integration; missing config fleet.agent.id (expected during bootstrap process)
2021-06-21T16:50:39.312Z INFO cmd/enroll_cmd.go:207 Elastic Agent has been enrolled; start Elastic Agent
2021-06-21T16:50:39.312Z INFO cmd/run.go:189 Shutting down Elastic Agent and sending last events...
2021-06-21T16:50:39.312Z INFO operation/operator.go:191 waiting for installer of pipeline 'default' to finish
2021-06-21T16:50:39.312Z INFO process/app.go:181 Signaling application to stop because of shutdown: fleet-server--8.0.0-SNAPSHOT
2021-06-21T16:50:39.813Z INFO status/reporter.go:236 Elastic Agent status changed to: 'online'
2021-06-21T16:50:39.814Z INFO cmd/run.go:197 Shutting down completed.
2021-06-21T16:50:39.814Z INFO log/reporter.go:40 2021-06-21T16:50:39Z - message: Application: fleet-server--8.0.0-SNAPSHOT[]: State changed to STOPPED: Stopped - type: 'STATE' - sub_type: 'STOPPED'
2021-06-21T16:50:39.814Z INFO [api] api/server.go:66 Stats endpoint (/usr/share/elastic-agent/state/data/tmp/elastic-agent.sock) finished: accept unix /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock: use of closed network connection
Successfully enrolled the Elastic Agent.
2021-06-21T16:50:39.929Z INFO warn/warn.go:18 The Elastic Agent is currently in BETA and should not be used in production
2021-06-21T16:50:39.929Z INFO application/application.go:68 Detecting execution mode
2021-06-21T16:50:39.930Z INFO application/application.go:93 Agent is managed by Fleet
2021-06-21T16:50:39.930Z INFO capabilities/capabilities.go:59 capabilities file not found in /usr/share/elastic-agent/state/capabilities.yml
2021-06-21T16:50:40.006Z INFO [composable] composable/controller.go:46 EXPERIMENTAL - Inputs with variables are currently experimental and should not be used in production
2021-06-21T16:50:40.110Z INFO [composable.providers.docker] docker/docker.go:43 Docker provider skipped, unable to connect: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2021-06-21T16:50:40.111Z INFO [api] api/server.go:62 Starting stats endpoint
2021-06-21T16:50:40.111Z INFO application/managed_mode.go:290 Agent is starting
2021-06-21T16:50:40.111Z INFO [api] api/server.go:64 Metrics endpoint listening on: /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock (configured: unix:///usr/share/elastic-agent/state/data/tmp/elastic-agent.sock)
2021-06-21T16:50:40.214Z WARN application/managed_mode.go:303 failed to ack update open /usr/share/elastic-agent/state/data/.update-marker: no such file or directory
2021-06-21T16:50:40.843Z INFO stateresolver/stateresolver.go:48 New State ID is GZd1I8Eu
2021-06-21T16:50:40.843Z INFO stateresolver/stateresolver.go:49 Converging state requires execution of 2 step(s)
2021-06-21T16:50:41.426Z INFO operation/operator.go:259 operation 'operation-install' skipped for fleet-server.8.0.0-SNAPSHOT
2021-06-21T16:50:41.531Z INFO log/reporter.go:40 2021-06-21T16:50:41Z - message: Application: fleet-server--8.0.0-SNAPSHOT[69f22191-30a0-4f0a-b54e-eaab826d4a87]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2021-06-21T16:50:42.563Z INFO log/reporter.go:40 2021-06-21T16:50:42Z - message: Application: fleet-server--8.0.0-SNAPSHOT[69f22191-30a0-4f0a-b54e-eaab826d4a87]: State changed to RUNNING: Running on default policy with Fleet Server integration - type: 'STATE' - sub_type: 'RUNNING'
2021-06-21T16:50:42.598Z ERROR log/reporter.go:36 2021-06-21T16:50:42Z - message: Application: filebeat--8.0.0-SNAPSHOT--36643631373035623733363936343635[69f22191-30a0-4f0a-b54e-eaab826d4a87]: State changed to FAILED: operation 'operation-verify' failed to verify filebeat.8.0.0-SNAPSHOT: 3 errors occurred:
* fetching asc file from '/usr/share/elastic-agent/state/data/downloads/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc': open /usr/share/elastic-agent/state/data/downloads/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: no such file or directory
* check detached signature: openpgp: invalid signature: hash tag doesn't match
* fetching asc file from https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: call to 'https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc' returned unsuccessful status code: 404
- type: 'ERROR' - sub_type: 'FAILED'
2021-06-21T16:50:42.598Z ERROR status/reporter.go:236 Elastic Agent status changed to: 'error'
2021-06-21T16:50:42.598Z ERROR operation/operation_retryable.go:85 operation operation-verify failed, err: operation 'operation-verify' failed to verify filebeat.8.0.0-SNAPSHOT: 3 errors occurred:
* fetching asc file from '/usr/share/elastic-agent/state/data/downloads/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc': open /usr/share/elastic-agent/state/data/downloads/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: no such file or directory
* check detached signature: openpgp: invalid signature: hash tag doesn't match
* fetching asc file from https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: call to 'https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc' returned unsuccessful status code: 404
2021-06-21T16:50:43.664Z ERROR operation/operation_retryable.go:85 operation operation-verify failed, err: operation 'operation-verify' failed to verify metricbeat.8.0.0-SNAPSHOT: 3 errors occurred:
* fetching asc file from '/usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc': open /usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: no such file or directory
* check detached signature: openpgp: invalid signature: hash tag doesn't match
* fetching asc file from https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: call to 'https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc' returned unsuccessful status code: 404
2021-06-21T16:50:43.664Z INFO [api] api/server.go:66 Stats endpoint (/usr/share/elastic-agent/state/data/tmp/elastic-agent.sock) finished: accept unix /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock: use of closed network connection
Error: operator: failed to execute step sc-run, error: 2 errors occurred:
* operation 'operation-verify' failed to verify metricbeat.8.0.0-SNAPSHOT: 3 errors occurred:
* fetching asc file from '/usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc': open /usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: no such file or directory
* check detached signature: openpgp: invalid signature: hash tag doesn't match
* fetching asc file from https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: call to 'https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc' returned unsuccessful status code: 404
* operation 'operation-verify' failed to verify metricbeat.8.0.0-SNAPSHOT: 3 errors occurred:
* fetching asc file from '/usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc': open /usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: no such file or directory
* check detached signature: openpgp: invalid signature: hash tag doesn't match
* fetching asc file from https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: call to 'https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc' returned unsuccessful status code: 404
: 2 errors occurred:
* operation 'operation-verify' failed to verify metricbeat.8.0.0-SNAPSHOT: 3 errors occurred:
* fetching asc file from '/usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc': open /usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: no such file or directory
* check detached signature: openpgp: invalid signature: hash tag doesn't match
* fetching asc file from https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: call to 'https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc' returned unsuccessful status code: 404
* operation 'operation-verify' failed to verify metricbeat.8.0.0-SNAPSHOT: 3 errors occurred:
* fetching asc file from '/usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc': open /usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: no such file or directory
* check detached signature: openpgp: invalid signature: hash tag doesn't match
* fetching asc file from https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: call to 'https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc' returned unsuccessful status code: 404
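The verify step in the log fails partly because the `.asc` signature file cannot be fetched from artifacts.elastic.co. A quick way to re-check that by hand is sketched below; the URL pattern is copied from the log lines, while the helper names are made up for this example.

```shell
#!/usr/bin/env bash
# Build the signature URL the agent tried to fetch, following the pattern in the log.
asc_url() {
  beat="$1"
  version="$2"
  echo "https://artifacts.elastic.co/downloads/beats/${beat}/${beat}-${version}-linux-x86_64.tar.gz.asc"
}

# Print only the HTTP status code for a URL, discarding the response body.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Example (needs network access):
#   http_status "$(asc_url filebeat 8.0.0-SNAPSHOT)"
```

This only covers the remote fetch; the "invalid signature" error in the same log is a separate check that this sketch does not reproduce.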
Mmm, trying previous step with a clean environment, removing all services in the compose file, and now fleet-server starts properly.
But unfortunately the revoke test still fails:
--- Failed steps:
Scenario Outline: Revoking the enrollment token for the centos agent # features/fleet_mode_agent.feature:105
Then an attempt to enroll a new agent fails # features/fleet_mode_agent.feature:108
Error: The agent was enrolled although the token was previously revoked
So if the fleet-server is from 8 days ago, when the tests were supposed to be passing, and the tests actually fail, I'd say it's because of another piece of the stack: it seems it's not Kibana, and it seems it's not fleet-server. Let's check with the agent. I'm gonna bisect the agent, although I'm seeing problems with the packaging job not producing artifacts for all commits. cc/ @elastic/observablt-robots
Could this be an elasticsearch change? Maybe the query has changed?
This is due to the way we currently bring up Kibana: the environment variable XPACK_FLEET_AGENTS_FLEET_SERVER_HOSTS is not being honored properly, it seems. It's the same reason why #1273 is failing as well. Will post a reference bug once available.
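One thing that can be ruled out quickly is the variable never reaching the Kibana container at all. The sketch below filters `env`-style output for it; the container name `kibana` in the usage comment is an assumption, and a passing check does not show that Kibana actually honors the value.

```shell
#!/usr/bin/env bash
# Read `env`-style output on stdin and succeed only if the variable is
# present with a non-empty value.
has_fleet_server_hosts() {
  grep -q '^XPACK_FLEET_AGENTS_FLEET_SERVER_HOSTS=..*'
}

# Example, assuming the container is named "kibana" (hypothetical):
#   docker exec kibana env | has_fleet_server_hosts && echo "set" || echo "missing"
```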
@adam-stokes and @mdelapenya did https://github.com/elastic/e2e-testing/pull/1281 fix this as well?
No, that was not a solution and this issue is still under investigation. We'll post more updates here.
I logged a product team bug for this, thanks so much for noting it in Slack, Manu. Def sounds like the same issue: https://github.com/elastic/beats/issues/26518
This looks to have been fixed, will wait for @mdelapenya to verify but our tests are passing again
the duplicate issue was re-tested and found fixed - let's not wait, i'm closing it out. there are other reasons for the tests to fail, if they still are, and we should do new tickets. : / thanks Adam.
The step "system package dashboards are listed in Fleet" is failing because it does not find any data stream within the max timeout (3 min). It fails on CentOS and Debian, on both AMD and ARM.
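To tell a slow stack apart from data streams that never appear, the same kind of bounded wait can be reproduced by hand. The helper below is a generic poll-until-timeout sketch, not how the test suite implements its wait; `data_stream_exists` in the usage comment is a hypothetical check you would supply yourself.

```shell
#!/usr/bin/env bash
# Retry a command every <interval> seconds until it succeeds or
# <max_seconds> have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
  max_seconds="$1"
  interval="$2"
  shift 2
  elapsed=0
  while ! "$@"; do
    if [ "$elapsed" -ge "$max_seconds" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Example: wait up to 180 s (the 3 min from the report), checking every 5 s.
#   wait_until 180 5 data_stream_exists
```

Raising `max_seconds` well beyond 180 shows whether the data stream is merely late or never created.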
Steps to reproduce
1 Run:
Expected behaviour: the scenario passes
Current behaviour: --- Failed steps: