elastic / e2e-testing

Formal verification of Elastic-Agent and more using BDD

No datastreams are listed on Fleet #1274

Closed mdelapenya closed 3 years ago

mdelapenya commented 3 years ago

The step "system package dashboards are listed in Fleet" is failing because it does not find any data stream within the max timeout (3min).

It fails on Centos and Debian, in both AMD and ARM.
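
For anyone reproducing this: the failing step amounts to polling until data streams show up or the timeout expires. The following is only a minimal shell sketch of that poll-until-timeout pattern, not the suite's actual implementation. The names `retry_until` and `has_data_streams` are hypothetical, and the 180 second / 5 second values are assumptions based on the 3min timeout mentioned above.

```shell
# Retry a command until it succeeds or the timeout (in seconds) elapses.
# Usage: retry_until <timeout_seconds> <interval_seconds> <command...>
# Returns 0 as soon as the command succeeds, 1 once the deadline has passed.
retry_until() {
  _timeout="$1"
  _interval="$2"
  shift 2
  _deadline=$(( $(date +%s) + $_timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$_deadline" ]; then
      return 1
    fi
    sleep "$_interval"
  done
}

# Example (has_data_streams is a hypothetical check you would write yourself):
#   retry_until 180 5 has_data_streams || echo "Error: There are no datastreams yet" >&2
```

The command is always tried at least once, so a timeout of 0 means "check once, do not wait".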

Steps to reproduce

1. Run:

$ TAGS="fleet_mode_agent && install && centos" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE DEVELOPER_MODE=true ELASTIC_APM_ACTIVE=false make -C e2e/_suites/fleet functional-test

Expected behaviour: the scenario passes

Current behaviour:

--- Failed steps:

  Scenario Outline: Deploying the centos agent # features/fleet_mode_agent.feature:6
    And system package dashboards are listed in Fleet # features/fleet_mode_agent.feature:10
      Error: There are no datastreams yet

1 scenarios (1 failed)
4 steps (3 passed, 1 failed)
5m22.255990276s
make: *** [functional-test] Error 1

mdelapenya commented 3 years ago

I'm currently bisecting the test execution between four kibana commits.

[Two screenshots: "Screenshot 2021-06-21 at 17 45 17" and "Screenshot 2021-06-21 at 18 21 30"]

I've built and pushed Kibana images for those commits, only for AMD, as the APM-CI job only builds the AMD image. For that reason, the CI builds below will contain errors for all ARM stages. That's fine: we only want to verify whether the image breaks the tests, regardless of whether it is AMD or ARM.

❌ 1st commit: 4a941565502547f96bab72786e1ac11f61f19558 (elastic/kibana#101828)

UPDATE: this image contains the failed Revoke token scenario, verified locally with:

TAGS="fleet_mode_agent && revoke-token && centos" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE DEVELOPER_MODE=true KIBANA_VERSION=pr101828 ELASTIC_APM_ACTIVE=false make -C e2e/_suites/fleet functional-test

❌ 2nd commit: cd5cd65fb2ec04ed63fcbc6b87f1fdb7333bee72 (elastic/kibana#102219)

UPDATE: this image contains the failed Revoke token scenario, verified locally with:

TAGS="fleet_mode_agent && revoke-token && centos" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE DEVELOPER_MODE=true KIBANA_VERSION=pr102219 ELASTIC_APM_ACTIVE=false make -C e2e/_suites/fleet functional-test

3rd commit: 35cc59b571d19fe52eff17777a4613fd867ff928 (elastic/kibana#101835)

4th commit: 6df58dd7ca53b43c2f143823ebbe51083618032b (elastic/kibana#101752)

Will post results here.
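
To make the bisection repeatable, the per-commit command can be generated instead of typed by hand. This is a rough sketch: `kibana_bisect_cmd` is a hypothetical helper, and it assumes that the images for the 3rd and 4th commits follow the same `pr<number>` tag pattern as the first two. It only prints the commands, it does not run them.

```shell
# Print the functional-test command for one Kibana PR image.
# $1: Kibana PR number (used as KIBANA_VERSION=pr<number>)
# $2: scenario tag, defaults to revoke-token as in the commands above
kibana_bisect_cmd() {
  _pr="$1"
  _scenario="${2:-revoke-token}"
  printf 'TAGS="fleet_mode_agent && %s && centos" TIMEOUT_FACTOR=3 LOG_LEVEL=TRACE DEVELOPER_MODE=true KIBANA_VERSION=pr%s ELASTIC_APM_ACTIVE=false make -C e2e/_suites/fleet functional-test\n' "$_scenario" "$_pr"
}

# Print one command per commit under test, in the order listed above.
for pr in 101828 102219 101835 101752; do
  kibana_bisect_cmd "$pr"
done
```

Passing `install` as the second argument gives the command for the originally failing scenario with a pinned Kibana image.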

mdelapenya commented 3 years ago

It's weird: I've tested with an old Kibana image (pr101655, from June 8th 2021) and the test fails. I think the error is not in Kibana but in one of the other pieces: fleet-server or the agent. I cannot run the tests for incremental commits of the agent because the artifacts are not generated in the GCP bucket.

I'm going to locally bisect the elastic-agent image, updating the fleet-server agent, to see whether the problem comes from there:

docker pull docker.elastic.co/observability-ci/elastic-agent:pr-26260-amd64
docker tag docker.elastic.co/observability-ci/elastic-agent:pr-26260-amd64  docker.elastic.co/observability-ci/elastic-agent:8.0.0-SNAPSHOT

UPDATE: It is difficult to bisect, because fleet-server is trying to validate a binary that it is trying to install locally. This is the log output of the fleet-server:

Performing setup of Fleet in Kibana

Policy selected for enrollment:  
The Elastic Agent is currently in BETA and should not be used in production
2021-06-21T16:50:31.427Z        INFO    cmd/enroll_cmd.go:469   Spawning Elastic Agent daemon as a subprocess to complete bootstrap process.
2021-06-21T16:50:31.570Z        INFO    warn/warn.go:18 The Elastic Agent is currently in BETA and should not be used in production
2021-06-21T16:50:31.570Z        INFO    application/application.go:68   Detecting execution mode
2021-06-21T16:50:31.571Z        INFO    application/application.go:89   Agent is in Fleet Server bootstrap mode
2021-06-21T16:50:31.914Z        INFO    [api]   api/server.go:62        Starting stats endpoint
2021-06-21T16:50:31.914Z        INFO    application/fleet_server_bootstrap.go:124       Agent is starting
2021-06-21T16:50:31.914Z        INFO    [api]   api/server.go:64        Metrics endpoint listening on: /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock (configured: unix:///usr/share/elastic-agent/state/data/tmp/elastic-agent.sock)
2021-06-21T16:50:31.916Z        INFO    application/fleet_server_bootstrap.go:134       Agent is stopped
2021-06-21T16:50:32.431Z        INFO    cmd/enroll_cmd.go:611   Waiting for Elastic Agent to start Fleet Server
2021-06-21T16:50:32.633Z        INFO    stateresolver/stateresolver.go:48       New State ID is V87_qo-m
2021-06-21T16:50:32.633Z        INFO    stateresolver/stateresolver.go:49       Converging state requires execution of 1 step(s)
2021-06-21T16:50:35.243Z        INFO    operation/operation_fetch.go:75 downloaded binary 'fleet-server.8.0.0-SNAPSHOT' into '/usr/share/elastic-agent/state/data/downloads/fleet-server-8.0.0-SNAPSHOT-linux-x86_64.tar.gz' as part of operation 'operation-fetch'
2021-06-21T16:50:36.415Z        INFO    log/reporter.go:40      2021-06-21T16:50:36Z - message: Application: fleet-server--8.0.0-SNAPSHOT[]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2021-06-21T16:50:36.417Z        INFO    stateresolver/stateresolver.go:66       Updating internal state
2021-06-21T16:50:36.443Z        INFO    cmd/enroll_cmd.go:644   Fleet Server - Starting
2021-06-21T16:50:37.950Z        WARN    status/reporter.go:236  Elastic Agent status changed to: 'degraded'
2021-06-21T16:50:37.950Z        INFO    log/reporter.go:40      2021-06-21T16:50:37Z - message: Application: fleet-server--8.0.0-SNAPSHOT[]: State changed to DEGRADED: Running on default policy with Fleet Server integration; missing config fleet.agent.id (expected during bootstrap process) - type: 'STATE' - sub_type: 'RUNNING'
2021-06-21T16:50:38.448Z        INFO    cmd/enroll_cmd.go:625   Fleet Server - Running on default policy with Fleet Server integration; missing config fleet.agent.id (expected during bootstrap process)
2021-06-21T16:50:39.312Z        INFO    cmd/enroll_cmd.go:207   Elastic Agent has been enrolled; start Elastic Agent
2021-06-21T16:50:39.312Z        INFO    cmd/run.go:189  Shutting down Elastic Agent and sending last events...
2021-06-21T16:50:39.312Z        INFO    operation/operator.go:191       waiting for installer of pipeline 'default' to finish
2021-06-21T16:50:39.312Z        INFO    process/app.go:181      Signaling application to stop because of shutdown: fleet-server--8.0.0-SNAPSHOT
2021-06-21T16:50:39.813Z        INFO    status/reporter.go:236  Elastic Agent status changed to: 'online'
2021-06-21T16:50:39.814Z        INFO    cmd/run.go:197  Shutting down completed.
2021-06-21T16:50:39.814Z        INFO    log/reporter.go:40      2021-06-21T16:50:39Z - message: Application: fleet-server--8.0.0-SNAPSHOT[]: State changed to STOPPED: Stopped - type: 'STATE' - sub_type: 'STOPPED'
2021-06-21T16:50:39.814Z        INFO    [api]   api/server.go:66        Stats endpoint (/usr/share/elastic-agent/state/data/tmp/elastic-agent.sock) finished: accept unix /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock: use of closed network connection
Successfully enrolled the Elastic Agent.
2021-06-21T16:50:39.929Z        INFO    warn/warn.go:18 The Elastic Agent is currently in BETA and should not be used in production
2021-06-21T16:50:39.929Z        INFO    application/application.go:68   Detecting execution mode
2021-06-21T16:50:39.930Z        INFO    application/application.go:93   Agent is managed by Fleet
2021-06-21T16:50:39.930Z        INFO    capabilities/capabilities.go:59 capabilities file not found in /usr/share/elastic-agent/state/capabilities.yml
2021-06-21T16:50:40.006Z        INFO    [composable]    composable/controller.go:46     EXPERIMENTAL - Inputs with variables are currently experimental and should not be used in production
2021-06-21T16:50:40.110Z        INFO    [composable.providers.docker]   docker/docker.go:43     Docker provider skipped, unable to connect: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2021-06-21T16:50:40.111Z        INFO    [api]   api/server.go:62        Starting stats endpoint
2021-06-21T16:50:40.111Z        INFO    application/managed_mode.go:290 Agent is starting
2021-06-21T16:50:40.111Z        INFO    [api]   api/server.go:64        Metrics endpoint listening on: /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock (configured: unix:///usr/share/elastic-agent/state/data/tmp/elastic-agent.sock)
2021-06-21T16:50:40.214Z        WARN    application/managed_mode.go:303 failed to ack update open /usr/share/elastic-agent/state/data/.update-marker: no such file or directory
2021-06-21T16:50:40.843Z        INFO    stateresolver/stateresolver.go:48       New State ID is GZd1I8Eu
2021-06-21T16:50:40.843Z        INFO    stateresolver/stateresolver.go:49       Converging state requires execution of 2 step(s)
2021-06-21T16:50:41.426Z        INFO    operation/operator.go:259       operation 'operation-install' skipped for fleet-server.8.0.0-SNAPSHOT
2021-06-21T16:50:41.531Z        INFO    log/reporter.go:40      2021-06-21T16:50:41Z - message: Application: fleet-server--8.0.0-SNAPSHOT[69f22191-30a0-4f0a-b54e-eaab826d4a87]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2021-06-21T16:50:42.563Z        INFO    log/reporter.go:40      2021-06-21T16:50:42Z - message: Application: fleet-server--8.0.0-SNAPSHOT[69f22191-30a0-4f0a-b54e-eaab826d4a87]: State changed to RUNNING: Running on default policy with Fleet Server integration - type: 'STATE' - sub_type: 'RUNNING'
2021-06-21T16:50:42.598Z        ERROR   log/reporter.go:36      2021-06-21T16:50:42Z - message: Application: filebeat--8.0.0-SNAPSHOT--36643631373035623733363936343635[69f22191-30a0-4f0a-b54e-eaab826d4a87]: State changed to FAILED: operation 'operation-verify' failed to verify filebeat.8.0.0-SNAPSHOT: 3 errors occurred:
        * fetching asc file from '/usr/share/elastic-agent/state/data/downloads/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc': open /usr/share/elastic-agent/state/data/downloads/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: no such file or directory
        * check detached signature: openpgp: invalid signature: hash tag doesn't match
        * fetching asc file from https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: call to 'https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc' returned unsuccessful status code: 404

 - type: 'ERROR' - sub_type: 'FAILED'
2021-06-21T16:50:42.598Z        ERROR   status/reporter.go:236  Elastic Agent status changed to: 'error'
2021-06-21T16:50:42.598Z        ERROR   operation/operation_retryable.go:85     operation operation-verify failed, err: operation 'operation-verify' failed to verify filebeat.8.0.0-SNAPSHOT: 3 errors occurred:
        * fetching asc file from '/usr/share/elastic-agent/state/data/downloads/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc': open /usr/share/elastic-agent/state/data/downloads/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: no such file or directory
        * check detached signature: openpgp: invalid signature: hash tag doesn't match
        * fetching asc file from https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: call to 'https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc' returned unsuccessful status code: 404

2021-06-21T16:50:43.664Z        ERROR   operation/operation_retryable.go:85     operation operation-verify failed, err: operation 'operation-verify' failed to verify metricbeat.8.0.0-SNAPSHOT: 3 errors occurred:
        * fetching asc file from '/usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc': open /usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: no such file or directory
        * check detached signature: openpgp: invalid signature: hash tag doesn't match
        * fetching asc file from https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: call to 'https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc' returned unsuccessful status code: 404

2021-06-21T16:50:43.664Z        INFO    [api]   api/server.go:66        Stats endpoint (/usr/share/elastic-agent/state/data/tmp/elastic-agent.sock) finished: accept unix /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock: use of closed network connection
Error: operator: failed to execute step sc-run, error: 2 errors occurred:
        * operation 'operation-verify' failed to verify metricbeat.8.0.0-SNAPSHOT: 3 errors occurred:
        * fetching asc file from '/usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc': open /usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: no such file or directory
        * check detached signature: openpgp: invalid signature: hash tag doesn't match
        * fetching asc file from https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: call to 'https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc' returned unsuccessful status code: 404

        * operation 'operation-verify' failed to verify metricbeat.8.0.0-SNAPSHOT: 3 errors occurred:
        * fetching asc file from '/usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc': open /usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: no such file or directory
        * check detached signature: openpgp: invalid signature: hash tag doesn't match
        * fetching asc file from https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: call to 'https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc' returned unsuccessful status code: 404

: 2 errors occurred:
        * operation 'operation-verify' failed to verify metricbeat.8.0.0-SNAPSHOT: 3 errors occurred:
        * fetching asc file from '/usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc': open /usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: no such file or directory
        * check detached signature: openpgp: invalid signature: hash tag doesn't match
        * fetching asc file from https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: call to 'https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc' returned unsuccessful status code: 404

        * operation 'operation-verify' failed to verify metricbeat.8.0.0-SNAPSHOT: 3 errors occurred:
        * fetching asc file from '/usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc': open /usr/share/elastic-agent/state/data/downloads/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: no such file or directory
        * check detached signature: openpgp: invalid signature: hash tag doesn't match
        * fetching asc file from https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc: call to 'https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz.asc' returned unsuccessful status code: 404
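
The log above repeats the same verification error many times. A small filter like the following can reduce it to the list of artifacts that failed 'operation-verify'. This is just a sketch: `failed_verifications` is a hypothetical helper, and it relies only on the "failed to verify <artifact>:" wording seen in this log.

```shell
# Read agent log text on stdin and print each artifact that failed
# verification, once, sorted alphabetically.
failed_verifications() {
  grep -o "failed to verify [^:]*" | sed 's/^failed to verify //' | sort -u
}

# Example (the log file name is a placeholder):
#   failed_verifications < fleet-server.log
```

For the output above this would list filebeat.8.0.0-SNAPSHOT and metricbeat.8.0.0-SNAPSHOT.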

mdelapenya commented 3 years ago

Mmm, I retried the previous step with a clean environment, removing all services in the compose file, and now fleet-server starts properly.

But unfortunately the revoke test still fails:

--- Failed steps:

  Scenario Outline: Revoking the enrollment token for the centos agent # features/fleet_mode_agent.feature:105
    Then an attempt to enroll a new agent fails # features/fleet_mode_agent.feature:108
      Error: The agent was enrolled although the token was previously revoked

So if the fleet-server is from 8 days ago, when the tests were supposed to be passing, and the tests still fail, I'd say the cause is another piece of the stack: it doesn't seem to be Kibana, and it doesn't seem to be fleet-server. Let's check the agent. I'm going to bisect the agent, although I'm seeing problems with the packaging job not producing artifacts for all commits. cc/ @elastic/observablt-robots

adam-stokes commented 3 years ago

Could this be an elasticsearch change? Maybe the query has changed?

adam-stokes commented 3 years ago

This is due to the way we currently bring up Kibana: the environment variable XPACK_FLEET_AGENTS_FLEET_SERVER_HOSTS does not seem to be honored properly. That is the same reason #1273 is failing. Will post a reference bug once available.
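
A quick way to rule out the simplest cause is to check whether the variable reaches the Kibana container at all. The sketch below is only that: `fleet_server_hosts_from_env` is a hypothetical helper and the container name in the usage comment is a placeholder. It confirms that the variable is present in the container environment, not that Kibana actually applies it.

```shell
# Read `env` output on stdin and print the value of
# XPACK_FLEET_AGENTS_FLEET_SERVER_HOSTS; return 1 if it is unset or empty.
fleet_server_hosts_from_env() {
  _value="$(sed -n 's/^XPACK_FLEET_AGENTS_FLEET_SERVER_HOSTS=//p')"
  if [ -n "$_value" ]; then
    printf '%s\n' "$_value"
  else
    return 1
  fi
}

# Example (replace <kibana-container> with the real container name):
#   docker exec <kibana-container> env | fleet_server_hosts_from_env \
#     || echo "XPACK_FLEET_AGENTS_FLEET_SERVER_HOSTS is not set in the container" >&2
```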

cachedout commented 3 years ago

@adam-stokes and @mdelapenya did https://github.com/elastic/e2e-testing/pull/1281 fix this as well?

mdelapenya commented 3 years ago

> @adam-stokes and @mdelapenya did #1281 fix this as well?

No, that was not a solution, and this issue is still under investigation. We'll post further updates here.

EricDavisX commented 3 years ago

I logged a product team bug for this, thanks so much for noting it in Slack, Manu. It definitely sounds like the same issue: https://github.com/elastic/beats/issues/26518

adam-stokes commented 3 years ago

This looks to have been fixed. I'll wait for @mdelapenya to verify, but our tests are passing again.

EricDavisX commented 3 years ago

The duplicate issue was re-tested and found fixed. Let's not wait, I'm closing it out. If the tests are still failing, it is for other reasons, and we should open new tickets. :/ Thanks Adam.

mdelapenya commented 3 years ago

Fixed in https://github.com/elastic/kibana/pull/104415