elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

x-pack/filebeat: AWS test failure #40503

Open oakrizan opened 2 months ago

oakrizan commented 2 months ago

Flaky Test

On the 7.17 branch, AWS tests were enabled for specific changesets and were successful (e.g. https://github.com/elastic/beats/pull/35885). After the Beats migration from Jenkins to Buildkite, AWS tests were temporarily re-enabled for x-pack/filebeat on the 8.*/main branches for validation purposes.

AWS test failure context: https://github.com/elastic/beats/issues/36425

A similar issue was opened for running the test on Windows: https://github.com/elastic/beats/issues/39657. It was fixed by increasing the timeout from 5 to 10 (https://github.com/elastic/beats/pull/39713). Apparently that is sometimes still not enough when running the tests on AWS.

I have created a PR, https://github.com/elastic/beats/pull/40162, where the timeout is 20, and the tests seem to succeed when executed on AWS: https://buildkite.com/elastic/beats-xpack-filebeat/builds/4118. Basically, this problem can be bypassed either by enabling retry for the AWS step or by increasing the timeout again.

beats-xpack-filebeat_build_3351_ubuntu-x-pack-slash-filebeat-aws-tests.log
beats-xpack-filebeat_build_3351_ubuntu-x-pack-slash-filebeat-aws-tests-retry.log

Stack Trace

=== Failed
=== FAIL: x-pack/filebeat/input/cel TestInput/retry_failure (0.01s)
    input_test.go:1602: unexpected result for event 0: got:- want:+
          mapstr.M{
        -   "error": map[string]any{
        -       "message": string("failed eval: ERROR: <input>:2:19: failed to unmarshal JSON message: unexpected end of JSON input\n |  get(state.url).as(resp, {\n "...),
        -   },
        +   "hello": string("world"),
          }
=== FAIL: x-pack/filebeat/input/cel TestInput (38.59s)
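
For readers unfamiliar with the "unexpected end of JSON input" text in the trace: it is the message Go's `encoding/json` produces when the input ends before a complete JSON value has been read, for example on an empty or truncated body. The sketch below reproduces that message in isolation. Whether the test server actually returned an empty or truncated response here is an assumption, not something the logs above confirm, and `decodeBody` is a hypothetical stand-in rather than the CEL input's real decoding path.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeBody mimics decoding a response body as a JSON object.
// Hypothetical helper; it is not the code path used by the CEL input.
func decodeBody(body []byte) (map[string]any, error) {
	var m map[string]any
	if err := json.Unmarshal(body, &m); err != nil {
		return nil, fmt.Errorf("failed to unmarshal JSON message: %w", err)
	}
	return m, nil
}

func main() {
	// An empty body, as might be seen if a response is read too early.
	_, err := decodeBody([]byte(""))
	fmt.Println(err)

	// A complete body decodes to the value the test expects.
	m, err := decodeBody([]byte(`{"hello":"world"}`))
	fmt.Println(m, err)
}
```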
elasticmachine commented 2 months ago

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

rowlandgeoff commented 2 months ago

During the migration of beats-ci from Jenkins to Buildkite, a number of tests were failing consistently due to issues unrelated to the migration. Those tests were disabled to stabilize the CI, with the intent to revisit them post-migration. @oakrizan has reviewed them all in her draft PRs linked above in the description, and has opened tickets such as this one to highlight to the product teams the tests that are currently still disabled and could use some attention.

oakrizan commented 2 months ago

I have potentially fixed the problem with CURL by updating the version of observability/stream from v0.6.1 to v0.7.0 in testing/environments/docker/cometd/Dockerfile. But now the AWS test step fails with an AuthorizationError: https://buildkite.com/elastic/beats-xpack-filebeat/builds/5127#01915b0f-c5ba-41ba-9c70-453222f11cf3. I will look for a solution to that.

In addition, I removed the retry in initCloudEnv.sh, since there is no log when terraformApply fails. Related PR: https://github.com/elastic/beats/pull/40549