elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

[Flaky Test]: TestEventLogOutputConfiguredViaFleet – could not find event log file #5159

Closed rdner closed 1 month ago

rdner commented 3 months ago

Failing test case

TestEventLogOutputConfiguredViaFleet

Error message

could not find event log file

Build

OS

Linux

Stacktrace and notes

=== RUN   TestEventLogOutputConfiguredViaFleet
    container_cmd_test.go:75: Creating enrollment API key...
    fetcher.go:95: Using existing artifact elastic-agent-8.15.0-SNAPSHOT-linux-x86_64.tar.gz
    fixture.go:282: Extracting artifact elastic-agent-8.15.0-SNAPSHOT-linux-x86_64.tar.gz to /tmp/TestEventLogOutputConfiguredViaFleet188574346/001
    fixture.go:300: Completed extraction of artifact elastic-agent-8.15.0-SNAPSHOT-linux-x86_64.tar.gz to /tmp/TestEventLogOutputConfiguredViaFleet188574346/001
    fixture.go:906: Components were not modified from the fetched artifact
    fixture.go:657: >> running binary with: [/tmp/TestEventLogOutputConfiguredViaFleet188574346/001/elastic-agent-8.15.0-SNAPSHOT-linux-x86_64/elastic-agent status --output json]
    fixture.go:657: >> running binary with: [/tmp/TestEventLogOutputConfiguredViaFleet188574346/001/elastic-agent-8.15.0-SNAPSHOT-linux-x86_64/elastic-agent status --output json]
    fixture.go:657: >> running binary with: [/tmp/TestEventLogOutputConfiguredViaFleet188574346/001/elastic-agent-8.15.0-SNAPSHOT-linux-x86_64/elastic-agent status --output json]
    fixture.go:657: >> running binary with: [/tmp/TestEventLogOutputConfiguredViaFleet188574346/001/elastic-agent-8.15.0-SNAPSHOT-linux-x86_64/elastic-agent status --output json]
    fixture.go:657: >> running binary with: [/tmp/TestEventLogOutputConfiguredViaFleet188574346/001/elastic-agent-8.15.0-SNAPSHOT-linux-x86_64/elastic-agent status --output json]
    fixture.go:657: >> running binary with: [/tmp/TestEventLogOutputConfiguredViaFleet188574346/001/elastic-agent-8.15.0-SNAPSHOT-linux-x86_64/elastic-agent status --output json]
    event_logging_test.go:316: 
            Error Trace:    /home/ubuntu/agent/testing/integration/event_logging_test.go:316
                                        /home/ubuntu/agent/testing/integration/event_logging_test.go:231
            Error:          Condition never satisfied
            Test:           TestEventLogOutputConfiguredViaFleet
            Messages:       could not find event log file
    container_cmd_test.go:98: >> cleaning up: killing the Elastic-Agent process
--- FAIL: TestEventLogOutputConfiguredViaFleet (89.09s)

elasticmachine commented 3 months ago

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

pierrehilbert commented 3 months ago

This was introduced by https://github.com/elastic/elastic-agent/pull/4932. @belimawr mentioned before going on PTO that the test seems to pass when run alone but is flaky when run with all the other tests, so we probably have a conflict between tests somewhere.

belimawr commented 2 months ago

The error reported here is different from the one fixed by https://github.com/elastic/elastic-agent/pull/5341

ycombinator commented 2 months ago

More recent builds failing on this test + assertion:

Added to list in issue description.

ycombinator commented 2 months ago

~This test is failing quite regularly on PRs so I'm going to disable it now to unblock PRs that have been blocked for a while, e.g. https://github.com/elastic/elastic-agent/pull/5267.~ Never mind, it's skipped already (since yesterday). Some older PRs just need to be rebased on main.

belimawr commented 2 months ago

How to reproduce

(Spoiler alert: the test itself is not flaky, it's just reading the "wrong configuration" :exploding_head:)

  1. Create /usr/share/elastic-agent/state/container-paths.yml containing:
    state_path: /usr/share/elastic-agent/state
    config_path: /usr/share/elastic-agent/state
    socket_path: unix:///usr/share/elastic-agent/state/data/Td8I7R-Zby36_zF_IOd9QVNlFblNEro3.sock
  2. Download the Elastic-Agent and extract the tar.gz
  3. Enter the folder and run the elastic-agent binary as root (root is not required, but it will prevent permission issues): ./elastic-agent
  4. This will cause the logger path to be /usr/share/elastic-agent/state/data/logs
  5. This will also make the Elastic-Agent use the configuration/fleet.enc from /usr/share/elastic-agent/state/ instead of starting with an empty configuration as expected.
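
The precondition in step 1 can be sketched in Go as follows. This is only an illustration, not code from the Elastic-Agent or its test framework: `renderContainerPaths` and `hasContainerPaths` are hypothetical helpers, and the socket name used in `main` is made up. The sketch writes the file into a temporary directory so that it does not touch /usr/share.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// containerPathsFile is the file name used in step 1 above.
const containerPathsFile = "container-paths.yml"

// renderContainerPaths returns the file content from step 1 for the given
// state directory and socket path. Hypothetical helper, not part of the agent.
func renderContainerPaths(stateDir, socketPath string) string {
	return fmt.Sprintf("state_path: %s\nconfig_path: %s\nsocket_path: %s\n",
		stateDir, stateDir, socketPath)
}

// hasContainerPaths reports whether stateDir contains a container-paths.yml
// that a later agent run could pick up. Hypothetical helper.
func hasContainerPaths(stateDir string) (bool, error) {
	_, err := os.Stat(filepath.Join(stateDir, containerPathsFile))
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	// Use a temporary directory instead of /usr/share/elastic-agent/state.
	stateDir, err := os.MkdirTemp("", "agent-state-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(stateDir)

	before, _ := hasContainerPaths(stateDir)

	// The socket file name is an assumption made for this example.
	socket := "unix://" + filepath.Join(stateDir, "data", "agent.sock")
	content := renderContainerPaths(stateDir, socket)
	target := filepath.Join(stateDir, containerPathsFile)
	if err := os.WriteFile(target, []byte(content), 0o644); err != nil {
		panic(err)
	}

	after, _ := hasContainerPaths(stateDir)
	fmt.Println("leftover file before:", before, "after:", after)
}
```

A check like `hasContainerPaths` could, in principle, run at the start of a test to detect a leftover file from an earlier test on the same host. Whether that is the right fix here is a separate question.
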

belimawr commented 2 months ago

Another way to reproduce the issue is to create /usr/share/elastic-agent/state/container-paths.yml as shown in my last post, then:

  1. Unpack an Elastic-Agent
  2. Create any Policy in Fleet
  3. Enrol the Elastic-Agent in Fleet
  4. Collect the diagnostics
  5. The path map in computed-config.yaml will look like this:
    path:
      config: /usr/share/elastic-agent/state
      data: /usr/share/elastic-agent/state/data
      home: /usr/share/elastic-agent/state/data
      logs: /home/ubuntu/elastic-agent-8.16.0-SNAPSHOT-linux-x86_64

Even though it shows path.logs: /home/ubuntu/elastic-agent-8.16.0-SNAPSHOT-linux-x86_64, the logs are actually stored at /usr/share/elastic-agent/state/data/logs/.
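
The mismatch can be expressed as a small check. This is a sketch based only on the behaviour observed in this thread (logs landing in `<state_path>/data/logs`), not on the agent's actual path-resolution code. `effectiveLogsDir` and `logsPathMismatch` are hypothetical names.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// effectiveLogsDir returns the directory where logs were observed to land
// when container-paths.yml sets state_path: <statePath>/data/logs.
// It mirrors the observation in this thread, not the agent's code.
func effectiveLogsDir(statePath string) string {
	return filepath.Join(statePath, "data", "logs")
}

// logsPathMismatch reports whether the path.logs value shown in
// computed-config.yaml differs from the observed logs directory.
func logsPathMismatch(reportedLogs, statePath string) bool {
	return filepath.Clean(reportedLogs) != effectiveLogsDir(statePath)
}

func main() {
	reported := "/home/ubuntu/elastic-agent-8.16.0-SNAPSHOT-linux-x86_64"
	state := "/usr/share/elastic-agent/state"

	fmt.Println("logs observed at:", effectiveLogsDir(state))
	fmt.Println("mismatch:", logsPathMismatch(reported, state))
}
```

With the values from the diagnostics above, the check reports a mismatch, which is why a test that looks for the event log file under the reported path.logs does not find it.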

belimawr commented 2 months ago

The same happens if the Elastic-Agent is installed directly. The paths are:

path:
    config: /usr/share/elastic-agent/state
    data: /usr/share/elastic-agent/state/data
    home: /usr/share/elastic-agent/state/data
    logs: /opt/Elastic/Agent

But the logs are at /usr/share/elastic-agent/state/data/logs/

VihasMakwana commented 2 months ago

@belimawr I think we should backport this fix as well. I faced the failure in my backport PR

You've already done that. Thanks!

https://buildkite.com/elastic/elastic-agent-extended-testing/builds/2195#0191929b-a7df-495b-b7c4-3fd3d0221093