elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

[Flaky Test]: TestRpmLogIngestFleetManaged/Monitoring_logs_are_shipped – failed to evaluate all symlinks #4929

Open rdner opened 3 weeks ago

rdner commented 3 weeks ago

Failing test case

TestRpmLogIngestFleetManaged/Monitoring_logs_are_shipped

Error message

failed to evaluate all symlinks

Build

https://buildkite.com/elastic/elastic-agent-extended-testing/builds/625#0190110f-8604-4125-9789-621c5241ef2b

OS

Linux

Stacktrace and notes

=== RUN   TestRpmLogIngestFleetManaged/Monitoring_logs_are_shipped
    logs_ingestion_test.go:489: Making sure metricbeat logs are populated
    logs_ingestion_test.go:493: metricbeat: Got 300 documents
    logs_ingestion_test.go:498: Making sure all components are healthy
    logs_ingestion_test.go:500: 
            Error Trace:    /home/rhel/agent/testing/integration/logs_ingestion_test.go:500
                                        /home/rhel/agent/testing/integration/logs_ingestion_test.go:241
            Error:          Received unexpected error:
                            could not unmarshal agent status output: error: error creating cmd: failed to get control protcol address: failed to evaluate all symlinks of /tmp/TestRpmLogIngestFleetManaged3099371811/001/elastic-agent-8.15.0-SNAPSHOT-x86_64: lstat /tmp/TestRpmLogIngestFleetManaged3099371811/001/elastic-agent-8.15.0-SNAPSHOT-x86_64: no such file or directory, output: 
                            unexpected end of JSON input
            Test:           TestRpmLogIngestFleetManaged/Monitoring_logs_are_shipped
            Messages:       could not get agent status to verify all components are healthy
--- FAIL: TestRpmLogIngestFleetManaged/Monitoring_logs_are_shipped (15.30s)

elasticmachine commented 3 weeks ago

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

rdner commented 3 weeks ago

Also happened in:

cmacknz commented 3 weeks ago

The failing line is:

https://github.com/elastic/elastic-agent/blob/e8962b0e4857f1e66d5b4887b4c8fcbb4f1e8af0/pkg/testing/fixture.go#L670-L674

failed to evaluate all symlinks of /tmp/TestRpmLogIngestFleetManaged3099371811/001/elastic-agent-8.15.0-SNAPSHOT-x86_64: no such file or directory
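For reference, the `lstat ...: no such file or directory` part of that message is what Go's `filepath.EvalSymlinks` returns for a path that no longer exists. A minimal sketch, separate from the agent code (the helper names `resolveRemoved` and `isMissing` are made up for illustration):

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// resolveRemoved creates a temporary directory, deletes it, and then tries to
// resolve symlinks on the now-missing path. It returns the resulting error.
// Hypothetical helper, not part of the elastic-agent test framework.
func resolveRemoved() error {
	dir, err := os.MkdirTemp("", "TestExample")
	if err != nil {
		return fmt.Errorf("creating temp dir: %w", err)
	}
	if err := os.RemoveAll(dir); err != nil {
		return fmt.Errorf("removing temp dir: %w", err)
	}
	// The path is gone, so symlink resolution fails on its first lookup.
	_, err = filepath.EvalSymlinks(dir)
	return err
}

// isMissing reports whether err indicates that a path does not exist.
func isMissing(err error) bool {
	return err != nil && errors.Is(err, fs.ErrNotExist)
}

func main() {
	err := resolveRemoved()
	fmt.Println("error:", err)
	fmt.Println("missing path:", isMissing(err))
}
```

Any caller that wraps this error, as the fixture appears to do, will surface the same underlying "no such file or directory" cause.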

The DEB and RPM tests are the ones that are failing, and this makes some sense: those packages do not use any paths under /tmp, so that won't be the path to the agent command. For those packages the symlink ends up under /var/lib. https://github.com/elastic/elastic-agent/blob/e8962b0e4857f1e66d5b4887b4c8fcbb4f1e8af0/dev-tools/packaging/packages.yml#L66

I think the DEB and RPM install commands never set up a client at the right socket path. More than anything, I am surprised this failure doesn't happen every time.

Here is the client getting created on the regular install path:

https://github.com/elastic/elastic-agent/blob/e8962b0e4857f1e66d5b4887b4c8fcbb4f1e8af0/pkg/testing/fixture_install.go#L205-L217

Here is the DEB install, which does not set this up, so we fall back to trying to find the control socket in the work dir, which is a temporary directory.

https://github.com/elastic/elastic-agent/blob/e8962b0e4857f1e66d5b4887b4c8fcbb4f1e8af0/pkg/testing/fixture_install.go#L435
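To make the fallback concrete, here is a rough sketch of the pattern as I understand it, not the actual fixture code. The `fixture` type, its fields, the `controlSocket` method, and both socket paths are hypothetical and only illustrate why skipping the setup step leaves the lookup pointing at the temporary directory:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// fixture is a hypothetical stand-in for a test fixture. It is NOT the real
// type from pkg/testing; the field names are assumptions for illustration.
type fixture struct {
	workDir    string // temporary directory the test prepared the package in
	socketPath string // control socket set by the install step; empty if never set
}

// controlSocket returns the explicitly configured socket when the install step
// set one, and otherwise falls back to a path derived from workDir.
func (f fixture) controlSocket() string {
	if f.socketPath != "" {
		return f.socketPath
	}
	// Fallback: assumes the agent runs out of workDir, which does not hold
	// for a package-managed install. The file name is hypothetical.
	return filepath.Join(f.workDir, "agent.sock")
}

func main() {
	// Install path that configures the client (socket path is made up).
	configured := fixture{
		workDir:    "/tmp/TestExample/001",
		socketPath: "/var/lib/example-agent/agent.sock",
	}
	// Install path that skips the setup, as described for DEB/RPM.
	unconfigured := fixture{workDir: "/tmp/TestExample/001"}

	fmt.Println("configured:  ", configured.controlSocket())
	fmt.Println("unconfigured:", unconfigured.controlSocket())
}
```

Under these assumptions, the fix would be for the DEB and RPM install paths to set the equivalent of `socketPath` the same way the regular install does.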

The DEB and RPM installs are also missing the cleanup calls that get diagnostics and dump processes. https://github.com/elastic/elastic-agent/blob/e8962b0e4857f1e66d5b4887b4c8fcbb4f1e8af0/pkg/testing/fixture_install.go#L251-L254