elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

[Flaky Test]: Serverless `TestLogIngestionFleetManaged` – error starting service #4686

Open ycombinator opened 1 month ago

ycombinator commented 1 month ago

Failing test case

TestLogIngestionFleetManaged

Error message

error starting service: failed to start service (Elastic Agent): The service did not respond to the start or control request in a timely fashion.

Build

https://buildkite.com/elastic/elastic-agent/builds/8717

OS

Windows

Stacktrace and notes

>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): === FAIL: testing/integration TestLogIngestionFleetManaged (18.91s)
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): logs_ingestion_test.go:59: Enrolling agent in Fleet with a test policy
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): logs_ingestion_test.go:83: Creating enrollment API key...
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): logs_ingestion_test.go:83: Unpacking and installing Elastic Agent
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): fixture_install.go:110: [test TestLogIngestionFleetManaged] Inside fixture install function
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): fixture_install.go:136: [test TestLogIngestionFleetManaged] Inside fixture installNoPkgManager function
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): fixture.go:281: Extracting artifact elastic-agent-8.14.0-SNAPSHOT-windows-x86_64.zip to C:\Users\windows\AppData\Local\Temp\TestLogIngestionFleetManaged3752115229\001
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): fixture.go:299: Completed extraction of artifact elastic-agent-8.14.0-SNAPSHOT-windows-x86_64.zip to C:\Users\windows\AppData\Local\Temp\TestLogIngestionFleetManaged3752115229\001
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): fixture.go:900: Components were not modified from the fetched artifact
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): fixture.go:656: >> running binary with: [C:\Users\windows\AppData\Local\Temp\TestLogIngestionFleetManaged3752115229\001\elastic-agent-8.14.0-SNAPSHOT-windows-x86_64\elastic-agent.exe install --force --non-interactive --unprivileged --url https://f91bd2b209004173bb4772d3a8bc8d1e.fleet.us-east-1.aws.elastic.cloud:443 --enrollment-token UkM5NlQ0OEI3b01Ob3hoRFliclg6OEJMNmRTdzNUUFdZS01YUU9pWTA5dw==]
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): logs_ingestion_test.go:83: Unprivileged installation mode enabled; this is an experimental and currently unsupported feature.
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): Installing in non-interactive mode.
[   =] Start Service failed, exiting...  [6s] Installation failed to start Elastic Agent service.[0s]
[=   ] Uninstalled  [7s] Error uninstalling. Printing logs: [   =] Uninstalling  [6s]
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): 2024-05-06T19:56:27.453Z    DEBUG   [install]   Loaded configuration from C:\Users\windows\AppData\Local\Temp\TestLogIngestionFleetManaged3752115229\001\elastic-agent-8.14.0-SNAPSHOT-windows-x86_64\elastic-agent.yml
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): 2024-05-06T19:56:27.453Z    DEBUG   [install]   Merged configuration from C:\Users\windows\AppData\Local\Temp\TestLogIngestionFleetManaged3752115229\001\elastic-agent-8.14.0-SNAPSHOT-windows-x86_64\elastic-agent.yml into result
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): 2024-05-06T19:56:27.453Z    DEBUG   [install]   Merged all configuration files from [C:\Users\windows\AppData\Local\Temp\TestLogIngestionFleetManaged3752115229\001\elastic-agent-8.14.0-SNAPSHOT-windows-x86_64\elastic-agent.yml], no external input files
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): 2024-05-06T19:56:27.453Z    DEBUG   [install.composable]    Starting controller for composable inputs
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): 2024-05-06T19:56:27.453Z    DEBUG   [install.composable]    Started controller for composable inputs
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): 2024-05-06T19:56:27.453Z    DEBUG   [install.composable]    Variable state changed for composable inputs; debounce started
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): 2024-05-06T19:56:27.453Z    DEBUG   [install.composable.providers.kubernetes]   Kubernetes provider for resource pod skipped, unable to connect: unable to build kube config due to error: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): 2024-05-06T19:56:27.453Z    INFO    [install.composable.providers.docker]   Docker provider skipped, unable to connect: protocol not available
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): 2024-05-06T19:56:27.453Z    DEBUG   [install.composable.providers.kubernetes]   Kubernetes provider for resource node skipped, unable to connect: unable to build kube config due to error: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): 2024-05-06T19:56:27.453Z    DEBUG   [install.composable]    kubernetes_secrets provider skipped, unable to connect: unable to build kube config due to error: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): 2024-05-06T19:56:27.453Z    DEBUG   [install.composable]    Kubernetes leaderelection provider skipped, unable to connect: unable to build kube config due to error: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): 2024-05-06T19:56:27.559Z    DEBUG   [install.composable]    Computing new variable state for composable inputs
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): 2024-05-06T19:56:27.559Z    DEBUG   [install.composable]    Stopping controller for composable inputs
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): 2024-05-06T19:56:27.654Z    DEBUG   [install.composable]    Stopped controller for composable inputs
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): Error: error starting service: failed to start service (Elastic Agent): The service did not respond to the start or control request in a timely fashion.
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.14/fleet-troubleshooting.html
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout):
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): logs_ingestion_test.go:90:
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): Error Trace:    C:/Users/windows/agent/testing/integration/logs_ingestion_test.go:90
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): Error:          Received unexpected error:
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): unable to enroll Elastic Agent: error running agent install command: exit status 1
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): Test:           TestLogIngestionFleetManaged
>>> (windows-amd64-2022-fleet) Test output (sudo) (stdout): DONE 1 tests, 1 failure in 77.558s
>>> (windows-amd64-2022-fleet) Test output (sudo) (stderr): Error: go test returned a non-zero value: exit status 1
>>> (windows-amd64-2022-fleet) sudo tests failed: Process exited with status 1
>>> Testing completed (1 failures, 6 successful)

elasticmachine commented 1 month ago

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

elasticmachine commented 1 month ago

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

cmacknz commented 1 month ago

This failure happened in a few tests here, including the uninstall stress test. FYI @leehinman in case you have any idea of what might be happening here.

rdner commented 2 weeks ago

I saw a failure in the uninstall stress test that might be related to this issue:

failed to set user elastic-agent-user password for service: call to NetUserSetInfo failed: status=2245 error=4294967295

https://github.com/elastic/elastic-agent/issues/4891

cmacknz commented 2 weeks ago

https://learn.microsoft.com/en-us/troubleshoot/windows-server/remote/terminal-server-error-messages-2200-to-2299#error-2245

Error 2245 The password is shorter than required.

Explanation: The password you specified isn't long enough.

Action: Use a longer password. See your network administrator to find the required length for passwords on your system.
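
For reference, a minimal sketch of how the `status=2245` value lines up with the network management error codes. In the Windows SDK header `lmerr.h`, `NERR_BASE` is 2100 and `NERR_PasswordTooShort` is `NERR_BASE + 145`. The helper `describeNetStatus` is hypothetical and written only for illustration; it is not part of elastic-agent and covers just the codes relevant to this thread.

```go
package main

import "fmt"

// Values taken from the Windows SDK header lmerr.h.
const (
	nerrSuccess          = 0
	nerrBase             = 2100
	nerrPasswordTooShort = nerrBase + 145 // 2245
)

// describeNetStatus is a hypothetical helper (not elastic-agent code) that
// maps a NetUserSetInfo status to a readable name. Only the codes discussed
// in this issue are handled.
func describeNetStatus(status uint32) string {
	switch status {
	case nerrSuccess:
		return "NERR_Success"
	case nerrPasswordTooShort:
		return "NERR_PasswordTooShort: the password is shorter than required"
	default:
		return fmt.Sprintf("unrecognized status %d", status)
	}
}

func main() {
	// The status reported in the failing uninstall stress test.
	fmt.Println(describeNetStatus(2245))
}
```

Microsoft's documentation for the related `NetUserAdd` call notes that this status can also be returned when a password fails other policy requirements, so length may not be the only possible cause.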

Error 4294967295 is a weird one though: that's the max value of a uint32 (0xffffffff).