Open mtojek opened 3 years ago
This error showed up in this pipeline: https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Fintegrations/detail/main/435/pipeline
Looking at the stack dump logs for azure_billing, I found this:
Attaching to elastic-package-stack_elastic-agent_1
elastic-agent_1 | Policy selected for enrollment: 6d5e1830-bbc4-11ec-85d2-d117a4d2dc22
elastic-agent_1 | 2022-04-14T07:28:26.732Z WARN [tls] tlscommon/tls_config.go:98 SSL/TLS verifications disabled.
elastic-agent_1 | 2022-04-14T07:28:27.592Z INFO cmd/enroll_cmd.go:432 Starting enrollment to URL: http://fleet-server:8220/
elastic-agent_1 | Successfully enrolled the Elastic Agent.
elastic-agent_1 | 2022-04-14T07:28:28.805Z INFO cmd/enroll_cmd.go:245 Elastic Agent might not be running; unable to trigger restart
elastic-agent_1 | Error: could not read configuration file /usr/share/elastic-agent/state/elastic-agent.yml: yaml: line 45: did not find expected key
elastic-agent_1 | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/7.15/fleet-troubleshooting.htm
This error showed up in this pipeline: https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Fintegrations/detail/main/433/pipeline/
Looking at the stack dump logs for tcp, I found:
Attaching to elastic-package-stack_elastic-agent_1
elastic-agent_1 | Policy selected for enrollment: d60bf480-bbc2-11ec-a07b-d3dcb25dfd7a
elastic-agent_1 | 2022-04-14T07:17:02.090Z WARN [tls] tlscommon/tls_config.go:98 SSL/TLS verifications disabled.
elastic-agent_1 | 2022-04-14T07:17:03.058Z INFO cmd/enroll_cmd.go:442 Starting enrollment to URL: http://fleet-server:8220/
elastic-agent_1 | 2022-04-14T07:17:03.850Z INFO cmd/enroll_cmd.go:248 Elastic Agent might not be running; unable to trigger restart
elastic-agent_1 | Successfully enrolled the Elastic Agent.
elastic-agent_1 | Error: could not read configuration file /usr/share/elastic-agent/state/elastic-agent.yml: yaml: line 47: could not find expected ':'
elastic-agent_1 | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/7.16/fleet-troubleshooting.html
There are two issues here. First, we don't handle active endpoints correctly. Second, the "line 47" YAML error is a known issue: https://github.com/elastic/elastic-agent/issues/98
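When chasing a parse failure like the ones quoted above, it can help to pull the line number out of the error message and look at the configuration file around it. The following is only a debugging sketch: the helper names `yaml_error_line` and `context` are hypothetical, and the line the parser reports is not necessarily the line where the malformed content starts, so a few lines of context on either side are shown.

```python
import re

# Matches the "yaml: line N:" fragment seen in the agent error messages.
YAML_ERROR = re.compile(r"yaml: line (\d+):")


def yaml_error_line(message):
    """Return the line number reported in a YAML parse error, or None."""
    match = YAML_ERROR.search(message)
    return int(match.group(1)) if match else None


def context(text, line_no, radius=2):
    """Return (number, content) pairs for the lines around line_no (1-based)."""
    lines = text.splitlines()
    start = max(line_no - 1 - radius, 0)
    end = min(line_no + radius, len(lines))
    return [(n + 1, lines[n]) for n in range(start, end)]
```

With a copy of the dumped `elastic-agent.yml` read into a string, `context(text, yaml_error_line(error_message))` would list the lines surrounding the reported position.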
This error showed up in this pipeline: https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Fintegrations/detail/main/432/pipeline/700/
There are multiple packages affected in this PR, with different issues:
- cisco_nexus, cockroachdb, mysql_enterprise with same issue as https://github.com/elastic/elastic-agent/issues/98
- linux, zeek (reporting logs containing error):
...
fleet-server_1 | Kibana Fleet setup failed: http POST request to http://kibana:5601/api/fleet/setup fails: Forbidden: <nil>. Response: {"statusCode":403,"error":"Forbidden","message":"Forbidden"}
...
fleet-server_1 | {"log.level":"error","@timestamp":"2022-04-14T03:33:02.884Z","log.origin":{"file.name":"process/app.go","file.line":290},"message":"failed to stop fleet-server: os: process already finished","ecs.version":"1.6.0"}
- agent (repeated multiple times)
...
elastic-agent_1 | {"log.level":"error","@timestamp":"2022-04-14T03:36:27.415Z","log.origin":{"file.name":"status/reporter.go","file.line":236},"message":"Elastic Agent status changed to: 'error'","ecs.version":"1.6.0"}
elastic-agent_1 | {"log.level":"error","@timestamp":"2022-04-14T03:36:27.415Z","log.origin":{"file.name":"process/app.go","file.line":158},"message":"failed to stop after 30s: application stopping timed out","ecs.version":"1.6.0"}
elastic-agent_1 | {"log.level":"error","@timestamp":"2022-04-14T03:36:27.415Z","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2022-04-14T03:36:27Z - message: Application: filebeat--8.2.0-SNAPSHOT--36643631373035623733363936343635[1667eb23-d647-42ca-b3bd-978f0f0aa80c]: State changed to FAILED: failed to stop after 30s: application stopping timed out - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}
...
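Since the agent and fleet-server entries above are JSON records behind a Compose service prefix, the error-level ones can be pulled out of a stack dump mechanically. The sketch below assumes that layout (`service | {json}`, optionally wrapped in colour-code residue such as `[36m` and `[0m`); the function name `error_messages` is made up for illustration and is not part of elastic-package.

```python
import json
import re

# Compose service prefix, e.g. "elastic-agent_1 |", optionally surrounded by
# ANSI colour codes (with or without the leading ESC byte).
PREFIX = re.compile(r"^(?:\x1b?\[\d+m)?(?P<service>[\w.-]+)\s*\|(?:\x1b?\[0m)?\s*")


def error_messages(lines):
    """Yield (service, timestamp, message) for JSON log lines at error level."""
    for line in lines:
        match = PREFIX.match(line)
        if not match:
            continue  # not a service log line (e.g. "...")
        payload = line[match.end():]
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # plain-text log line, not a structured record
        if isinstance(record, dict) and record.get("log.level") == "error":
            yield match.group("service"), record.get("@timestamp"), record.get("message")
```

Feeding it the lines of a dumped log file, for instance `list(error_messages(open(path)))` with `path` pointing at the dump, gives one tuple per structured error, which makes repeated entries easier to spot.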
Just found another example of this issue. Adding the errors I found here.
elastic-package test failed while tearing down the scenario:
[2023-09-14T08:40:21.300Z] Error: error running package system tests: could not complete test run: failed to tear down runner: error reassigning original policy to agent: could not assign policy to agent; API status code = 500; response body = {"statusCode":500,"error":"Internal Server Error","message":"version_conflict_engine_exception\n\tRoot causes:\n\t\tversion_conflict_engine_exception: [e00933a4-a763-4c70-bedb-859cc77c2429]: version conflict, required seqNo [40], primary term [1]. current document has seqNo [41] and primary term [1]"}
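The response body above is an Elasticsearch optimistic concurrency failure: the write expected seqNo 40, but the document had already moved to seqNo 41. One way a client could tolerate this kind of race is to repeat the request a few times. The sketch below only illustrates that idea and does not describe what elastic-package does; `retry_on_version_conflict` and the `call` callable (returning a status code and a response body) are hypothetical.

```python
import time

CONFLICT_MARKER = "version_conflict_engine_exception"


def retry_on_version_conflict(call, attempts=3, delay=1.0, sleep=time.sleep):
    """Invoke call() until it stops reporting a version conflict.

    call is a hypothetical callable returning (status_code, body_text).
    The last response is returned even if it still reports a conflict.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        status, body = call()
        conflict = status >= 400 and CONFLICT_MARKER in body
        if not conflict or attempt == attempts:
            return status, body
        sleep(delay * attempt)  # linear backoff before the next try
```

Retrying is only safe here because reassigning the same policy is idempotent; for other requests the caller would need to re-read the document first.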
script returned exit code 1
And then, when Jenkins tried to run the elastic-package stack down -v step, it failed with this error:
[2023-09-14T08:43:09.002Z] Network elastic-package-stack_default Removing
[2023-09-14T08:43:09.002Z] Network elastic-package-stack_default Error
[2023-09-14T08:43:09.002Z] failed to remove network elastic-package-stack_default: Error response from daemon: error while removing network: network elastic-package-stack_default id 1791f0bb065d570781341e588dfcd6c1e4129670cfd1d7b827d8ca3409027714 has active endpoints
[2023-09-14T08:43:09.002Z] Error: tearing down the stack failed: stopping docker containers failed: running command failed: running Docker Compose down command failed: exit status 1
script returned exit code 1
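The "has active endpoints" message means some container was still attached to the network when Docker tried to remove it. A possible way to see which ones, and to detach them before retrying, is sketched below. It relies on `docker network inspect` printing a JSON array whose entries carry a `Containers` map, and on `docker network disconnect -f`; the function names are hypothetical, and this is not how elastic-package itself handles teardown.

```python
import json
import subprocess


def attached_containers(inspect_json):
    """Return names of containers listed in `docker network inspect` output."""
    names = []
    for network in json.loads(inspect_json):
        containers = network.get("Containers") or {}
        # Fall back to the container ID if an entry has no Name field.
        names.extend(info.get("Name", cid) for cid, info in containers.items())
    return sorted(names)


def disconnect_all(network):
    """Force-disconnect every remaining endpoint (needs a local Docker daemon)."""
    out = subprocess.run(
        ["docker", "network", "inspect", network],
        check=True, capture_output=True, text=True,
    ).stdout
    for name in attached_containers(out):
        subprocess.run(
            ["docker", "network", "disconnect", "-f", network, name],
            check=True,
        )
```

On a CI worker, `disconnect_all("elastic-package-stack_default")` followed by a second `elastic-package stack down -v` would be one thing to try; whether a forced disconnect is acceptable depends on what is still running in those containers.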
Spotted in a Jenkins build: