elastic / elastic-package

elastic-package - Command line tool for developing Elastic Integrations

Flaky: network elastic-package-service_default id 2cc14bcd.... has active endpoints #545

Open mtojek opened 2 years ago

mtojek commented 2 years ago

Spotted in a Jenkins build:

[2021-10-18T13:05:39.809Z] Removing elastic-package-service_fortinet-fortimail-udp_1      ... 
[2021-10-18T13:05:39.809Z] 
Removing elastic-package-service_fortinet-firewall-udp_1       ... done
Removing elastic-package-service_fortinet-fortimail-udp_1      ... done
Removing elastic-package-service_fortinet-clientendpoint-tcp_1 ... done
Removing elastic-package-service_fortinet-firewall-tcp_1       ... done
Removing elastic-package-service_fortinet-fortimanager-udp_1   ... done
Removing elastic-package-service_fortinet-logfile_1            ... done
Removing elastic-package-service_fortinet-fortimail-tcp_1      ... done
Removing elastic-package-service_fortinet-clientendpoint-udp_1 ... done
Removing elastic-package-service_fortinet-fortimanager-tcp_1   ... done
Removing network elastic-package-service_default
[2021-10-18T13:05:40.070Z] error while removing network: network elastic-package-service_default id 2cc14bcd07612011e08cf9017417ec4bcc0087df931d99b5c15c6762e7b79040 has active endpoints
[2021-10-18T13:05:40.070Z] Error: error running package system tests: could not complete test run: failed to tear down runner: error tearing down service: could not shut down service using Docker Compose: running Docker Compose down command failed: exit status 1
script returned exit code 1

endorama commented 2 years ago

This error showed up in this pipeline: https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Fintegrations/detail/main/435/pipeline

Looking at the stack dump logs for azure_billing, I found this:

Attaching to elastic-package-stack_elastic-agent_1
elastic-agent_1              | Policy selected for enrollment:  6d5e1830-bbc4-11ec-85d2-d117a4d2dc22
elastic-agent_1              | 2022-04-14T07:28:26.732Z    WARN    [tls]   tlscommon/tls_config.go:98  SSL/TLS verifications disabled.
elastic-agent_1              | 2022-04-14T07:28:27.592Z    INFO    cmd/enroll_cmd.go:432   Starting enrollment to URL: http://fleet-server:8220/
elastic-agent_1              | Successfully enrolled the Elastic Agent.
elastic-agent_1              | 2022-04-14T07:28:28.805Z    INFO    cmd/enroll_cmd.go:245   Elastic Agent might not be running; unable to trigger restart
elastic-agent_1              | Error: could not read configuration file /usr/share/elastic-agent/state/elastic-agent.yml: yaml: line 45: did not find expected key
elastic-agent_1              | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/7.15/fleet-troubleshooting.htm

endorama commented 2 years ago

This error showed up in this pipeline: https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Fintegrations/detail/main/433/pipeline/

Looking at the stack dump logs for tcp, I found:

Attaching to elastic-package-stack_elastic-agent_1
elastic-agent_1              | Policy selected for enrollment:  d60bf480-bbc2-11ec-a07b-d3dcb25dfd7a
elastic-agent_1              | 2022-04-14T07:17:02.090Z    WARN    [tls]   tlscommon/tls_config.go:98  SSL/TLS verifications disabled.
elastic-agent_1              | 2022-04-14T07:17:03.058Z    INFO    cmd/enroll_cmd.go:442   Starting enrollment to URL: http://fleet-server:8220/
elastic-agent_1              | 2022-04-14T07:17:03.850Z    INFO    cmd/enroll_cmd.go:248   Elastic Agent might not be running; unable to trigger restart
elastic-agent_1              | Successfully enrolled the Elastic Agent.
elastic-agent_1              | Error: could not read configuration file /usr/share/elastic-agent/state/elastic-agent.yml: yaml: line 47: could not find expected ':'
elastic-agent_1              | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/7.16/fleet-troubleshooting.html

mtojek commented 2 years ago

There are two separate issues here. First, we don't handle active endpoints correctly when removing the Docker network. Second, the "line 47" YAML error in elastic-agent.yml is a known issue: https://github.com/elastic/elastic-agent/issues/98
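
One possible way to handle the active endpoints, sketched in Python for illustration only (this is not how elastic-package currently does it, and the function names `docker`, `remove_network` and the `run` parameter are made up for this example): list whatever is still attached to the network, force-disconnect it, and only then remove the network. It relies on the `docker network inspect`, `docker network disconnect --force` and `docker network rm` CLI commands. Note that force-disconnecting can hide the real cause (a container that was never stopped), so the leftover endpoint names are returned so they can be logged.

```python
import json
import subprocess
from typing import Callable, List

# A runner takes docker CLI arguments and returns stdout.
# It is injectable so the logic can be exercised without a Docker daemon.
Runner = Callable[[List[str]], str]


def docker(argv: List[str]) -> str:
    """Run a docker CLI command and return its stdout (raises on non-zero exit)."""
    result = subprocess.run(
        ["docker", *argv], check=True, capture_output=True, text=True
    )
    return result.stdout


def remove_network(network: str, run: Runner = docker) -> List[str]:
    """Force-disconnect remaining endpoints, then remove the network.

    Returns the names of the endpoints that had to be disconnected.
    """
    raw = run(["network", "inspect", network, "--format", "{{json .Containers}}"])
    # The output may be "null" or empty when nothing is attached.
    containers = json.loads(raw.strip() or "{}") or {}

    disconnected = []
    for container_id, info in containers.items():
        run(["network", "disconnect", "--force", network, container_id])
        disconnected.append(info.get("Name", container_id))

    run(["network", "rm", network])
    return disconnected
```

Calling `remove_network("elastic-package-service_default")` after the containers have been removed would then take the place of the plain network removal that fails in the log above.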

endorama commented 2 years ago

This error showed up in this pipeline: https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Fintegrations/detail/main/432/pipeline/700/

There are multiple packages affected in this PR, with different issues:

mrodm commented 1 year ago

Just found another example of this issue. Here are the errors I found.

elastic-package test failed while tearing down the scenario:

[2023-09-14T08:40:21.300Z] Error: error running package system tests: could not complete test run: failed to tear down runner: error reassigning original policy to agent: could not assign policy to agent; API status code = 500; response body = {"statusCode":500,"error":"Internal Server Error","message":"version_conflict_engine_exception\n\tRoot causes:\n\t\tversion_conflict_engine_exception: [e00933a4-a763-4c70-bedb-859cc77c2429]: version conflict, required seqNo [40], primary term [1]. current document has seqNo [41] and primary term [1]"}

script returned exit code 1
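
The version conflict above looks like a transient write race on the agent document, so retrying the policy reassignment may be enough to get past it. A minimal sketch in Python, assuming the request is wrapped in a callable that returns the status code and response body (`assign_with_retry`, `assign` and the retry parameters are hypothetical names, not part of elastic-package or the Fleet API):

```python
import time
from typing import Callable, Tuple

CONFLICT_MARKER = "version_conflict_engine_exception"


def assign_with_retry(
    assign: Callable[[], Tuple[int, str]],
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call assign() again only while it fails with a version conflict.

    Any other outcome (success or a different error) is returned immediately.
    """
    status, body = assign()
    for _ in range(attempts - 1):
        if status < 400 or CONFLICT_MARKER not in body:
            break
        sleep(delay)
        status, body = assign()
    return status, body
```

Only responses that mention `version_conflict_engine_exception` are retried, so other 500 errors still fail fast and stay visible in the build log.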

Then, when Jenkins ran the elastic-package stack down -v step, it failed with this error:

[2023-09-14T08:43:09.002Z]  Network elastic-package-stack_default  Removing
[2023-09-14T08:43:09.002Z]  Network elastic-package-stack_default  Error
[2023-09-14T08:43:09.002Z] failed to remove network elastic-package-stack_default: Error response from daemon: error while removing network: network elastic-package-stack_default id 1791f0bb065d570781341e588dfcd6c1e4129670cfd1d7b827d8ca3409027714 has active endpoints
[2023-09-14T08:43:09.002Z] Error: tearing down the stack failed: stopping docker containers failed: running command failed: running Docker Compose down command failed: exit status 1

script returned exit code 1