Open jsoriano opened 1 year ago
@jsoriano we're not sure if this is needed in the default instance, but we were having some issues with the Docker setup yesterday and found that the package-registry instance needed this added to its Docker config in order to come online:
security_opt:
- seccomp:unconfined
Otherwise we received an immediate error:
runtime/cgo: pthread_create failed: Operation not permitted
I actually wanted to create this issue in elastic-agent :facepalm: Moving it.
@jsoriano we're not sure if this is needed in the default instance, but we were having some issues with the Docker setup yesterday and found that the package-registry instance needed this added to its Docker config in order to come online:
security_opt:
- seccomp:unconfined
Otherwise we received an immediate error:
runtime/cgo: pthread_create failed: Operation not permitted
I don't think this is related to this issue. I haven't seen this kind of problem with package-registry before. @joeperuzzi could you please create an issue in the https://github.com/elastic/elastic-package repository with information about your environment and the version of elastic-package that you are using?
Both elastic-agent and elastic-agent-complete images are still having this issue. What is the workaround for this if I am using docker-compose to provision the fleet-server (elastic-agent)?
On elastic-package stack up, we have identified a couple of issues that, when combined, can lead to unreliable provisioning of automated scenarios involving Fleet Server and Agent. These issues are:
This may not be new, but we have identified it as much more frequent in 8.6.
Where is this an issue?
Any automation that waits for Fleet Server to be healthy before starting to enroll Agents can find that the first enrollments fail because Fleet Server is not available. This can be reproduced with elastic-package stack up (up to elastic-package 0.72.0), where the following happens, orchestrated by docker-compose:
[...] elastic-agent status to be healthy.
The issue is that between steps 4 and 5, after Fleet Server has reported healthy, it goes back to a state where it cannot accept connections, so step 5 fails and the process is aborted.
This can be seen on this build for example: https://fleet-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Felastic-package/detail/PR-1118/6/pipeline
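For reference, the wait-then-enroll pattern that this kind of automation follows can be sketched in shell roughly as below. This is only an illustration, not what elastic-package actually runs: wait_until is a hypothetical helper, and the commented usage assumes that elastic-agent status exits non-zero while the agent is not healthy.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: polls COMMAND until it exits 0, or gives up
# once roughly TIMEOUT_SECONDS have been spent sleeping.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local waited=0
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Assumed usage inside the Fleet Server container: block until the
# status command succeeds, then let the next step (enrollment) start.
# wait_until 120 5 elastic-agent status
```

A gate like this only proves that Fleet Server was healthy at the moment of the last check, so an enrollment started right afterwards can still hit the window where it is not accepting connections.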
Fleet Server goes healthy, and then it seems to be restarted or reconfigured (multiple times?):
Elastic Agent immediately fails, and exits:
Full logs here:
Proposed changes
Workaround
Restart elastic-agent if it fails during enrollment. This change has been applied to elastic-package starting with 0.73.0 (https://github.com/elastic/elastic-package/pull/1118).
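For setups that do not go through elastic-package, a similar retry can be sketched in shell, assuming that a failed enrollment makes the command exit non-zero. The retry helper is hypothetical and is not taken from the pull request linked above; FLEET_URL and FLEET_ENROLLMENT_TOKEN are placeholders for whatever the environment provides.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it exits 0 or
# MAX_ATTEMPTS runs have failed, sleeping DELAY_SECONDS between runs.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Assumed usage: retry enrollment a few times instead of aborting on
# the first failure while Fleet Server is briefly unavailable.
# retry 5 10 elastic-agent enroll --url "$FLEET_URL" --enrollment-token "$FLEET_ENROLLMENT_TOKEN"
```

Bounding the number of attempts keeps a genuinely broken Fleet Server from turning into an endless restart loop.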