eyalkraft opened 2 months ago
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
I don't think we can just exit the controller; I think we have to send an empty variables update.
The coordinator watches for variable updates, and each one triggers a component model update:
It seems we ignore configuration changes until we receive at least one variables update, which appears to be what is causing this:
@cmacknz Yes, this sounds reasonable 👍
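The gating described above (configuration changes are ignored until at least one variables update has arrived) can be modeled with a small Go sketch. The names coordinatorLoop, Config, and Vars are hypothetical and do not come from the elastic-agent coordinator; the sketch only shows why sending a single empty variables update, as suggested above, is enough to let a pending configuration through.

```go
package main

import "fmt"

// Config and Vars are hypothetical stand-ins for the agent policy and the
// variables produced by composable providers (not elastic-agent types).
type Config string
type Vars map[string]string

// coordinatorLoop applies the latest config only once at least one
// variables update has been seen. It returns a description of every
// applied (config, vars) pair after both channels are closed.
func coordinatorLoop(configCh <-chan Config, varsCh <-chan Vars) []string {
	var applied []string
	var cfg *Config
	var vars Vars
	haveVars := false
	for configCh != nil || varsCh != nil {
		select {
		case c, ok := <-configCh:
			if !ok {
				configCh = nil // a nil channel blocks forever in select
				continue
			}
			cfg = &c
		case v, ok := <-varsCh:
			if !ok {
				varsCh = nil
				continue
			}
			vars = v
			haveVars = true
		}
		// Until the first variables update arrives, config changes are held back.
		if cfg != nil && haveVars {
			applied = append(applied, fmt.Sprintf("%s with %d vars", *cfg, len(vars)))
		}
	}
	return applied
}

func main() {
	configCh := make(chan Config)
	varsCh := make(chan Vars)
	done := make(chan []string)
	go func() { done <- coordinatorLoop(configCh, varsCh) }()

	configCh <- "policy-v1" // held back: no variables update yet
	varsCh <- Vars{}        // empty update, as when no providers are enabled
	close(configCh)
	close(varsCh)
	fmt.Println(<-done) // [policy-v1 with 0 vars]
}
```

Without the `varsCh <- Vars{}` line, the returned slice stays empty, which matches the stall reported in this issue.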
I didn't mean we should exit the controller with:
"A possible solution could be based on the providers config length or the wait group, which should return immediately in this case since, if there are no providers, its counter is 0."
What I meant was that wg.Wait() wouldn't block but would instead return immediately.
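The point about wg.Wait() can be checked directly: a sync.WaitGroup whose counter is zero returns from Wait() at once. The sketch below assumes one goroutine per enabled provider, tracked by a WaitGroup; startProviders is a hypothetical helper, not the controller's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// startProviders is a hypothetical stand-in for the controller launching one
// goroutine per enabled provider and tracking them in a WaitGroup.
func startProviders(n int) *sync.WaitGroup {
	var wg sync.WaitGroup
	wg.Add(n) // with no enabled providers, n == 0
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			// In the real controller a provider would run here until shutdown.
		}()
	}
	return &wg
}

func main() {
	wg := startProviders(0)
	wg.Wait() // counter is already zero, so this does not block
	fmt.Println("wg.Wait() returned with 0 providers")
}
```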
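Unfortunately, you can't write:

```go
// Not valid Go: (*sync.WaitGroup).Wait returns nothing, so it cannot be
// used as a channel receive in a select case.
select {
case <-wg.Wait():
	// ...
}
```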
But, along the lines of what you suggest, we could do something like:
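```go
if len(c.contextProviders)+len(c.dynamicProviders) == 0 {
	// No providers: fake a state change to trigger the initial update.
	// Assumes stateChangedChan is buffered: this goroutine also reads it,
	// so an unbuffered send here would block forever.
	stateChangedChan <- true
}
```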
before the debounce logic.
👍 We have this in our queue to fix sometime in the next month, since it seems you aren't urgently blocked on it.
If that doesn't work, or you or your team want to try fixing this yourselves, let us know.
When no providers are enabled, Elastic Agent stalls forever. It seems to be a bug in this code, where there is no way to break out of the DEBOUNCE loop if no provider is updating: https://github.com/elastic/elastic-agent/blob/0c7212f2d92021a9e008de4abe362d0c77f78638/internal/pkg/composable/controller.go#L188-L203
A possible solution could be based on the length of the providers config or on the wait group, which should return immediately in this case because, with no providers, its counter is 0: https://github.com/elastic/elastic-agent/blob/0c7212f2d92021a9e008de4abe362d0c77f78638/internal/pkg/composable/controller.go#L128
The bug was discovered as part of the work on the agentless controller. For now, we use a workaround.
Bug details:
Setup (agent-bug is the directory name that shows up before every command):
Get the original config file
Modify it to disable all providers
Start the agent with the modified config
Enter the container
Agent stuck waiting for the initial configuration
... waiting ...
Agent configuration (for some reason, inspect stalls, so I have to kill it)
Cleanup