Today, Elastic Agent automatically unenrolls itself after receiving 7 consecutive 401 (Unauthorized) responses from Fleet Server when checking in. This was done so that agents that have been force-unenrolled (which revokes their API key) do not keep checking in indefinitely until they are reinstalled.
https://github.com/elastic/elastic-agent/blob/590c506aea6f278200d024e65d0bc7e1c8b5238a/internal/pkg/agent/application/gateway/fleet/fleet_gateway.go#L26-L28
https://github.com/elastic/elastic-agent/blob/590c506aea6f278200d024e65d0bc7e1c8b5238a/internal/pkg/agent/application/gateway/fleet/fleet_gateway.go#L360-L363
https://github.com/elastic/elastic-agent/blob/590c506aea6f278200d024e65d0bc7e1c8b5238a/internal/pkg/agent/application/gateway/fleet/fleet_gateway.go#L329-L341
This stops force-unenrolled agents from continuing to contact Fleet Server. However, the same behavior can be triggered during disaster recovery, when agents that should remain enrolled unenroll themselves instead. To remove the need for users recovering their cluster to intervene manually on each machine, we should stop unenrolling and instead greatly increase the check-in interval.
The initial proposal is that, instead of unenrolling, the agent should switch to checking in once per hour. A successful check-in must return the agent to its original check-in interval.