Open kpollich opened 9 months ago
One issue with using a mock agent is that, in a few cases, the mock's behaviour has differed from the actual agent's (iirc this last happened specifically around scheduled actions).
I've disabled all e2e tests that used the snapshot agent in https://github.com/elastic/fleet-server/pull/3293, as the snapshot agent causes issues whenever we do a version bump. It's something to be aware of if we want to re-enable some of these tests or use agents in whatever other test suite we build: we may need an alternate way to get the latest snapshot agent build.
Elastic Agent includes a powerful suite of integration tests that spin up Fleet Server and Kibana from snapshot builds to test functionality in a true-to-life environment. We should take cues from this setup and add a similar testing capability to Fleet Server.
References
One instance where tests like this could've helped us is https://github.com/elastic/fleet-server/issues/3263.
These tests would ideally allow us to place "mock" agents (similar to https://github.com/elastic/horde) into broken or erroneous states intentionally, then run those agents through various APIs and lifecycles to ensure they recover and are placed into a manageable state. Tests like these would allow us to make strides in our fault tolerance and self-healing capabilities around Fleet and Agent.
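To make the idea above a bit more concrete, here is a rough Python sketch of the test shape: put a mock agent into a known-bad state, drive it through check-in and re-enrollment, and assert that it ends up healthy. Everything here is hypothetical and in-memory. `MockAgent`, `FakeFleetServer`, `State`, and `recover` are illustrative names, not Fleet Server, Elastic Agent, or horde APIs, and the two fault states are examples I picked, not a list of real failure modes.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    HEALTHY = "healthy"
    BAD_API_KEY = "bad_api_key"          # agent holds a key the server rejects
    STUCK_UPGRADING = "stuck_upgrading"  # upgrade started, never reported done


@dataclass
class MockAgent:
    """Hypothetical mock agent; only tracks an ID, a key, and a state."""
    agent_id: str
    api_key: str = ""
    state: State = State.HEALTHY


@dataclass
class FakeFleetServer:
    """In-memory stand-in for the server side (assumed behaviour, not the real API)."""
    valid_keys: dict = field(default_factory=dict)  # agent_id -> api_key
    issued: int = 0

    def enroll(self, agent: MockAgent) -> None:
        # Issue a fresh key and mark the agent healthy.
        self.issued += 1
        agent.api_key = f"key-{self.issued}"
        self.valid_keys[agent.agent_id] = agent.api_key
        agent.state = State.HEALTHY

    def checkin(self, agent: MockAgent) -> bool:
        # Reject agents whose key does not match what the server issued.
        if self.valid_keys.get(agent.agent_id) != agent.api_key:
            agent.state = State.BAD_API_KEY
            return False
        # Assumed recovery path: a successful check-in clears a stalled upgrade.
        if agent.state is State.STUCK_UPGRADING:
            agent.state = State.HEALTHY
        return True


def recover(server: FakeFleetServer, agent: MockAgent, max_attempts: int = 3) -> bool:
    """Check in repeatedly, re-enrolling on key failures, until the agent is healthy."""
    for _ in range(max_attempts):
        if server.checkin(agent) and agent.state is State.HEALTHY:
            return True
        if agent.state is State.BAD_API_KEY:
            server.enroll(agent)
    return False


def break_agent(agent: MockAgent, fault: State) -> None:
    """Intentionally place the mock agent into an erroneous state."""
    if fault is State.BAD_API_KEY:
        agent.api_key = "revoked"
    agent.state = fault


if __name__ == "__main__":
    for fault in (State.BAD_API_KEY, State.STUCK_UPGRADING):
        server = FakeFleetServer()
        agent = MockAgent("agent-1")
        server.enroll(agent)
        break_agent(agent, fault)
        assert recover(server, agent), f"agent did not recover from {fault.value}"
        assert agent.state is State.HEALTHY
```

In a real suite the fake server would be replaced by a Fleet Server instance under test, and each fault would map to a state we have actually seen agents get stuck in. The value of the pattern is that every fault gets the same "break, drive, assert healthy" loop.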
Alternatives