elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

[Meta] Audit concurrency handling / increase unit test coverage in Agent #3040

Open faec opened 1 year ago

faec commented 1 year ago

This is a meta-issue for the work that started in https://github.com/elastic/elastic-agent/pull/2736, https://github.com/elastic/elastic-agent/pull/2849 and https://github.com/elastic/ingest-dev/issues/1936. The overall goal is to identify and address risky concurrency patterns in the Agent codebase, and to increase reliability and test coverage by migrating to more testable internal API patterns.

Initial work focused on Coordinator and its state handling. As a result, we have greatly reduced the amount of locking needed within Coordinator and simplified the testing model while covering many more scenarios (https://github.com/elastic/elastic-agent/issues/2868). This has both increased short-term reliability and clarified issues with Agent's current state handling that require follow-up (https://github.com/elastic/elastic-agent/issues/2789, https://github.com/elastic/elastic-agent/issues/2852, https://github.com/elastic/elastic-agent/issues/2887).
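To illustrate the kind of pattern that reduces locking, here is a minimal Go sketch in which a single goroutine owns all mutable state and other code reaches it only through channels. It is not taken from the Agent codebase: `toyCoordinator`, `toyState`, and every method name below are hypothetical, and the real Coordinator may be structured differently.

```go
package main

import "fmt"

// toyState is a hypothetical stand-in for coordinator state.
type toyState struct {
	Message string
	Updates int
}

// toyCoordinator (hypothetical) keeps its state private to one goroutine,
// so no mutex is needed to guard it.
type toyCoordinator struct {
	updates chan string        // incoming state changes
	queries chan chan toyState // requests for a state snapshot
	done    chan struct{}      // closed to stop the run loop
	state   toyState           // touched only by the Run goroutine
}

func newToyCoordinator() *toyCoordinator {
	return &toyCoordinator{
		updates: make(chan string),
		queries: make(chan chan toyState),
		done:    make(chan struct{}),
	}
}

// Run is the only code that reads or writes c.state.
func (c *toyCoordinator) Run() {
	for {
		select {
		case msg := <-c.updates:
			c.state.Message = msg
			c.state.Updates++
		case reply := <-c.queries:
			reply <- c.state // send a copy, never a pointer
		case <-c.done:
			return
		}
	}
}

// SetMessage blocks until the run loop has accepted the update.
func (c *toyCoordinator) SetMessage(msg string) { c.updates <- msg }

// State returns a snapshot taken by the run loop.
func (c *toyCoordinator) State() toyState {
	reply := make(chan toyState, 1)
	c.queries <- reply
	return <-reply
}

// Stop ends the run loop.
func (c *toyCoordinator) Stop() { close(c.done) }

func main() {
	c := newToyCoordinator()
	go c.Run()
	c.SetMessage("starting")
	c.SetMessage("healthy")
	s := c.State()
	fmt.Println(s.Message, s.Updates)
	c.Stop()
}
```

Because every state change goes through one loop, a test can drive the type through its methods and assert on the snapshots it returns, without reasoning about lock ordering.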

The intention now is to similarly audit other components, starting with the most urgent/sensitive and proceeding as time and triage allow. We particularly want to focus on:

Components that still need to be examined:

elasticmachine commented 1 year ago

Pinging @elastic/elastic-agent (Team:Elastic-Agent)