We should explore optimizations to this API, such as:
- First trying any low-hanging fruit, such as:
  - Removing the `sort` property when `perPage` is 0 (we don't need the results). This may already be optimized by ES, though.
  - Removing the `track_total_hits` and `rest_total_hits_as_int` fields, since we're already paging through all of the results.
- Doing a single query for agents and then doing the bucketing client-side in Kibana.
- Using a runtime field (or potentially an indexed one) to calculate the status, and then doing a single terms aggregation on this field rather than separate queries for each status.
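To make the runtime-field option more concrete, here is a minimal sketch of what a single-request status count could look like. The field name `calculated_status`, the aggregation name `by_status`, and the Painless script are hypothetical placeholders (the script assumes a boolean `active` field and does not reproduce Fleet's real status rules). Only `runtime_mappings`, the `terms` aggregation, `size`, and `track_total_hits` are standard Elasticsearch search options.

```typescript
// Sketch only: the status script and field names are assumptions for illustration.

interface StatusCountRequest {
  size: number;
  track_total_hits: boolean;
  runtime_mappings: Record<string, { type: string; script: { source: string } }>;
  aggs: Record<string, { terms: { field: string; size: number } }>;
}

interface TermsBucket {
  key: string;
  doc_count: number;
}

// Builds one search body that counts agents per status with a single terms aggregation.
function buildStatusCountRequest(): StatusCountRequest {
  return {
    size: 0, // only aggregation buckets are needed, so no hits and no sort
    track_total_hits: false, // counts come from the buckets, not from hits.total
    runtime_mappings: {
      calculated_status: {
        type: 'keyword',
        script: {
          // Placeholder rule: assumes a boolean `active` field on the agent document.
          source:
            "emit((doc['active'].size() > 0 && doc['active'].value) ? 'online' : 'inactive')",
        },
      },
    },
    aggs: {
      // `size` only needs to cover the number of distinct status values.
      by_status: { terms: { field: 'calculated_status', size: 10 } },
    },
  };
}

// Converts the terms aggregation buckets into a { status: count } map.
function bucketsToCounts(buckets: TermsBucket[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const bucket of buckets) {
    counts[bucket.key] = bucket.doc_count;
  }
  return counts;
}
```

Whether a runtime field or an indexed field performs better here is exactly what the benchmarking should determine, since runtime fields are evaluated at query time.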
We should benchmark these optimizations to ensure they improve performance, especially when there are many agents running (10k+) and some sort of bulk action is happening at the same time (such as a policy change).
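One way to compare the candidates is a small timing harness like the sketch below. It is only an illustration: `task` is a hypothetical stand-in for whichever status-count implementation is being measured, and the harness does not itself generate the 10k+ agents or the concurrent bulk action, which would need to be set up separately.

```typescript
// Sketch only: `task` is a placeholder for the implementation under test.

// Returns the median of a non-empty list of numbers.
function median(values: number[]): number {
  if (values.length === 0) {
    throw new Error('median requires at least one value');
  }
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Runs `task` sequentially `runs` times and returns each duration in milliseconds.
async function measure(task: () => Promise<unknown>, runs: number): Promise<number[]> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    await task();
    samples.push(Date.now() - start);
  }
  return samples;
}
```

Reporting the median rather than the mean keeps a single slow run (for example, one that coincides with a refresh) from dominating the comparison.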
For context, we currently run a separate query for each agent status against the agent index to produce the counts reported by the `/agents/agent_status` API. When this index is under heavy ingest load, this can become quite expensive and slow: https://github.com/elastic/kibana/blob/bef0d6a8d37b633d22fe1896005f47cef547f3b0/x-pack/plugins/fleet/server/services/agents/status.ts#L78-L104
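As a rough illustration of the client-side bucketing option, the per-status queries could be replaced by one pass over a single result set. The `SampleAgent` shape and the status rule below are hypothetical. The real status derivation lives in the Fleet code and is passed in here as a function, so nothing is assumed about it.

```typescript
// Sketch only: the agent shape and status rule are assumptions for illustration.

// Counts items per status in a single pass; `statusOf` supplies the status rule.
function countByStatus<T>(agents: T[], statusOf: (agent: T) => string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const agent of agents) {
    const status = statusOf(agent);
    counts[status] = (counts[status] ?? 0) + 1;
  }
  return counts;
}

// Hypothetical usage with a placeholder agent shape and rule.
interface SampleAgent {
  id: string;
  active: boolean;
}

const sampleAgents: SampleAgent[] = [
  { id: 'a', active: true },
  { id: 'b', active: false },
  { id: 'c', active: true },
];

const sampleCounts = countByStatus(sampleAgents, (agent) => (agent.active ? 'online' : 'inactive'));
console.log(sampleCounts);
```

The trade-off is that every agent document has to be transferred to Kibana, so this should be benchmarked against the aggregation-based option for large agent counts.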