With the switch to the new ES output handling, the APM Server behaves differently when it is overloaded. Instead of returning 503 - Queue is full errors, it responds much more slowly to APM agent requests. This causes APM agents to eventually close their connections and log errors. The APM Server itself does not log anything indicating that it is overloaded, and it does not record error metrics. The Stack Monitoring UI gives no indication that the server is overloaded either, apart from showing higher memory usage (because the requests are buffered in memory).

Parts that should be improved:
allow customizing the YAML box for the Elastic Cloud output via Fleet; since 8.0 a dedicated cloud output is configured to avoid public traffic, and its configuration is frozen
record metrics indicating that more events are being processed than can be ingested into ES; for example, track how many available channels are created and when a new channel becomes available for processing events
add log warnings when events are queued up
add information to the Stack Monitoring UI, or ship pre-built monitoring visualizations
See also https://github.com/elastic/apm-server/issues/7719.