Open someshsumantwilio opened 2 weeks ago
STM that the spikes in cpu occur during the drain period. If you look at other stats on connections such as total connection and destroyed connections (possibly looking at listener stats that happen earlier: https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/stats) what do you see?
cc @adisuissa as codeowner for http health_check filter.
Below is the metric we see.
Title: Healthcheck draining causing CPU spike
Description:
We tried to analyze the envoy trace log file(envoy_trace.txt) and envoy access log (envoy_access.txt) collected from host which was affected by health check draining but we could not found the root cause of health check draining.
We need help from community to figure why we are having connection draining.
Below are some of dashboard which shows the strong correlation of health check draining causing CPU spike.
Picture1 shows that CPU spike during the same time frame when connection draining is happening. Picture 2 shows that connection draining is happening which is causing CPU spike. Picture 3 shows that active health check connection is dropping due to draining.
We have
Picture1
Picture 2
Picture 3
Below is the healthcheck configuration.
Listener Information
Cluster Information:
[optional Relevant Links:]