Open taylorchu opened 10 months ago
Under a high rate of /send_metrics requests, both /metrics and /ping time out. Because /ping is used as the liveness check, the pod gets killed and we end up dropping metrics. Initially I thought this was related to WEBrick (https://github.com/discourse/prometheus_exporter/issues/146), but it is more likely caused by this global mutex: https://github.com/discourse/prometheus_exporter/blob/239e2c60f93ecbb67e5701e3abb670f1a2783e5f/lib/prometheus_exporter/server/collector.rb#L10
We have ~800 metrics, but incoming /send_metrics requests from remote clients arrive at ~1000/s.
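
To make the suspected contention concrete, here is a minimal sketch of what happens when a single lock guards both the write path and the read path. This is a toy model, not the library's actual code: `ToyCollector`, `process`, `metrics_text`, and `try_metrics_text` are hypothetical names chosen for illustration, and the assumption is that ingestion and scraping share one `Mutex`.

```ruby
# Toy model of one lock shared by writes and reads.
# ToyCollector and its methods are hypothetical, not prometheus_exporter's API.
class ToyCollector
  def initialize
    @mutex = Mutex.new
    @counts = Hash.new(0)
  end

  # Stands in for handling one /send_metrics payload.
  # The optional block runs while the lock is held, to simulate slow work.
  def process(name)
    @mutex.synchronize do
      @counts[name] += 1
      yield if block_given?
    end
  end

  # Stands in for rendering /metrics; blocks until the lock is free.
  def metrics_text
    @mutex.synchronize { render }
  end

  # Non-blocking variant: returns nil if the lock is currently held.
  def try_metrics_text
    return nil unless @mutex.try_lock
    begin
      render
    ensure
      @mutex.unlock
    end
  end

  private

  def render
    @counts.map { |name, value| "#{name} #{value}" }.join("\n")
  end
end

collector = ToyCollector.new
entered = Queue.new
release = Queue.new

writer = Thread.new do
  collector.process("http_requests_total") do
    entered << true   # signal: the writer now holds the lock
    release.pop       # keep holding it until the main thread says so
  end
end

entered.pop                                      # wait until the writer is inside the lock
scrape_while_busy = collector.try_metrics_text   # lock is taken, so no scrape happens
release << true
writer.join
scrape_after = collector.metrics_text            # lock is free again, scrape succeeds
```

In this sketch a scrape cannot proceed while a write holds the lock. If writes arrive fast enough, a blocking reader like `metrics_text` could spend most of its time waiting, which would be consistent with the timeouts described above.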