Just a very brief note, using 8 workers on 4 cores isn't helping anything, as that will just increase lock contention in the critical path :)
That said, yes, we've recently discussed a few ways to optimize some parts of the design. A couple of brief questions: how many Services are defined in your Kong cluster, and what does your throughput look like?
> using 8 workers on 4 cores isn't helping anything

Yes, we've updated this config.

> how many Services are defined

283

> what does your throughput look like

2k+ QPS per Kong node at peak times. Each Kong node runs on an AWS m4.xlarge instance (4 cores, 16 GB).
Thanks for the update!
Would a simple workaround be to introduce an option in the Prometheus plugin config to select which metrics to collect? Say I'm not interested in latency, just HTTP status and bandwidth consumption: that would cut in half the number of shared dict accesses made to store stats on every request.
> Would a simple workaround be to introduce an option in the Prometheus plugin config to select which metrics to collect?

Certainly. This requires some changes to the backing Prometheus library that we are using, but it is a feature we would like to support.
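For illustration only, a rough Lua sketch of what such gating could look like in the plugin's log handler. The `conf.metrics` allowlist, the metric names, and the nginx-lua-prometheus-style API calls are assumptions made for this example, not the plugin's current schema or code:

```lua
-- Hypothetical sketch: skip whole metric families based on an allowlist in
-- the plugin configuration. None of these option names exist today.
-- Metric objects follow the nginx-lua-prometheus style API
-- (counter:inc(value, labels), histogram:observe(value, labels)).
local prometheus = require("prometheus").init("prometheus_metrics")

local metric_status    = prometheus:counter("http_status", "HTTP status codes", { "service", "code" })
local metric_latency   = prometheus:histogram("latency", "Request latency in ms", { "service" })
local metric_bandwidth = prometheus:counter("bandwidth", "Bytes sent", { "service" })

local function log(conf, service_name, latency_ms)
  local enabled = {}
  for _, family in ipairs(conf.metrics or { "http_status", "latency", "bandwidth" }) do
    enabled[family] = true
  end

  if enabled.http_status then
    metric_status:inc(1, { service_name, ngx.status })
  end

  if enabled.latency then
    -- one observe = sum + count + bucket increments against the shared dict
    metric_latency:observe(latency_ms, { service_name })
  end

  if enabled.bandwidth then
    metric_bandwidth:inc(tonumber(ngx.var.bytes_sent) or 0, { service_name })
  end
end
```

Disabling latency alone would already drop the sum/count/bucket increments, which is where most of the per-request shared dict writes come from.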
Another solution that would greatly improve the plugin's performance is storing all metrics at the worker level and syncing them into the shared dict periodically, which would greatly reduce contention on the lock.
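A minimal sketch of that worker-level approach, assuming a shared dict named `prometheus_metrics` (this is not the plugin's actual implementation): the hot path only touches a plain per-worker Lua table, and a recurring timer merges the pending deltas into the shared dict.

```lua
-- Sketch of worker-local aggregation with a periodic flush into a shared dict.
-- "prometheus_metrics" is an assumed shared dict name, for illustration only.
local shared_metrics = ngx.shared.prometheus_metrics

-- per-worker buffer: metric key -> pending delta
local buffer = {}

-- hot path: called on every request, takes no shared-dict lock
local function count(key, delta)
  buffer[key] = (buffer[key] or 0) + delta
end

-- cold path: merge pending deltas into the shared dict once per second
local function flush(premature)
  if premature then
    return
  end
  for key, delta in pairs(buffer) do
    buffer[key] = nil
    -- incr with an init value creates the key if it does not exist yet
    local _, err = shared_metrics:incr(key, delta, 0)
    if err then
      ngx.log(ngx.ERR, "failed to flush metric ", key, ": ", err)
    end
  end
end

-- typically created once per worker, from the init_worker phase
local ok, err = ngx.timer.every(1, flush)
if not ok then
  ngx.log(ngx.ERR, "failed to create flush timer: ", err)
end
```

With something like this, each worker takes the shared dict's lock once per flush interval instead of several times per request, at the cost of metrics lagging by up to the flush interval, which is acceptable given that Prometheus scrape intervals are usually measured in seconds.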
Prometheus plugin (enabled globally) uses too much CPU (4 cores, 8 workers)
With `perf`, we can see that `ngx_http_lua_ffi_shdict_incr` (10.9%) occupies more CPU than nginx main (7.8%). On the same server, with the same traffic but the Prometheus plugin disabled, nginx master is at 30%:

[perf screenshot]

The heavy CPU load might be related to the heavy use of `shared.dict.incr`: at least 4 calls (http_status, latency_sum, latency_count, latency_bucket) per request. Meanwhile, `shared.dict.incr` is implemented with `ngx_shmtx_lock`, which is a spin lock underneath.

[perf screenshots: `ngx_http_lua_ffi_shdict_incr`, `ngx_shmtx_lock`]

Maybe writing to the shared dict on every request is not a good choice, when monitoring ends up using more CPU than the main request processing. Would it be OK if every worker kept metrics in its own memory and a timer flushed them into the shared dict periodically? The loss of "real-time" metrics should be acceptable, since the Prometheus scrape interval is often set in seconds.
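For reference, the per-request pattern described above boils down to roughly the following (a simplified sketch with made-up dict and key names, not the plugin's actual code). Each `:incr` call acquires the shared memory zone's lock via `ngx_shmtx_lock`, so 8 workers contend on the same mutex 4+ times per request:

```lua
-- Simplified sketch of the per-request shared dict writes described above.
-- The dict name and key layout are illustrative only.
local dict = ngx.shared.prometheus_metrics

local service    = ngx.ctx.service_name or "unknown"
local latency_ms = (ngx.now() - ngx.req.start_time()) * 1000

-- every :incr below locks the shared memory zone (ngx_shmtx_lock, a spin lock)
dict:incr("http_status:"    .. service .. ":" .. ngx.status, 1, 0)
dict:incr("latency_sum:"    .. service, latency_ms, 0)
dict:incr("latency_count:"  .. service, 1, 0)
dict:incr("latency_bucket:" .. service .. ":00100", 1, 0)  -- one per matching bucket
```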