Hi @vinniemo, thanks for reporting this. I think we may be missing some information about your environment.
I am not surprised, since the connections along the traffic path among the load balancer, Kong, and the upstream may be kept alive. Please add more details so we can investigate.
I would suggest turning off HTTP/2 listening and checking whether you still see this.
HTTP/2 is designed to have a single long-lived TCP connection, across which all requests are multiplexed—meaning multiple requests can be active on the same connection at any point in time. Normally, this is great, as it reduces the overhead of connection management. However, it also means that (as you might imagine) connection-level balancing isn't very useful. Once the connection is established, there's no more balancing to be done.
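For reference, a minimal sketch of what turning off HTTP/2 could look like in kong.conf, assuming HTTP/2 is currently enabled via the `http2` flag on the TLS proxy listener (adjust to your actual listener layout):

```
# kong.conf -- sketch only; keep your real ports/flags and just drop `http2`
# Before:
#   proxy_listen = 0.0.0.0:8000, 0.0.0.0:8443 http2 ssl
# After:
proxy_listen = 0.0.0.0:8000, 0.0.0.0:8443 ssl
```

In a container or Kubernetes deployment the same setting can be supplied through the `KONG_PROXY_LISTEN` environment variable instead of editing kong.conf.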
See also:
@mayocream lj_BC_TGETS takes up a large part of the flame graph, and we don't have much information about it. Is that normal?
Possibly related to the information mentioned in #8838
Hi @LoremipsumSharp, you provided a C-land flamegraph. Kong is written in Lua, so we should use a Lua-land flamegraph to analyze Kong's performance. That said, I don't think we need a flamegraph to analyze this issue.
I don't think the imbalance is a Kong issue, though; it's most likely related to Kubernetes.
@ADD-SP Is there any tool for generating a Lua-land flamegraph?
@LoremipsumSharp you can use this one https://github.com/kong/stapxx#lj-lua-bt
I don't think we need a flamegraph to analyze this issue; I think this is a Kubernetes-related issue.
@LoremipsumSharp Would you mind sharing the Kubernetes config mentioned by @mayocream?
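For illustration, a hedged sketch of how a Lua-land flamegraph is typically captured with the stapxx scripts; the exact script names and flags here are assumptions, so treat the linked README as authoritative:

```sh
# Sketch only; flags and output format are assumptions, not verified commands.
# Pick one busy nginx worker process of Kong:
pid=$(pgrep -f "nginx: worker process" | head -n 1)

# Capture Lua-land backtraces from that worker (requires SystemTap):
./samples/lj-lua-bt.sxx -x "$pid" > lua.bt

# The captured stacks can then be folded and rendered into an SVG with
# Brendan Gregg's FlameGraph tools (assuming a compatible stack format):
stackcollapse-stap.pl lua.bt > lua.folded
flamegraph.pl lua.folded > lua-flamegraph.svg
```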
Thank you for your advice. As you can see in this lj-lua-bt SVG, the CPU is mostly spent on regex route matching in the router. Do you have any better suggestions for this?
The problem has been resolved. The root cause is that the short path-prefix routes are sorted to the lower part of the routing table, while the PCRE regex routes are sorted to the front.
For a request that matches a short path-prefix route, find_route therefore has to scan past all of the regex routes first, i.e. O(N) with N = 2500+, so CPU usage is particularly high and imbalanced.
The problem can be worked around temporarily by turning a short path-prefix route into a PCRE regex route and giving it a higher regex priority, for example changing /api/ to /api/(\S+). But the best fix would be to make find_route O(1).
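For illustration, a hypothetical Kong 2.x Admin API call applying the workaround described above to an existing route. The route name and the priority value are made-up example values, and it assumes Kong 2.x treats a path containing regex metacharacters as a PCRE regex (as the workaround above relies on):

```sh
# Hypothetical example; substitute your real route name/ID and priority.
# regex_priority makes this regex route sort ahead of lower-priority regexes.
curl -s -X PATCH http://localhost:8001/routes/api-prefix-route \
  --data 'paths[]=/api/(\S+)' \
  --data 'regex_priority=100'
```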
@vinniemo Thanks for the investigation. Based on the current implementation of Router, I think it is hard to reduce the time complexity of find_route to O(1).
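To make the constraint concrete, here is a deliberately simplified Lua sketch (not Kong's actual router code) of why a regex-heavy route set resists an O(1) lookup: exact and prefix paths can be indexed, but regex paths have no key to hash on and must be tried one by one in priority order.

```lua
-- Simplified illustration only; Kong's real router is considerably more involved.
local function find_route(indexed_paths, regex_routes, path)
  -- Exact/prefix paths could in principle be served from an index (near O(1)).
  local hit = indexed_paths[path]
  if hit then
    return hit
  end

  -- Regex paths have no lookup key, so they are tried in regex_priority order;
  -- with ~2500 regex routes this is the O(N) scan described above.
  for _, route in ipairs(regex_routes) do
    if ngx.re.find(path, route.regex, "jo") then
      return route
    end
  end
end
```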
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Is there an existing issue for this?
Kong version ($ kong version)
Kong 2.7.1, Postgres 9.6
Current Behavior
Deployed on Kubernetes: a Kong cluster of 30 pods (4C4G each), 125 Kong services, 2500+ Kong routes (95% PCRE regex routes, 5% short path prefixes), total QPS 45000+.
Expected Behavior
Kong node CPU usage is imbalanced; it gets worse as traffic grows, until the CPU can no longer respond.
Steps To Reproduce
Anything else?
No response