envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

The number of envoy connections 65,000+ will make the cpu fully loaded #17907

Closed hbgongen closed 3 years ago

hbgongen commented 3 years ago

I ran into a problem recently. I am running the 1.15.2 release in Docker. In our production environment, the CPU becomes fully loaded once the number of connections reaches 65,000+. I checked the Linux file handles and the ports used by each connection, and they all look normal.

/etc/sysctl.conf: [image: envoy1] [image: envoy2]

/etc/security/limits.conf: [image: envoy3]

The Envoy YAML: [image: envoy4]

I don't know whether my Envoy YAML has a problem. I even tried removing the limits in the cluster's circuit breaker, but the CPU still became fully loaded once the connection count reached that number. At 60,000 connections the CPU is only at 60%, but at 65,000+ it jumps to 2400% (24 CPU cores in total). This confuses me a lot. Please help me look into this problem. Thank you.

junr03 commented 3 years ago

@oschaaf or @jmarantz could probably have some pointers

rojkov commented 3 years ago

Could you make some flamegraphs for your setup?

What's the output of sysctl kernel.msgmni in your system?

P.S. It smells like some counter overflows after 65535.

hbgongen commented 3 years ago

Thanks, I will try to get flame graphs. kernel.msgmni is 15885. @rojkov

hbgongen commented 3 years ago

When I change msgmni to 70000, the CPU is still fully loaded once the number of TCP connections reaches 65,000+. @rojkov

hbgongen commented 3 years ago

[image] I found the problem: all of the CPU time is spent in the kernel function connect. Maybe we should bind a port before connecting, to avoid the cost of selecting an unused port?

rojkov commented 3 years ago

This looks like Envoy is trying to open 65000+ connections to the same single upstream and running out of local ports (65536 connections is a hard limit here). Is that the case?
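
To make the arithmetic behind this limit concrete, here is a rough Python sketch. A TCP connection is identified by the tuple (source IP, source port, destination IP, destination port), so with one source IP and one upstream address only the source port can vary. The function name and its parameters are made up for illustration. The result is an upper bound only: sockets in TIME_WAIT and ports used by other processes lower it in practice.

```python
def max_upstream_connections(port_low: int, port_high: int,
                             source_ips: int = 1,
                             upstream_endpoints: int = 1) -> int:
    """Upper bound on concurrent outbound TCP connections.

    port_low / port_high: inclusive local (ephemeral) port range.
    source_ips:           local IP addresses the proxy can connect from.
    upstream_endpoints:   distinct (destination IP, destination port) pairs.
    All names here are hypothetical and only serve the illustration.
    """
    if not (0 < port_low <= port_high <= 65535):
        raise ValueError("port range must satisfy 0 < low <= high <= 65535")
    if source_ips < 1 or upstream_endpoints < 1:
        raise ValueError("need at least one source IP and one upstream endpoint")
    ports = port_high - port_low + 1
    # Each (source IP, upstream endpoint) pair can use every local port once.
    return ports * source_ips * upstream_endpoints


if __name__ == "__main__":
    # A common Linux default range, one source IP, one upstream.
    print(max_upstream_connections(32768, 60999))
    # A widened range, still one source IP and one upstream.
    print(max_upstream_connections(1024, 65535))
    # Same widened range, but the upstream is reachable on two addresses.
    print(max_upstream_connections(1024, 65535, upstream_endpoints=2))
```

Under these assumptions, the ceiling for a single upstream from a single source IP can never exceed the size of the 16-bit port space. Adding upstream endpoints or source addresses is what raises it.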

hbgongen commented 3 years ago

Yes... Sorry, ahaha, I have found the problem. A single listener connecting to the same upstream can only open 60,000+ connections because of the IP local port range. I overlooked that Envoy uses a local address and port for each connection to the upstream.
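
For anyone who wants to check the local port range on their own host, here is a small Python sketch. It assumes a Linux system that exposes the range in /proc/sys/net/ipv4/ip_local_port_range. The helper names are made up for illustration and are not part of Envoy.

```python
from pathlib import Path
from typing import Optional, Tuple

# Linux exposes the ephemeral port range here as two integers, e.g. "32768\t60999".
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def parse_port_range(text: str) -> Tuple[int, int]:
    """Parse the two whitespace-separated integers from the sysctl value."""
    low, high = (int(part) for part in text.split())
    if low > high:
        raise ValueError("lower bound exceeds upper bound")
    return low, high


def ephemeral_port_count(text: str) -> int:
    """Number of local ports available for outbound connections (inclusive range)."""
    low, high = parse_port_range(text)
    return high - low + 1


def read_local_port_range(path: Path = PORT_RANGE_FILE) -> Optional[Tuple[int, int]]:
    """Return (low, high) from the given file, or None if it is missing or malformed."""
    try:
        return parse_port_range(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    current = read_local_port_range()
    if current is None:
        print("could not read", PORT_RANGE_FILE)
    else:
        low, high = current
        print(f"local port range {low}-{high}: {high - low + 1} ports per upstream")
```

If the reported count is close to the connection count at which the CPU spikes, port exhaustion toward a single upstream is a likely suspect.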