nev888 opened this issue 1 month ago
cc @mattklein123 @danzh2010
Please adjust your listen socket's receive buffer size; https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/http/http3#downstream-stats
Thanks @danzh2010. If the kernel's UDP listen socket's receive buffer isn't large enough, is it possible that it causes other issues? We came across this counter value while investigating a memory leak.
It will become a bandwidth limitation, but not a memory leak.
Any recommendation for how big it should be? Currently it's ~9.5 MB.
This depends on your bandwidth, and the Linux kernel doubles the number you supplied via setsockopt.
Also please keep in mind that the stat is cumulative.
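The doubling behavior mentioned above can be observed directly with a plain socket (a minimal Linux-oriented sketch; the 64 KiB request is an arbitrary example value):

```python
import socket

# Create a UDP socket and request a 64 KiB receive buffer.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
requested = 65536
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)

# On Linux the kernel doubles the requested value to account for
# bookkeeping overhead, so getsockopt typically reports about
# 2 * requested (subject to the net.core.rmem_max sysctl cap).
effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(effective)  # commonly 2x the requested value on Linux
sock.close()
```

So when sizing the buffer for Envoy's listener, the value you configure is the pre-doubling number the kernel receives.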
One more question: do you have any idea why the counter is not in sync with the traffic rate?
The first measurement is 68763983128; a few minutes later it became 210235537380.
I had traffic running for a few hours at a rate of ~40 calls/sec. I also generated truncated UDP traffic for a few minutes, but nothing even close to the numbers I see in the counter.
All the generated traffic could have reached ~2 million packets max, and not all of it was UDP, only about half.
I don't know the ingress rate of your service. Assuming it's a 5-minute range, you have ~400M packet drops per second. You can check netstat -p udp to confirm the numbers are consistent with what the kernel sees.
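As a sanity check, the arithmetic behind that estimate, using the two counter samples quoted earlier and assuming roughly 5 minutes between readings:

```python
# Two consecutive downstream_rx_datagram_dropped samples from this thread.
first = 68_763_983_128
second = 210_235_537_380
interval_s = 5 * 60  # assumed ~5 minutes between readings

delta = second - first          # datagrams reported dropped in the window
rate = delta // interval_s      # implied drops per second
print(delta)  # 141471554252
print(rate)   # 471571847 -> roughly 470M drops/sec
```

That implied rate is several orders of magnitude above the reported ~40 calls/sec of real traffic, which is why the counter values look implausible.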
The previous stats were from a Pod which is no longer running.
I have a different pod with these stats:
Traffic was generated with a script; the rate is a rough estimate:
Packets per second: 38873
The downstream_rx_datagram_dropped jumped from 0 to that big number; the script was running for 1.5 hours.
listener.IPv4_PORT.udp.downstream_rx_datagram_dropped: 61610884148229
bash-4.4$ nstat
IpInReceives            3021072
IpInDelivers            3021072
IpOutRequests           3020920
TcpActiveOpens          17
TcpPassiveOpens         93
TcpEstabResets          7
TcpInSegs               1023
TcpOutSegs              871
TcpOutRsts              32
UdpInDatagrams          690
UdpInErrors             3019359
UdpOutDatagrams         690
UdpInCsumErrors         3019359
TcpExtTCPHPHits         174
TcpExtTCPPureAcks       203
TcpExtTCPHPAcks         317
TcpExtTCPAbortOnData    10
TcpExtTCPAbortOnClose   7
TcpExtTCPRcvCoalesce    9
TcpExtTCPOrigDataSent   437
TcpExtTCPDelivered      454
IpExtInOctets           779239070
IpExtOutOctets          779149461
IpExtInNoECTPkts        3021072
nstat only shows values incremented since the last run, please use 'nstat -a'. And I didn't see a dropped-packet count in the result. Can you use netstat -p udp?
Which UDP extension are you using?
Tue Jul 16 07:39:46 CEST 2024 listener.IPv4:Port.udp.downstream_rx_datagram_dropped: 89237114699
nstat-2024-07-16_07-39.txt
I don't have netstat; I can use ss though. udp-sockets-2024-07-16_10-59.txt
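When netstat isn't available, the kernel's per-socket UDP drop counter can also be read from /proc/net/udp, whose last column is drops. A sketch assuming the standard procfs row layout; the sample row, address, and port below are made up for illustration, with the drop count borrowed from the UdpInErrors value earlier in the thread:

```python
def udp_socket_drops(proc_text):
    """Map local_address (hex ip:port) -> per-socket drop count,
    parsed from the text of /proc/net/udp."""
    drops = {}
    for line in proc_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 13:
            # fields[1] is local_address; the last field is drops.
            drops[fields[1]] = int(fields[-1])
    return drops

# Illustrative content in the /proc/net/udp format (not real data).
sample = (
    "  sl  local_address rem_address   st tx_queue rx_queue tr tm->when "
    "retrnsmt   uid  timeout inode ref pointer drops\n"
    " 1001: 00000000:10E1 00000000:0000 07 00000000:00000000 00:00000000 "
    "00000000   101        0 21588 2 0000000000000000 3019359\n"
)
print(udp_socket_drops(sample))
```

On a live host this would be called with `open('/proc/net/udp').read()`; the addresses are hex-encoded, so the port is the part after the colon in base 16.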
We don't use any UDP extension.
Are you using UDP Proxy?
No.
Can you share your UDP listener config?
Here is the config for the listener: udp_listener.txt
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
If you are using raw UDP, why do you need this?
"connection_balance_config": {
"exact_balance": {}
},
This config is intended for TCP listeners. On the controller side both UDP and TCP are configured with the same config, which is why we have it here.
A UDP listener other than QUIC is connectionless; you probably don't need that.
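For reference, a raw (non-QUIC) UDP listener generally only needs an address and, optionally, UDP socket settings; a stripped-down JSON sketch without the TCP-only balancer (the name, address, port, and datagram size here are placeholders, not the reporter's actual config):

```json
{
  "name": "udp_listener",
  "address": {
    "socket_address": {
      "protocol": "UDP",
      "address": "0.0.0.0",
      "port_value": 5060
    }
  },
  "udp_listener_config": {
    "downstream_socket_config": {
      "max_rx_datagram_size": 1500
    }
  }
}
```

The udp_listener_config block is optional; the point is simply that connection_balance_config can be dropped for UDP without replacement.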
Yep, in UDP case we have no use for it. Do you think this might have anything to do with the counters problem?
Not sure. I'm not familiar with the raw UDP listener's interaction with connection_balance_config. If it is the cause, you may see a warning log about packets being dropped in only some of the threads (not all) in the Envoy log.
I see a similar problem with a setup to test UDP in Envoy. The dropped datagrams are in the tens of billions per second while the incoming traffic is just a couple hundred thousand. Could this be a counter issue? Any idea whether it's from Envoy collecting the metrics or something deeper down?
listener..udp.downstream_rx_datagram_dropped shows a big number and jumps up by really big increments when running UDP traffic.
Collecting the stats over a few minutes:
listener..udp.downstream_rx_datagram_dropped: 68763983128
listener..udp.downstream_rx_datagram_dropped: 210235537380
listener..udp.downstream_rx_datagram_dropped: 210235537380
listener..udp.downstream_rx_datagram_dropped: 210235537380
listener..udp.downstream_rx_datagram_dropped: 211333341100
listener..udp.downstream_rx_datagram_dropped: 212764708258
listener..udp.downstream_rx_datagram_dropped: 215293879136
listener..udp.downstream_rx_datagram_dropped: 215973672978 (last one after stopping UDP traffic)
listener..udp.downstream_rx_datagram_dropped: 215973672978
listener..udp.downstream_rx_datagram_dropped: 215973672978
listener..udp.downstream_rx_datagram_dropped: 215973672978
Does the counter display the number of datagrams dropped for a specific listener? These numbers don't look realistic from a traffic perspective; there isn't that much traffic at all.
We are using Envoy as an L7 load balancer for SIP traffic. On the client side (downstream) traffic is received over TCP/UDP, and it is load balanced to the application Pod (upstream) over gRPC.
stats.txt server_info.txt clsuters.txt
The Envoy code is extended with our own modifications for our specific use case.