facebookincubator / katran

A high performance layer 4 load balancer
GNU General Public License v2.0

Katran didn't route pkg to default gateway #145

Closed bienkma closed 2 years ago

bienkma commented 2 years ago

Hi there, I ran into an issue when using Katran for my services: some packets (about 1%) were not routed to the default gateway. Clients sometimes got the error message "timeout connection to VIP, the timeout > 700ms". The service worked fine below 10k pkts; the problem only appeared once the LB went over 10k pkts. (Clients send each request with a UUID debug token, and for the timed-out requests I cannot find that token on the real server.) How can I check what is going wrong? Also, can I use katran_server_grpc to set up an LB in a production environment?

bienkma commented 2 years ago

I set up a SPAN port on the switch and captured the traffic on the Katran interface. It looks like some packets are very large and have to be retransmitted. Could the problem be a misconfigured MTU? [image]

nikhildl12 commented 2 years ago

@bienkma: what is the packet size used in the test? Also do you see the xdp drop counter going up on the katran host using:

ethtool -S <interface> | grep rx_xdp_drop

bienkma commented 2 years ago

@nikhildl12 This is not a test environment; these are real requests from my clients. Sometimes we captured TCP segments with length 8000 and jumbo frames on the switch. The rx_xdp_drop counter does not exist on this interface. This is the output when I looked for drops on the Katran interface:

ethtool -S eno1 | grep drop
     rx_dropped: 0
     tx_dropped: 0
     port.rx_dropped: 0
     port.tx_dropped_link_down: 0
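
The counter names printed by `ethtool -S` depend on the NIC driver, so a counter such as rx_xdp_drop may simply not be exposed on a given card. A minimal Python sketch for collecting every drop-like counter from that output is shown here; the function name `parse_drop_counters` and the `SAMPLE` text (the lines above plus the usual "NIC statistics:" header) are illustrative and not part of Katran or ethtool.

```python
def parse_drop_counters(ethtool_output: str) -> dict:
    """Return every 'name: value' statistic whose name contains 'drop'."""
    counters = {}
    for line in ethtool_output.splitlines():
        # Each statistic line looks like "     rx_dropped: 0".
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        # Skip the header line and anything that is not "name: integer".
        if not sep or not value.isdigit():
            continue
        if "drop" in name:
            counters[name] = int(value)
    return counters


# Sample input assembled from the output quoted in this thread.
SAMPLE = """NIC statistics:
     rx_dropped: 0
     tx_dropped: 0
     port.rx_dropped: 0
     port.tx_dropped_link_down: 0
"""

if __name__ == "__main__":
    for name, value in parse_drop_counters(SAMPLE).items():
        print(f"{name} = {value}")
```

On a live host the input could come from `subprocess.run(["ethtool", "-S", "eno1"], capture_output=True, text=True).stdout`. Taking two snapshots a few seconds apart and comparing them shows whether any counter is increasing under load.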

bienkma commented 2 years ago

This looks like the same problem as https://github.com/facebookincubator/katran/issues/82

bienkma commented 2 years ago

The problem is solved. I added the mss 1400 option in HAProxy (the layer 7 LB).
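
A rough sketch of why clamping the MSS can help. It assumes a 1500-byte MTU on the path between Katran and the real servers, and that Katran's IP-in-IP encapsulation adds one outer IP header (20 bytes for IPv4, 40 bytes for IPv6). The thread does not state the actual MTU, so these numbers and the helper names `max_safe_mss` and `fits` are assumptions for illustration only.

```python
IPV4_HEADER = 20  # bytes, without IP options
IPV6_HEADER = 40  # bytes, fixed header only
TCP_HEADER = 20   # bytes, without TCP options


def max_safe_mss(path_mtu, inner_ip_header=IPV4_HEADER,
                 outer_ip_header=IPV4_HEADER):
    """Largest TCP MSS whose encapsulated packet still fits in path_mtu."""
    return path_mtu - outer_ip_header - inner_ip_header - TCP_HEADER


def fits(mss, path_mtu, inner_ip_header=IPV4_HEADER,
         outer_ip_header=IPV4_HEADER):
    """True if a full-sized segment survives encapsulation without exceeding the MTU."""
    return mss <= max_safe_mss(path_mtu, inner_ip_header, outer_ip_header)


if __name__ == "__main__":
    # Assumed 1500-byte MTU; the default Ethernet MSS of 1460 leaves no room
    # for the extra outer header, while 1400 does.
    print(max_safe_mss(1500))                            # IPv4 in IPv4
    print(max_safe_mss(1500, IPV6_HEADER, IPV6_HEADER))  # IPv6 in IPv6
    print(fits(1460, 1500), fits(1400, 1500))
```

Under these assumptions an MSS of 1400 leaves headroom for either encapsulation, whereas full-sized 1460-byte segments would exceed the MTU once the outer header is added. In HAProxy the `mss` parameter is set on a `bind` line.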