facebookincubator / katran

A high performance layer 4 load balancer
GNU General Public License v2.0

Real Server doesn't send response to user #241

Open Bit-Warrior-X opened 5 days ago

Bit-Warrior-X commented 5 days ago

Hello everyone

I have installed the Katran load balancer and used example_grpc for testing. This is the topology I used: [image: top]

I have configured Katran like below:

# ./katran_goclient -l
2024/10/30 13:34:27 vips len 1
VIP:             31.3.7.2 Port:      0 Protocol: tcp
Vip's flags: 
 ->31.3.7.150        weight: 1 flags: 
exiting

I am working from the user side (31.3.7.140) and trying to connect to the backend real server over SSH with the command: ssh 31.3.7.2

I noticed that the SYN packet arrives correctly on the backend server side:

# tcpdump -ni eth0 proto 4 -vvv or host 31.3.7.2
17:45:19.195459 IP (tos 0x0, ttl 63, id 0, offset 0, flags [none], proto IPIP (4), length 72)
    172.16.143.187 > 31.3.7.150: IP (tos 0x0, ttl 64, id 34583, offset 0, flags [DF], proto TCP (6), length 52)
    31.3.7.140.14216 > 31.3.7.2.ssh: Flags [S], cksum 0x28bf (correct), seq 3802326149, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 14], length 0
17:45:20.202682 IP (tos 0x0, ttl 63, id 0, offset 0, flags [none], proto IPIP (4), length 72)
    172.16.143.187 > 31.3.7.150: IP (tos 0x0, ttl 64, id 34584, offset 0, flags [DF], proto TCP (6), length 52)
    31.3.7.140.14216 > 31.3.7.2.ssh: Flags [S], cksum 0x28bf (correct), seq 3802326149, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 14], length 0
17:45:22.250802 IP (tos 0x0, ttl 63, id 0, offset 0, flags [none], proto IPIP (4), length 72)
    172.16.143.187 > 31.3.7.150: IP (tos 0x0, ttl 64, id 34585, offset 0, flags [DF], proto TCP (6), length 52)
    31.3.7.140.14216 > 31.3.7.2.ssh: Flags [S], cksum 0x28bf (correct), seq 3802326149, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 14], length 0

And this is the packet capture on the user side (31.3.7.140):

# tcpdump -ni eth0 host 31.3.7.2 -vvv             
dropped privs to tcpdump
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:45:19.195377 IP (tos 0x0, ttl 64, id 34583, offset 0, flags [DF], proto TCP (6), length 52)
    31.3.7.140.14216 > 31.3.7.2.ssh: Flags [S], cksum 0x4cba (incorrect -> 0x28bf), seq 3802326149, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 14], length 0
17:45:20.202604 IP (tos 0x0, ttl 64, id 34584, offset 0, flags [DF], proto TCP (6), length 52)
    31.3.7.140.14216 > 31.3.7.2.ssh: Flags [S], cksum 0x4cba (incorrect -> 0x28bf), seq 3802326149, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 14], length 0
17:45:22.250716 IP (tos 0x0, ttl 64, id 34585, offset 0, flags [DF], proto TCP (6), length 52)
    31.3.7.140.14216 > 31.3.7.2.ssh: Flags [S], cksum 0x4cba (incorrect -> 0x28bf), seq 3802326149, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 14], length 0
17:45:24.426596 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 31.3.7.2 tell 31.3.7.140, length 28
17:45:24.426684 ARP, Ethernet (len 6), IPv4 (len 4), Reply 31.3.7.2 is-at 0c:42:a1:02:67:d9, length 46

I expected the backend server to respond to the SSH request, but I can't capture any SYN-ACK packet in reply to the SYN. Why?

I would appreciate ideas from anyone who has experience with the Katran load balancer. Thanks.

swettoth0812 commented 4 days ago

It seems your IPIP interface is not working, so the IPIP packet is not processed on the backend server. Did you follow the steps in the guide here? https://github.com/facebookincubator/katran/blob/main/EXAMPLE.md#configuration-of-forwarding-plane Could you share the network configuration of your backend server? And could you please capture with tcpdump on any interface?

tcpdump -ni any proto 4 -vvv or host 31.3.7.2

Bit-Warrior-X commented 4 days ago

Hi @swettoth0812 Thank you for your kind response.

These are the IP interfaces on my backend server:

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 31.3.7.2/32 scope global lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN group default qlen 1000
    link/tunnel6 :: brd :: permaddr be92:fafc:2800::
9: ipip0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 127.0.0.42/32 scope host ipip0
       valid_lft forever preferred_lft forever
10: ipip60@NONE: <NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN group default qlen 1000
    link/tunnel6 :: brd :: permaddr 6e20:e1a4:6f12::
    inet6 fe80::6c20:e1ff:fea4:6f12/64 scope link 
       valid_lft forever preferred_lft forever
280: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:1f:03:07:96 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 31.3.7.150/24 brd 31.3.7.255 scope global eth0
       valid_lft forever preferred_lft forever

It was my fault.

I had not run this command on the backend server:

for sc in $(sysctl -a | awk '/\.rp_filter/ {print $1}'); do echo $sc ; sudo sysctl ${sc}=0; done

After running it on the backend server, everything works fine now. Thanks.
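
A likely reason this helps: after decapsulation, the inner packet (source 31.3.7.140) shows up on ipip0, while the route back to that source goes out through eth0, so strict reverse-path filtering treats the packet as spoofed and drops it. The kernel uses the larger of the "all" and per-interface rp_filter values, so both matter. Below is a minimal read-only sketch for inspecting the current values before changing anything. list_rp_filter is a hypothetical helper name, and its optional directory argument is an assumption added only so the function can be tried against a copy of the tree.

```shell
#!/bin/sh
# Hypothetical helper: print "<interface>=<value>" for every rp_filter file
# under a conf directory. Defaults to the live IPv4 settings in /proc.
list_rp_filter() {
    conf_dir="${1:-/proc/sys/net/ipv4/conf}"
    for f in "$conf_dir"/*/rp_filter; do
        # Skip unreadable entries (and the unexpanded glob if nothing matches).
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        printf '%s=%s\n' "$iface" "$(cat "$f")"
    done
}

# Read-only: shows the current values without modifying them.
list_rp_filter
```

A value of 0 means no source validation, 1 means strict mode, and 2 means loose mode.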

Anyway, I have two more questions.

I will wait for your kind response

Best regards.

swettoth0812 commented 1 day ago

> Why do we use an IPIP tunnel for packet forwarding at all? Is there another solution that doesn't use an IPIP tunnel for packet forwarding? It seems to reduce performance.

The IPIP tunnel allows the reals to be on different subnets. This blog post explains how Katran works: https://fedepaol.github.io/blog/2023/09/06/ebpf-journey-by-examples-l4-load-balancing-with-xdp-and-katran/

> And I am not sure why the VIP must be configured on the backend server.

Configuring the VIP on the backend server prevents the kernel from dropping the packet. The inner IP packet has the client as its source IP and the VIP as its destination IP. If you don't add the VIP to the backend server, the server doesn't know that it is the one that needs to process the packet.
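
To check this on a backend, one option is to look for the VIP among the locally configured addresses. This is a small sketch, assuming the one-line output of ip -o -4 addr show is piped in. has_local_addr is a hypothetical helper, and the sample line only mirrors the lo entry shown earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the address in $1 appears as an IPv4
# address in `ip -o -4 addr show` style text read from stdin.
has_local_addr() {
    # -F matches the string literally, so the dots are not regex wildcards;
    # the trailing "/" keeps 31.3.7.2 from matching 31.3.7.20.
    grep -qF " inet $1/"
}

# Sample line in the one-line format, using the addresses from this thread.
sample='1: lo    inet 31.3.7.2/32 scope global lo\       valid_lft forever preferred_lft forever'

if printf '%s\n' "$sample" | has_local_addr 31.3.7.2; then
    echo "VIP 31.3.7.2 is configured locally"
else
    echo "VIP 31.3.7.2 is missing"
fi
```

On a real backend, the sample can be replaced with the live output: ip -o -4 addr show | has_local_addr 31.3.7.2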