facebookincubator / katran

A high performance layer 4 load balancer
GNU General Public License v2.0

build_grpc error and vip configuration #110

Closed WagleTanvi closed 3 years ago

WagleTanvi commented 3 years ago

Hi, I am trying to install/build katran with Ubuntu 18.04 on a physical server using the latest commit of katran.

Katran builds successfully with build_katran.sh. However, when I run ./build_grpc_client.sh, I get this error:

+ get_goclient_deps
+ pushd .
~/katran/example_grpc ~/katran/example_grpc
+ cd goclient/src/katranc/main
+ go get
# katranc/katranc
../katranc/katranc.go:179:2: undefined: ok
../katranc/katranc.go:179:6: undefined: err
../katranc/katranc.go:180:13: undefined: err
../katranc/katranc.go:181:5: undefined: ok
../katranc/katranc.go:335:13: kc.GetVipFlags undefined (type *KatranClient has no field or method GetVipFlags)
../katranc/katranc.go:340:36: cannot use real.Flags (type int32) as type uint32 in argument to parseRealFlags

After some trial and error, I found that when I check out an earlier commit such as 92313218fe81aa5cc112a87a7a9493200a66d8ee, the build succeeds.

With this build, though, I am having trouble getting katran to respond to the VIP. I set everything up according to the instructions in example.md. As an initial setup, I have two physical servers in my topology, each with one active link on the same subnet. One server runs katran and the second one runs an Apache web server (the REAL server).

Configuring Katran with VIP and Real

cd ~/katran
./katran_goclient -A -t 10.200.200.1:80
./katran_goclient -a -t 10.200.200.1:80 -r IP_OF_REAL_WEB_SERVER
# From Katran Server. Curl to the Real Server works. 
curl IP_OF_REAL_WEB_SERVER 

On the Katran server (I also tried from another server), when I curl the VIP I set up, it does not work and produces no output. It is as if Katran is not responding to the VIP.

curl 10.200.200.1 

Please advise. Thanks!

udippant commented 3 years ago

Thanks for reporting this issue. This issue was introduced in the commit https://github.com/facebookincubator/katran/commit/f6e5cbceda8d4cdd87dcc2f5c4e420f107276172

I have a candidate fix going through review internally (for that missing function GetVipFlags in goclient/src/katranc/katranc/katranc.go and undeclared variable). Should be addressed by tomorrow.

udippant commented 3 years ago

For the second issue, what does the output of ./katran_goclient -l show? (This shows the list of configured services.)

Also, do you have decapsulation support on IP_OF_REAL_WEB_SERVER? If you run tcpdump on that host, do you see any incoming packets for the VIP? (Also, try tcpdump with the additional filter proto 4 to see IPIP-encapsulated packets.)

WagleTanvi commented 3 years ago

Thank you @udippant.

Here is the output of ./katran_goclient -l (I masked part of the IP for security):

username@node-1:~/katran$ ./katran_goclient -l
2021/01/04 19:58:32 vips len 1
VIP:         10.200.200.1 Port:     80 Protocol: tcp
Vip's flags: 
 ->128.105.xxx.xxx  weight: 1
exiting

On the real server (where Apache is running), there is no output from tcpdump.

username@node-2:~$ sudo tcpdump -ni enp1s0f0 proto 4 or host 10.200.200.1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp1s0f0, link-type EN10MB (Ethernet), capture size 262144 bytes

Also, on node-2 (the REAL), I executed the following, as mentioned in example.md.

sudo ip link add name ipip0 type ipip external
sudo ip link add name ipip60 type ip6tnl external
sudo ip link set up dev ipip0
sudo ip link set up dev ipip60
sudo ip a a 127.0.0.42/32 dev ipip0

for sc in $(sysctl -a | awk '/\.rp_filter/ {print $1}'); do  echo $sc ; sudo sysctl ${sc}=0; done

sudo ip a a 10.200.200.1/32 dev lo
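One way to confirm that the rp_filter loop above took effect is to list any interface whose value is still nonzero. This is a minimal sketch, not part of katran: the helper name nonzero_rp_filter is hypothetical, and it assumes the standard Linux layout under /proc/sys/net/ipv4/conf (an alternative directory can be passed as the first argument).

```shell
# Hypothetical helper (not part of katran): print every interface whose
# rp_filter setting is not 0. Reads /proc/sys/net/ipv4/conf by default;
# a different directory can be passed as the first argument.
nonzero_rp_filter() {
  conf_dir="${1:-/proc/sys/net/ipv4/conf}"
  for f in "$conf_dir"/*/rp_filter; do
    [ -r "$f" ] || continue                  # skip unreadable or missing entries
    val=$(cat "$f")
    if [ "$val" != "0" ]; then
      iface=$(basename "$(dirname "$f")")    # directory name is the interface
      echo "$iface rp_filter=$val"
    fi
  done
}

# Usage on the real server: no output means every interface is set to 0.
#   nonzero_rp_filter
```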

Finally, just to make sure katran is running properly, I ran os_run_tester.sh, which showed that all tests passed, except for the following messages at the end:

…
I0104 20:04:09.550020 33857 BpfTester.cpp:220] Test: QUIC: short header w/ conn id. host id = 0. CH. LRU hit      result: Passed
I0104 20:04:09.550057 33857 BpfTester.cpp:220] Test: UDP: big packet of length 1515. trigger PACKET TOOBIG        result: Passed
I0104 20:04:09.550065 33857 BpfTester.cpp:220] Test: QUIC: short header w/ connection id. CIDv2                   result: Passed
I0104 20:04:09.550073 33857 BpfTester.cpp:220] Test: QUIC: short header w/ connection id but non-existing mapping. CIDv2 result: Passed
I0104 20:04:09.550079 33857 katran_tester.cpp:270] Testing counter's sanity. Printing on errors only
I0104 20:04:09.550211 33857 katran_tester.cpp:338] Testing of counters is complete
E0104 20:04:09.550237 33857 KatranSimulator.cpp:168] src and dst must have same address family
E0104 20:04:09.550243 33857 KatranSimulator.cpp:161] malformed src or dst ip address. src: aaaa dst: bbbb
E0104 20:04:09.550249 33857 BpfLoader.cpp:97] Can't find prog with name: cls-hc
I0104 20:04:09.550256 33857 katran_tester.cpp:192] Healthchecking not enabled. Skipping HC related tests

udippant commented 3 years ago

To see whether Katran received the packets, can you check some stats (such as goclient -s, -sum, -lru, etc.)? xdpdump is another quite useful tool for this.

Also, can you check the MAC address?

WagleTanvi commented 3 years ago

No output from goclient -s -sum -lru (0 packets). xdpdump was not compiled by build_katran.sh. Should it have been, or does it need to be built separately?

I am using the MAC address of the default gateway for the machine running katran (I double-checked).

On another note, when I start katran, I see this netlink message. Is this normal, or is katran having a problem receiving traffic?

...
libbpf: elf: skipping relo section(26) .rel.eh_frame for section(25) .eh_frame
libbpf: elf: skipping unrecognized data section(16) .eh_frame
libbpf: elf: skipping relo section(17) .rel.eh_frame for section(16) .eh_frame
E0107 13:37:49.527056 22842 BpfAdapter.cpp:219] Error receiving netlink message: File exists [17]
Server listening on 0.0.0.0:50051

udippant commented 3 years ago

It looks like katran didn't even receive the packet. Do you see XDP drops (e.g. with ethtool -S eth0 | grep xdp_drops) while running the curl command? Regarding the xdpdump tool: yes, the build script doesn't build it by default. You'll need to build this target (https://github.com/facebookincubator/katran/blob/master/tools/xdpdump/CMakeLists.txt#L38). As a quick workaround, I was able to build it locally by simply adding add_subdirectory(tools) here. I'll add an integration to build this tool from the build-katran script separately.

That netlink message likely comes from adding cls-act on the network interface, so it shouldn't affect anything.

WagleTanvi commented 3 years ago

Thanks. I ran ethtool while executing curl 10.200.200.1. However, there seems to be no correlation between the xdp_drop increase and running curl. The xdp_drop counters just increase slowly, even when curl is not running.

username@node-0:~/katran$ ethtool -S eno49 | grep xdp_drop
     rx_xdp_drop: 9
     rx0_xdp_drop: 0
     rx1_xdp_drop: 0
     rx2_xdp_drop: 9
     rx3_xdp_drop: 0
     rx4_xdp_drop: 0
     rx5_xdp_drop: 0
     rx6_xdp_drop: 0
     rx7_xdp_drop: 0
     rx8_xdp_drop: 0
     rx9_xdp_drop: 0
     rx10_xdp_drop: 0
     rx11_xdp_drop: 0
     rx12_xdp_drop: 0
     rx13_xdp_drop: 0
     rx14_xdp_drop: 0
     rx15_xdp_drop: 0
     rx16_xdp_drop: 0
     rx17_xdp_drop: 0
     rx18_xdp_drop: 0
     rx19_xdp_drop: 0
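To make the "does the counter move when curl runs" comparison less error-prone, two saved snapshots of the ethtool output can be diffed. The following is a minimal sketch, not part of katran: the helper name xdp_drop_delta and the snapshot file paths are hypothetical, and it assumes the counter lines look like the ones shown above.

```shell
# Hypothetical helper (not part of katran): print the change in every
# *xdp_drop counter between two saved `ethtool -S <iface>` snapshots.
# Counters that did not change are omitted.
xdp_drop_delta() {
  awk -F': *' '
    /xdp_drop/ {
      gsub(/^[ \t]+/, "", $1)                  # strip leading indentation
      if (NR == FNR) { base[$1] = $2; next }   # first file: remember baseline
      delta = $2 - base[$1]                    # second file: compare
      if (delta != 0) printf "%s %+d\n", $1, delta
    }
  ' "$1" "$2"
}

# Usage (interface name and VIP taken from this thread):
#   ethtool -S eno49 > /tmp/before.txt
#   curl -m 5 10.200.200.1
#   ethtool -S eno49 > /tmp/after.txt
#   xdp_drop_delta /tmp/before.txt /tmp/after.txt
```

Running it once with curl in between and once with only a pause of similar length gives a rough baseline for the background drop rate.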

ip link output is as below:

@node-0:~/katran$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 98:f2:b3:c4:6b:60 brd ff:ff:ff:ff:ff:ff
    prog/xdp id 11
3: eno50: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 98:f2:b3:c4:6b:61 brd ff:ff:ff:ff:ff:ff
4: ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 9c:dc:71:5d:d5:b0 brd ff:ff:ff:ff:ff:ff
5: ens1f1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 9c:dc:71:5d:d5:b1 brd ff:ff:ff:ff:ff:ff
6: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
7: ipip0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
8: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/tunnel6 :: brd ::
9: ipip60@NONE: <NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/tunnel6 :: brd ::

Full katran command

root     30694 30693  0 16:17 pts/0    00:00:00 ./build/example_grpc/katran_server_grpc -hc_forwarding=false -balancer_prog ./deps/bpfprog/bpf/balancer_kern.o -default_mac 44:31:92:b8:57:40 -healthchecker_prog ./deps/bpfprog/bpf/healthchecking_ipip.o -intf=eno49 -ipip_intf=ipip0 -ipip6_intf=ipip60 -lru_size=10000 -map_path /sys/fs/bpf/jmp_eno49 -prog_pos=2

Mac address of the default router:

@node-0:~/katran$ ip n show
128.110.nn.nn dev eno49 lladdr 44:31:92:b8:57:40 REACHABLE

I will try to get xdpdump running shortly, but I wanted to give you the above info in case you can spot any obvious issues.

WagleTanvi commented 3 years ago

I got xdpdump working. There was a linking issue, so I had to remove https://github.com/facebookincubator/katran/blob/master/tools/xdpdump/CMakeLists.txt#L48. (Not sure if it was needed.)

It seems katran is not advertising the VIP (10.200.200.1), so curling the VIP from the katran host or from other servers (nn1/nn2) doesn't produce any traffic.

I tried curling the base IP (128.110.nn.nn) of the host where katran is running, and I see the traffic in xdpdump. But curling 10.200.200.1 doesn't produce any result in xdpdump.

@node-0:~$ sudo ./katran/_build/build/tools/xdpdump/xdpdump -map_path /sys/fs/bpf/jmp_eno49 -dport 80
src: 128.110.nn1.nn1 dst: 128.110.nn.nn
proto: 6 sport: 53026 dport: 80 pkt size: 74 chunk size: 74
src: 128.110.nn2.nn2 dst: 128.110.nn.nn
proto: 6 sport: 45876 dport: 80 pkt size: 74 chunk size: 74

Is there any way to check if katran is advertising the VIP?

udippant commented 3 years ago

Katran itself does not advertise the VIP. That part is not open-sourced, and it depends a lot on the environment Katran is running in. In a typical setup:

WagleTanvi commented 3 years ago

Ok sure. For now, I just defined a static route from my client to the katran VIP and everything seems to be working. Thanks.
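A static route of this kind can be sketched as follows. The thread does not give the exact command that was used, so this is an assumption-laden illustration: the helper name vip_route_cmd is hypothetical, 192.0.2.10 stands in for the Katran host's address, eth0 stands in for the client's interface, and it assumes the Katran host is directly reachable from the client. The helper only prints the command so it can be reviewed before being applied.

```shell
# Hypothetical helper: build the `ip route` command that sends traffic for
# a VIP via the Katran host. It only prints the command (dry run).
vip_route_cmd() {
  vip="$1"; katran_ip="$2"; dev="$3"
  echo "ip route add ${vip}/32 via ${katran_ip} dev ${dev}"
}

# Placeholder values: 192.0.2.10 and eth0 are assumptions, not from the thread.
vip_route_cmd 10.200.200.1 192.0.2.10 eth0
# To apply it on the client, run the printed command with sudo.
```

Printing first rather than executing keeps the sketch safe to run on any machine; the route itself only makes sense on a client that can reach the Katran host as a next hop.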

bienkma commented 2 years ago

./katran/_build/build/tools/xdpdump/xdpdump

Hi @WagleTanvi, how do I build xdpdump? Could you please explain here? I need the tool to debug on my server. Thank you!