
IPVS not masquerading despite lb_kind nat #2243

Closed baccenfutter closed 1 year ago

baccenfutter commented 1 year ago

Bug description

(tbh, I am not even sure whether this is a keepalived bug or rather a Linux/IPVS bug)

The documentation in several sources (here is one) clearly states that the NAT forwarding technique is a combination of SNAT and DNAT, where packets are forwarded to the backends via address translation in such a way that the return packets find their way back through the keepalived host (masquerading, a.k.a. SNAT).

The ipvsadm tool also clearly lists the backends as forward type ("Masq").

# ipvsadm -l -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  <public-ip>:80 wlc persistent 50
  -> <private-ip-of-ct>:80               Masq    1      0          0         

However, tcpdump on my backends unmistakably shows that the packets are NOT masqueraded and in fact still carry the original source IP. Since my backends are LXC containers on remote hosts, the return packets take the default route through lxdbr0 and are black-holed.

14:25:34.862891 IP <laptop-ip>.23474 > <private-ct-ip>.80: Flags [SEW], seq 4140399563, win 42340, options [mss 1360,nop,nop,sackOK,nop,wscale 9], length 0
14:25:35.904553 IP <laptop-ip>.23474 > <private-ct-ip>.80: Flags [S], seq 4140399563, win 42340, options [mss 1360,nop,nop,sackOK,nop,wscale 9], length 0

To Reproduce

Expected behavior

The traffic from the virtual_server to the real_server should be masqueraded, and hence all return packets should route back through the keepalived host and not via some random default route.

Keepalived version

Keepalived v2.0.19 (10/19,2019)

Copyright(C) 2001-2019 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 5.4.166
Running on Linux 5.4.0-137-generic #154-Ubuntu SMP Thu Jan 5 17:03:22 UTC 2023

configure options: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --with-kernel-dir=debian/ --enable-snmp --enable-sha1 --enable-snmp-rfcv2 --enable-snmp-rfcv3 --enable-dbus --enable-json --enable-bfd --enable-regex build_alias=x86_64-linux-gnu CFLAGS=-g -O2 -fdebug-prefix-map=/build/keepalived-QKJBdn/keepalived-2.0.19=. -fstack-protector-strong -Wformat -Werror=format-security LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2

Config options:  NFTABLES LVS REGEX VRRP VRRP_AUTH JSON BFD OLD_CHKSUM_COMPAT FIB_ROUTING SNMP_V3_FOR_V2 SNMP_VRRP SNMP_CHECKER SNMP_RFCV2 SNMP_RFCV3 DBUS

System options:  PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 IPV4_DEVCONF IPV6_ADVANCED_API LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_OIFNAME FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS IP_MULTICAST_ALL LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS IPVS_TUN_TYPE IPVS_TUN_CSUM IPVS_TUN_GRE VRRP_VMAC VRRP_IPVLAN IFLA_LINK_NETNSID CN_PROC SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE INET6_ADDR_GEN_MODE VRF SO_MARK SCHED_RT SCHED_RESET_ON_FORK

Distro

Details of any containerisation or hosted service

Configuration file:

global_defs {
   notification_email {
      ...
   }
   smtp_server 127.0.0.1
   router_id <unique-name>
}

virtual_server <pub-ip> 80 {
  delay_loop 10
  lb_algo wlc
  lb_kind NAT
  persistence_timeout 50
  protocol TCP
  alpha

  real_server <priv-ip-of-remote-ct> 80 {
    weight 1
    TCP_CHECK {
      connect_timeout 3
      connect_port 80
    }
  }
}

System Log entries

Feb 08 14:22:07 lxc00 Keepalived_healthcheckers[384355]: Initializing ipvs
Feb 08 14:22:07 lxc00 Keepalived_vrrp[384356]: Opening file '/etc/keepalived/keepalived.conf'.
Feb 08 14:22:07 lxc00 Keepalived_healthcheckers[384355]: Activating healthchecker for service [<private-ct-ip>]:tcp:80 for VS [<public-ip>]:tcp:80
Feb 08 14:22:07 lxc00 Keepalived_healthcheckers[384355]: Activating BFD healthchecker
Feb 08 14:22:16 lxc00 Keepalived_healthcheckers[384355]: TCP connection to [<private-ct-ip>]:tcp:80 success.
Feb 08 14:22:16 lxc00 Keepalived_healthcheckers[384355]: Adding service [<private-ct-ip>]:tcp:80 to VS [<public-ip>]:tcp:80
Feb 08 14:22:16 lxc00 Keepalived_healthcheckers[384355]: Gained quorum 1+0=1 <= 1 for VS [<public-ip>]:tcp:80

Did keepalived coredump?

no

Am I completely missing something? I couldn't find any hints in man keepalived.conf, nor on big G.

pqarmitage commented 1 year ago

The limit of what keepalived does with respect to IPVS is that it adds and removes IPVS configuration, just as ipvsadm does. What IPVS does, how it works, and how to configure it are outside the scope of keepalived. Generally, if someone is having difficulty getting IPVS to work, I advise configuring it with ipvsadm without using keepalived, getting that working, and then updating the keepalived configuration so that it sets up the same IPVS configuration that you got working with ipvsadm. However, I can provide some pointers for you to explore.
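
For reference, a roughly equivalent hand-built configuration with ipvsadm (just a sketch, using the placeholder addresses from your report) would be:

ipvsadm -A -t <pub-ip>:80 -s wlc -p 50
ipvsadm -a -t <pub-ip>:80 -r <priv-ip-of-remote-ct>:80 -m -w 1

If that standalone setup shows the same behaviour, the question is about IPVS itself rather than keepalived.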

You state:

The documentation in several sources (here is one) clearly states that the NAT forwarding technique is a combination of SNAT and DNAT, where packets are forwarded to the backends via address translation in such a way that the return packets find their way back through the keepalived host (masquerading, a.k.a. SNAT).

I cannot see any statement there that IPVS masquerading does SNAT, nor even any suggestion that it might. My understanding is that masquerading does NOT do SNAT, and that therefore the default route from the real servers has to be via the keepalived host (or, more specifically, via the host on which the IPVS configuration is set up) [there are ways around using the default route, which I will explain below]. The best place for descriptions of how IPVS works (albeit extremely old) is the Linux Virtual Server web site; here the examples clearly show that no SNAT is performed. The best starting point is the IPVS documentation.
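
Concretely, that just means each real server's default route points at the keepalived host; assuming (as in the example further down) its private address is 192.168.0.1, that would simply be:

ip route replace default via 192.168.0.1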

There are ways of applying SNAT to the packets, but it is a bit convoluted - see below.

How to avoid the default route on real servers

If the IP addresses of the real servers are only used for those services (in other words, any packet leaving a real server host with a source address that is configured in a keepalived real_server entry belongs to one of those services), then it is possible to use source-based routing to return packets sent from the real server addresses via the keepalived server(s). It would require a keepalived virtual router to be configured on the private side of the keepalived host, so that the real service packets are returned via the VIP; a minimal sketch of such a VRRP instance follows the routing example below.

Suppose a real service is on address 10.0.0.1 TCP port 80, and the private address of your keepalived host is 192.168.0.1. You could specify the following:

ip rule add from 10.0.0.1 ipproto TCP sport 80 table 30000
ip route add default via 192.168.0.1 table 30000
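
The keepalived virtual router on the private side mentioned above could look something like this minimal sketch (the interface name, router id and priority are assumptions; 192.168.0.1 is the private VIP the real servers route back through):

vrrp_instance private_side {
  state BACKUP
  interface eth1
  virtual_router_id 51
  priority 100
  virtual_ipaddress {
    192.168.0.1
  }
}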

How to apply SNAT

In order to apply SNAT as well, it is necessary to forward the packets to the real server via a separate network namespace. The SNAT can then be done either in the added network namespace or in the default network namespace. I can't remember whether the returned packets need to be routed via the added network namespace or not (they certainly do if the SNAT is done in the added network namespace). The reason for forwarding incoming packets via the added network namespace is that, when they are forwarded from the added network namespace back to the default network namespace, the packets pass through the protocol stack as if they were new packets, and therefore traverse the relevant nftables/iptables tables where the SNAT can be applied. If the packets do not go via the added network namespace, then IPVS makes the packets skip the NAT tables in nftables/iptables.
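
A very rough skeleton of the namespace plumbing might look like the following (purely illustrative; names and addresses are assumptions, and the routing/IPVS wiring that ties the two namespaces together is deliberately omitted):

# extra namespace plus a veth pair linking it to the default namespace
ip netns add ipvs-snat
ip link add veth-host type veth peer name veth-ns
ip link set veth-ns netns ipvs-snat
ip addr add 10.255.0.1/30 dev veth-host
ip link set veth-host up
ip netns exec ipvs-snat ip addr add 10.255.0.2/30 dev veth-ns
ip netns exec ipvs-snat ip link set veth-ns up

# SNAT applied inside the added namespace with nftables
ip netns exec ipvs-snat nft add table ip nat
ip netns exec ipvs-snat nft 'add chain ip nat postrouting { type nat hook postrouting priority 100 ; }'
ip netns exec ipvs-snat nft add rule ip nat postrouting ip daddr <priv-ip-of-remote-ct> tcp dport 80 snat to 10.255.0.2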

If you want to use the above approach for applying SNAT, then please update this issue, and I will look at providing some more detailed configuration examples (there was an issue reported a few years ago where I provided the details of how to do this).

baccenfutter commented 1 year ago

It is in fact called masquerading:

# ipvsadm -l -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  <virtual_server>:80 wlc persistent 50
  -> <real_server>:80               Masq    1      0          0         
  -> <real_server>:80               Masq    1      0          0         

The stated documentation also clearly speaks about SNAT if you read the first paragraph all the way through:

The replies are also translated in the reverse direction, when the real servers reply to the users’ requests.

It shouldn't be called masquerading if it is in fact not masquerading. Masquerading is a well-defined term and it means SNAT. See:

It should instead be called forwarding or port-forwarding, or at least simply NAT. But masquerading explicitly refers to SNAT.
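
For comparison, this is what masquerading conventionally means in netfilter terms, i.e. dynamic SNAT to the address of the outgoing interface (the interface name here is only an example):

iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE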

pqarmitage commented 1 year ago

I interpret

The replies are also translated in the reverse direction, when the real servers reply to the users’ requests.

to mean that the source address of the reply is translated (i.e. the reverse direction action for DNAT).

I agree calling it NAT would be more descriptive of what happens, but it has been called masquerading in IPVS for nearly 20 years now.