@thardie thanks for reporting the issue.
Dealing with martian packets has been the single biggest challenge in DSR functionality in kube-router. There are policy-based routing rules that kube-router adds to avoid martian packets. Likely they are missing, or kube-router failed to configure them in your setup.
If you still have the setup, or are able to reproduce this scenario, would you mind sharing the details below?
ip rule list
ip route list table 77
ip route list table 78
In your case, to avoid `[81635.233492] IPv4: martian source 10.104.5.122 from 10.116.4.1, on dev kube-bridge`,
I would expect a route in table 78, created by kube-router, that tricks the kernel into believing 10.104.5.122
is reachable on `kube-bridge`.
I added the following 2 lines to each worker and master's /etc/sysctl.conf:
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.all.rp_filter=0
and rebooted them all. I have been unable to reproduce the martians since then. I'm reverting that change to see if I can reproduce the martian issue now.
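For reference, these are standard sysctls and can also be applied at runtime without a reboot:
sysctl -w net.ipv4.conf.default.rp_filter=0
sysctl -w net.ipv4.conf.all.rp_filter=0
# the kernel uses the max of the 'all' and per-interface rp_filter values,
# so check the per-interface settings as well:
sysctl -a 2>/dev/null | grep '\.rp_filter'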
@murali-reddy I just re-read your comment - the address 10.104.5.122 is the outside client IP (where the SYN came from, and where the SYN-ACK is going back to). My k8s addresses are all in 10.116.0.0/16, so I wouldn't expect to see client (outside) addresses in table 78, would I?
I'll continue trying to reproduce, and will get the ip rules and table output once it's reproduced again.
@thardie sorry, it should be 10.116.4.1 in routing tables 77 and 78.
I've been able to reproduce this issue. I checked table 78 and 77. Table 77 is empty, and 78 has:
local default dev lo scope host
I tried adding a route to table 77 (which looks like the table meant to handle reply traffic coming out of the containers), but it doesn't seem to help:
10.116.4.1 dev kube-bridge scope link
Adding it to table 78 seems wrong, since that table is for traffic coming in, and it would mess up the IP-in-IP encapsulation, right? In fact, I start to see ARPs for 10.116.4.1 on kube-bridge if I add the same route to table 78.
@thardie sorry, I might have given the wrong table numbers earlier. You should see the tables below (name, id):
customDSRRouteTableID = "78"
customDSRRouteTableName = "kube-router-dsr"
externalIPRouteTableId = "79"
externalIPRouteTableName = "external_ip"
The following combination of iptables mangle rules and policy-based routing achieves DSR.
For incoming traffic towards an external IP used by a service marked for DSR, the following rules apply:
iptables -t mangle -A PREROUTING -d externalIP -p protocol --dport port -j MARK --set-mark generated-fwmark
ip rule add prio 32764 fwmark generated-fwmark table customDSRRouteTableID
ip route add local default dev lo table customDSRRouteTableID
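To make that concrete, here is a sketch with purely illustrative values (external IP 203.0.113.10, TCP port 80, fwmark 0x10000; none of these values are from this issue):
# mark inbound traffic for the DSR service's external IP
iptables -t mangle -A PREROUTING -d 203.0.113.10 -p tcp --dport 80 -j MARK --set-mark 0x10000
# send marked packets through the kube-router-dsr table (id 78)
ip rule add prio 32764 fwmark 0x10000 table 78
# deliver them locally so IPVS can pick them up and tunnel them to the pod
ip route add local default dev lo table 78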
On the return path of the packet from the pods, the rules below are applicable. The second rule in particular avoids the martian packets.
ip rule add prio 32765 from all lookup externalIPRouteTableId
ip route add externalIP dev kube-bridge table externalIPRouteTableId
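Continuing the same illustrative example with the external_ip table (id 79):
ip rule add prio 32765 from all lookup 79
# make the kernel treat the external IP as reachable via kube-bridge, so pod replies
# sourced from it pass the reverse-path check instead of being dropped as martians
ip route add 203.0.113.10 dev kube-bridge table 79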
Please compare this description with your setup and see if anything is missing.
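A quick way to compare is to dump the live state with standard iproute2/iptables commands:
ip rule list
ip route show table 78
ip route show table 79
iptables -t mangle -S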
Hi @thardie, did you use a LoadBalancer to publish the service? I have the same issue when I use DSR mode with MetalLB in layer 2 mode.
LoadBalancer is not supported in the code; I added it and it tests OK. https://github.com/cloudnativelabs/kube-router/blob/4afd6d6d2ab9c94abc5985c30c56ca2605a70a3f/pkg/controllers/proxy/network_services_controller.go#L2198
Can we support LoadBalancer? Is there any risk? @murali-reddy
Closing as stale
JFYI... we had similar problems with a manual setup with ipvs in DSR mode and just could not get it to work without the kernel throwing away our packets as "martian source". So we just added a simple XDP/bpf program (in our case on a WireGuard interface) that prepends an Ethernet header ourselves and uses xdp_redirect to eth0
to get the damn packets flowing out to the network... worked without issues.
I have a test cluster setup: 1 master and 3 workers. kube-router is running on all 4 nodes. I'm running 1 external IP for nginx (3 instances), with BGP amongst all the kube-routers and BGP up to an upstream router. So, inbound packet flow is:
router -> IPVS on 1 of the 4 nodes -> IPIP tunnel to 1 of the 3 nginx instances -> nginx
Inbound always works fine.
Outbound: nginx instance -> host -> router
Sometimes, and I don't know what triggers this, the host starts to drop the replies. I enabled martian logging, and it's hitting the martian case. I tried disabling rp_filter for all interfaces on the host (including all and default) and there are still martians.
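The martian logging referred to here is the standard kernel sysctl; enabling it looks something like this (exact invocation assumed):
sysctl -w net.ipv4.conf.all.log_martians=1
# per-interface log_martians values apply as well
sysctl -w net.ipv4.conf.default.log_martians=1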
IPVS table:
mangle table:
tcpdump showing issue:
dmesg showing martians: