antrea-io / antrea

Kubernetes networking based on Open vSwitch
https://antrea.io
Apache License 2.0

Make proxyAll/LoadBalancerModeDSR work with kube-proxy present #6232

Closed tnqn closed 1 month ago

tnqn commented 4 months ago

Describe what you are trying to solve

For users who want to use LoadBalancerModeDSR, we currently require kube-proxy to be removed, because otherwise it handles the Service traffic in the host network before the traffic is forwarded to OVS. Removing kube-proxy is also a prerequisite for using proxyAll.

However, removing kube-proxy may not be an easy task for users of managed Kubernetes clusters that don't provide the option. Hence, it is hard for them to take advantage of this feature on these platforms.

Previously we didn't have a good reason why users would want to use proxyAll when kube-proxy is present, but with the DSR support, I now wonder whether we should make proxyAll really proxy all Service traffic regardless of the presence of kube-proxy. It's actually not hard to achieve, as it's just a matter of the priority of iptables rules, and it can really benefit users who want to use DSR mode.

Describe the solution you have in mind

When proxyAll is enabled,

  1. Set up iptables/nftables nat rules to ensure kube-proxy's DNAT rules are bypassed.
  2. Set up iptables/nftables filter rules to allow asymmetric traffic of Services using DSR mode.
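The "priority of iptables rules" point can be illustrated with a toy first-match model. This is only a sketch under simplifying assumptions: the types `Packet` and `Rule` and the functions `Evaluate` and `BuildPrerouting` are hypothetical names, not Antrea code, and every target is treated as terminating (a real jump to a user-defined chain can fall through and return).

```go
package main

// Packet is a simplified view of a packet reaching the nat PREROUTING hook.
type Packet struct {
	DstIP string
}

// Rule is a toy iptables rule: a destination matcher plus a target name.
type Rule struct {
	Comment string
	Match   func(Packet) bool
	Target  string
}

// Evaluate returns the target of the first rule that matches the packet.
// Every target is treated as terminating, which is a simplification of
// real iptables chain traversal.
func Evaluate(chain []Rule, p Packet) string {
	for _, r := range chain {
		if r.Match(p) {
			return r.Target
		}
	}
	return "NO-MATCH"
}

// BuildPrerouting returns a toy nat PREROUTING chain with two rules: a
// hypothetical Antrea rule that accepts packets destined for external
// Service IPs (so no kube-proxy DNAT is applied to them), and kube-proxy's
// catch-all jump to KUBE-SERVICES. antreaFirst controls the ordering.
func BuildPrerouting(externalIPs map[string]bool, antreaFirst bool) []Rule {
	antrea := Rule{
		Comment: "hypothetical Antrea bypass rule",
		Match:   func(p Packet) bool { return externalIPs[p.DstIP] },
		Target:  "ACCEPT",
	}
	kubeProxy := Rule{
		Comment: "kube-proxy catch-all jump",
		Match:   func(Packet) bool { return true },
		Target:  "KUBE-SERVICES",
	}
	if antreaFirst {
		return []Rule{antrea, kubeProxy}
	}
	return []Rule{kubeProxy, antrea}
}
```

With the Antrea rule first, a packet to an external Service IP never reaches the KUBE-SERVICES jump; with the order reversed, kube-proxy sees it first. All other traffic still reaches kube-proxy in both orderings.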

Describe how your solution impacts user flows

Users who expect Antrea to process all Service traffic and/or want to use DSR mode can use the features by setting proxyAll to true, without having to remove kube-proxy.

tnqn commented 4 months ago

cc @jianjuns @antoninbas @hongliangl

hongliangl commented 4 months ago

Do we need to bypass all Service traffic or only LoadBalancers?

If bypassing all Service traffic (I assume users need to set proxyAll to true and don't need to set kubeAPIServerOverride), the first Service, kubernetes, which is required by the K8s client, should remain accessible through kube-proxy until the flows for it are installed in AntreaProxy; then an extra rule will be added in the prerouting/input chain of the nat table to bypass kube-proxy rules. Two more things we need to consider:

If bypassing LoadBalancer traffic only, we could add a chain to match LoadBalancer traffic and bypass kube-proxy in prerouting/input chain of nat table. We could leverage some existing code to build the iptables rules and sync them periodically.

I'm looking forward to hearing your thoughts.

wenyingd commented 4 months ago

Just want to confirm that this requirement applies to Linux only, right? I am asking because we can't make kube-proxy in kernel mode work with Antrea on Windows, since it may cause issues in the HNS Network (VMSwitch) because the VMSwitch Extensions are not consistent.

tnqn commented 4 months ago

> Do we need to bypass all Service traffic or only LoadBalancers?

I'm thinking the external access points only, i.e. LBIP, externalIP, NodePort. It's basically the same as before but ensures the effective proxy is antrea, regardless of kube-proxy's presence.

> If bypassing LoadBalancer traffic only, we could add a chain to match LoadBalancer traffic and bypass kube-proxy in prerouting/input chain of nat table. We could leverage some existing code to build the iptables rules and sync them periodically.

Yes, I was thinking of ipsets consisting of LBIPs and externalIPs, like the NodePort ipset. We could perhaps even remove the per-IP routes (but it's perhaps not good to make that change in the same PR).
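As a rough illustration of how such an ipset could be populated, the sketch below flattens Services into entries in the `IP,proto:port` form used by `hash:ip,port` sets. The `ExternalService` type and `IPSetEntries` function are hypothetical, and the choice of an ip,port set (rather than a plain IP set) is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// ExternalService is a hypothetical, simplified view of one Service port
// together with its external access IPs (LoadBalancer IPs and externalIPs).
type ExternalService struct {
	IPs      []string
	Protocol string // "tcp" or "udp"
	Port     int
}

// IPSetEntries returns the deduplicated, sorted list of entries in the
// "IP,proto:port" form accepted by hash:ip,port ipsets.
func IPSetEntries(svcs []ExternalService) []string {
	seen := make(map[string]struct{})
	for _, s := range svcs {
		for _, ip := range s.IPs {
			seen[fmt.Sprintf("%s,%s:%d", ip, s.Protocol, s.Port)] = struct{}{}
		}
	}
	entries := make([]string, 0, len(seen))
	for e := range seen {
		entries = append(entries, e)
	}
	// Sort so that periodic syncs compare against a stable ordering.
	sort.Strings(entries)
	return entries
}
```

Each returned string could then be added to the set with `ipset add <set> <entry>`; an IP shared by several Services appears once per distinct protocol and port.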

> Just want to confirm that this requirement applies to Linux only, right? I am asking because we can't make kube-proxy in kernel mode work with Antrea on Windows, since it may cause issues in the HNS Network (VMSwitch) because the VMSwitch Extensions are not consistent.

Yes, it's for Linux only. The proposal is just making a common scenario work better, not introducing a scenario not required by users.

hongliangl commented 4 months ago

> Yes, I was thinking of ipsets consisting of LBIPs and externalIPs, like the NodePort ipset. We could perhaps even remove the per-IP routes (but it's perhaps not good to make that change in the same PR).

It seems that we cannot remove the per-IP routes. If we remove these routes, we need to DNAT the traffic destined for the LBIPs and externalIPs (like NodePort). Another issue introduced in this case is that we cannot identify this traffic in the ServiceLB table, since it all has the same destination IP.

tnqn commented 4 months ago

> It seems that we cannot remove the per-IP routes. If we remove these routes, we need to DNAT the traffic destined for the LBIPs and externalIPs (like NodePort). Another issue introduced in this case is that we cannot identify this traffic in the ServiceLB table, since it all has the same destination IP.

I should have given more detail: after matching the IPs in iptables, the action should mark the packets; then we only need a single route to forward them to OVS, so we don't need per-IP routes.
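One possible reading of this idea is a firewall mark combined with a policy-routing rule. The sketch below only assembles the command lines; it does not run them. The `PolicyRouteConfig` type, the function names, the use of the mangle table, and all values (set name, mark, table ID, next hop) are assumptions for illustration, not something the thread specifies.

```go
package main

import (
	"fmt"
	"strings"
)

// PolicyRouteConfig holds hypothetical parameters for the mark-and-route idea.
type PolicyRouteConfig struct {
	IPSet   string // ipset holding LoadBalancer IPs and externalIPs (assumed name)
	Mark    uint32 // firewall mark set on matching packets (assumed value)
	Table   int    // dedicated routing table ID (assumed value)
	Gateway string // next hop used to steer packets to OVS (assumed value)
	Device  string // gateway interface attached to OVS
}

// PolicyRouteCommands returns three commands: mark packets whose destination
// is in the ipset, look up marked packets in a dedicated routing table, and
// install a single default route in that table.
func PolicyRouteCommands(c PolicyRouteConfig) [][]string {
	mark := fmt.Sprintf("%#x", c.Mark)
	table := fmt.Sprint(c.Table)
	return [][]string{
		// Mark in mangle PREROUTING, which runs before the routing decision.
		{"iptables", "-t", "mangle", "-A", "PREROUTING",
			"-m", "set", "--match-set", c.IPSet, "dst",
			"-j", "MARK", "--set-mark", mark},
		// Policy rule: marked packets consult the dedicated table.
		{"ip", "rule", "add", "fwmark", mark, "table", table},
		// One route for all marked traffic instead of one route per IP.
		{"ip", "route", "replace", "default", "table", table,
			"via", c.Gateway, "dev", c.Device, "onlink"},
	}
}

// Render joins a command's arguments into a single printable line.
func Render(cmd []string) string {
	return strings.Join(cmd, " ")
}
```

The single default route in the dedicated table replaces the per-IP routes, since the routing decision is keyed on the mark rather than on each destination address.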

hongliangl commented 4 months ago

> > It seems that we cannot remove the per-IP routes. If we remove these routes, we need to DNAT the traffic destined for the LBIPs and externalIPs (like NodePort). Another issue introduced in this case is that we cannot identify this traffic in the ServiceLB table, since it all has the same destination IP.

> I should have given more detail: after matching the IPs in iptables, the action should mark the packets; then we only need a single route to forward them to OVS, so we don't need per-IP routes.

Do you mean that we can use a policy route to forward the marked traffic to the OVS pipeline?

hongliangl commented 4 months ago

How about this?

This is the chain I'm testing right now; it's not finished yet:

-A ANTREA-PREROUTING -m comment --comment "Antrea: bypass Service kubernetes" -d 10.96.0.1 -p tcp --dport 443 -j KUBE-SERVICES
-A ANTREA-PREROUTING -m comment --comment "Antrea: DNAT external to NodePort packets" -m set --match-set ANTREA-NODEPORT-IP dst,dst -j DNAT --to-destination 169.254.0.252
-A ANTREA-PREROUTING -m comment --comment "Antrea: accept Service traffic sourced from external network" -m set --match-set ANTREA-EXT-SERVICE-IP-PORT dst,dst -j ACCEPT