antrea-io / antrea

Kubernetes networking based on Open vSwitch
https://antrea.io
Apache License 2.0

EgressIP support multi-nic environment #2685

Closed lionstack closed 2 years ago

lionstack commented 3 years ago

Describe the problem/challenge you have

In my environment, each Node has multiple NICs, for example an internal network (192.168.12.0/24) and a public network (172.16.10.0/24). The default route goes through the public NIC, and the internal network cannot communicate with the public network.

When I create an egressIP (192.168.12.155) and it is allocated to node2, the following iptables rule is created on node2:

-A ANTREA-POSTROUTING ! -o antrea-gw0 -m comment --comment "Antrea: SNAT Pod to external packets" -m mark --mark 0x1/0xff -j SNAT --to-source 192.168.12.155

When Pods matched by the egressIP access the Internet, their packets are sent out from node2's gw0 port, the iptables rule translates the source IP to 192.168.12.155, and the packets leave through the public NIC. But the public NIC only allows Internet access from 172.16.10.0/24, so the packets are dropped and the Pods cannot access the Internet :-(
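The mismatch described above can be checked with a few lines of Python. This is only an illustrative sketch: the two networks and the egress IP come from the report, while the function name is hypothetical and the check merely models "the public NIC only accepts sources from the public network".

```python
import ipaddress

# Networks taken from the report above.
INTERNAL_NET = ipaddress.ip_network("192.168.12.0/24")
PUBLIC_NET = ipaddress.ip_network("172.16.10.0/24")


def source_allowed_on_public_nic(snat_source: str) -> bool:
    """Return True if the SNAT source address belongs to the public network.

    Hypothetical helper: it models the assumption that traffic leaving the
    public NIC is only forwarded when its source is in PUBLIC_NET.
    """
    return ipaddress.ip_address(snat_source) in PUBLIC_NET


if __name__ == "__main__":
    # The egress IP from the report sits in the internal network.
    print(source_allowed_on_public_nic("192.168.12.155"))
    # An address from the public network, for comparison.
    print(source_allowed_on_public_nic("172.16.10.155"))
```

The first call models the reported failure (the SNAT source is an internal address leaving through the public NIC); the second models the case that would work.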

Describe the solution you'd like

There may be several ways to solve this problem:

  1. Add a parameter named "forceUseMasq" to egressIP. When it is set to true, the related iptables rule changes to:

     -A ANTREA-POSTROUTING ! -o antrea-gw0 -m comment --comment "Antrea: SNAT Pod to external packets" -m mark --mark 0x1/0xff -j MASQUERADE

     Then, when a Pod matched by the egressIP accesses the Internet, the packets' source IP is translated to the public NIC's IP, so the Pod can access the Internet normally.

  2. Modify the implementation of egressIP so that it no longer creates a tunnel between the nodeIP and the egressIP, and uses the Nodes' tunnels directly. The egressIP can then be a public IP, such as 172.16.10.155. Add some OpenFlow flows that force the packets to be sent to the Node that the egressIP belongs to. When Pods matched by the egressIP access the Internet, the packets are sent out from node2's gw0 port, and the iptables rule translates the source IP to 172.16.10.155, so the Pods can access the Internet normally.

  3. Add a new object, egressNode, similar to egressIP. When an egressNode is created, the following iptables rule is created:

     -A ANTREA-POSTROUTING ! -o antrea-gw0 -m comment --comment "Antrea: SNAT Pod to external packets" -m mark --mark 0x1/0xff -j MASQUERADE

     When a Pod matched by the egressNode accesses the Internet, the packets are sent to that Node, which then uses the iptables rule created by the egressNode to reach the Internet normally.
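To make the difference between the current rule and the proposed one concrete, here is a minimal sketch that renders both rule texts. It is not Antrea code: the function and its `masquerade` parameter are hypothetical stand-ins for the proposed "forceUseMasq" setting, and only the rule text itself is taken from the issue.

```python
def postrouting_rule(egress_ip: str = "", masquerade: bool = False,
                     mark: str = "0x1/0xff") -> str:
    """Render the ANTREA-POSTROUTING rule text for one SNAT mark.

    Hypothetical helper: `masquerade` stands in for the proposed
    "forceUseMasq" parameter.
    """
    match = ('-A ANTREA-POSTROUTING ! -o antrea-gw0 '
             '-m comment --comment "Antrea: SNAT Pod to external packets" '
             f'-m mark --mark {mark}')
    if masquerade:
        # MASQUERADE uses the address of the outgoing interface,
        # so no explicit source address is needed.
        return f"{match} -j MASQUERADE"
    if not egress_ip:
        raise ValueError("egress_ip is required unless masquerade is set")
    return f"{match} -j SNAT --to-source {egress_ip}"


if __name__ == "__main__":
    print(postrouting_rule("192.168.12.155"))
    print(postrouting_rule(masquerade=True))
```

The only difference between the two outputs is the target: a fixed `--to-source` address versus whatever address the outgoing NIC has.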

antoninbas commented 3 years ago

Adding @tnqn and @wenqiq for comments

wenqiq commented 3 years ago

@lionstack Thanks for your detailed solutions. Adding a "forceUseMasq" parameter to egressIP is a good idea and easy to implement, from my point of view. However, it seems to me that internal IP addresses are rarely used in EgressIP; maybe I am wrong.

Jexf commented 3 years ago

How about using a virtual MAC to mark and identify the packets that need SNAT? Each egress rule allocates a virtual MAC; we update the source MAC to the virtual MAC, then use the Node transport IP instead of the SNAT IP to transport the packets to the egress Nodes, and just match the source MAC against the virtual MAC in the snatIPFromTunnelFlow function.

tnqn commented 3 years ago

The first approach may fix this particular case but doesn't make a lot of sense from the API's perspective. The egressIP no longer makes sense when forceUseMasq is set: it becomes just a tunnel endpoint IP. And I don't think many people would understand the configuration.

We considered the second approach in the beginning but chose the current way to avoid encapsulating the egress IP in the tunnel metadata or calculating the egress IP based on the source IP of the packet on the Egress Node side. @jianjuns Do you think it makes sense now to remove the assumption that all Egress IPs are reachable from all Nodes?

I actually proposed the 3rd approach but then realized it is still specific to the case where the user doesn't need a particular egress IP, and it may make the API more complicated, so it may not be worth it. Also, I didn't mean creating a new egressNode object, but allowing the Egress object to specify an egressNode and leave EgressIP empty, which means the traffic will be redirected to that Node and masqueraded with whatever IP that Node has.

leonstack commented 3 years ago

> @lionstack Thanks for your detailed solutions. Add a parameter "forceUseMasq" to egressIP is a good idea and easy to implement, from my point of view. However, I think it seems that internal IP addresses are rarely used in EgressIP, maybe I am wrong.

Yes, when forceUseMasq is true, the egressIP won't be used for SNAT, but it can still provide high availability for egress access.

jianjuns commented 3 years ago

I also agree approach 2 is the right way to address such use cases, but it is a big change, and in my mind we might still want to keep the current implementation, as it is the most efficient and supports a very common deployment topology. So I am also wondering whether we could go with approach 1 for now, and see if we really have requirements for generic support of "configurable egress IPs on dedicated egress nodes". @tnqn: what do you think?

tnqn commented 3 years ago

> I also agree approach 2 is the right way to address such use cases, but it is a big change, and in my mind we might still like to keep the current implementation as it is most efficient and can support a very common deployment topology. So, also thinking could we go approach 1 for now, and see if we really have requirements for generic support of "configurable egress IPs on dedicated egress nodes". @tnqn : what you think?

With approach 1, the egress IP will only be the tunnel IP, not what its name literally means. I'm concerned about whether the API is still understandable; users will wonder what egressIP is for here. A friendlier way, in my mind, is to set a special value to indicate the "masquerade" behavior, maybe just "Masquerade", like a headless Service using "None" as the clusterIP. For implementation:

  1. We don't allocate an IP from the externalIPPool when EgressIP is set to "Masquerade", but still require an externalIPPool to be provided (it can just be an IPPool with empty IP ranges) to select Nodes.
  2. When selecting the egress Node, we calculate the consistentHash value based on the Egress name instead of the "egressIP", assuming distributing egress traffic to multiple Nodes is desired.
  3. When encapsulating the packet on the source side, we use the Node IP as the tunnel destination IP, and don't set pkt_mark on the egress Node side, so the packet will be masqueraded by the default rule.
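
Step 2 above (hashing the Egress name to pick a Node) could look roughly like the following. This is a generic consistent-hash ring written for illustration, not Antrea's actual implementation; the function names, the SHA-256 hash, and the `replicas` parameter are all assumptions.

```python
import bisect
import hashlib
from typing import Sequence


def _hash(value: str) -> int:
    """Map a string to a 64-bit point on the ring (SHA-256 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def select_egress_node(egress_name: str, nodes: Sequence[str],
                       replicas: int = 50) -> str:
    """Pick a Node for an Egress by hashing the Egress name onto a ring.

    Each Node is placed on the ring `replicas` times; the Egress goes to
    the owner of the first point after the hash of its name.
    """
    if not nodes:
        raise ValueError("no candidate Nodes")
    ring = sorted((_hash(f"{node}#{i}"), node)
                  for node in nodes for i in range(replicas))
    points = [point for point, _ in ring]
    # Wrap around to the first point when the hash is past the last one.
    index = bisect.bisect_right(points, _hash(egress_name)) % len(ring)
    return ring[index][1]


if __name__ == "__main__":
    candidates = ["node1", "node2", "node3"]
    for name in ("egress-web", "egress-db", "egress-batch"):
        print(name, "->", select_egress_node(name, candidates))
```

With a ring like this, every agent that sees the same set of Nodes computes the same owner for a given Egress name, and removing a Node only moves the Egresses that were assigned to it.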

jianjuns commented 3 years ago

What if a user wants to specify a single Node for Egress? Should an egress pool with a single Node be created then?

I got your points, but it will be a bigger change then, and it is like approach 3. I was thinking we just support a single Node, and then the user can manually specify the Node IP as the egress IP (even though the IP is not really the SNAT IP, and so a little confusing, as you said).

jianjuns commented 3 years ago

@leonstack: several questions about your use cases.

  1. Would you let Antrea manage egress IP assignment to egress Nodes, or will you manage the egress IP to Node assignment manually? E.g. if we go with your approach 2, will you define an ExternalIPPool associated with the egress Nodes, and let Antrea assign IPs from the pool to the egress Nodes?
  2. Do you have requirements for HA? E.g. if we go with your approach 1, then I assume the user needs to specify the egress Node's tunnel IP as the egress IP in the Egress CRD, but then we won't support Node failover, as the tunnel IP cannot move.

leonstack commented 3 years ago

> @leonstack : several questions for your use cases.
>
>   1. Would you let Antrea manage egress IP assignment to egress Nodes, or you will manually manage egress IP to Node assignment? E.g. if we go your approach 2, will you define an ExternalIPPool associated with the egress Nodes, and let Antrea assign IPs from the pool to the egress Nodes?

For approach 2, MASQUERADE doesn't need an egressIP, but as @tnqn said, I think we must find a way to select the Nodes that perform the MASQUERADE action.

>   2. Do you have requirements for HA? E.g. if we go your approach 1, then I assume user needs to specify the egress Node's tunnel IP to be the egress IP in the Egress CRD, but then we wont support failover of Node, as the tunnel IP cannot move.

For approach 1, I think the egressIP is still allocated from an ExternalIPPool and works as a tunnel IP, but is no longer used for SNAT, only for HA, because it can move between the Nodes that the ExternalIPPool selects. Alternatively, the egressIP is managed manually and we make sure it can move between the Nodes we selected (for example, with keepalived).

jianjuns commented 3 years ago

Ok. I see you mean you do not really want to define particular egress IPs, but want to egress from a set of egress Nodes, and support HA.

I agree that to support that, we should have either an egress Nodes concept or the empty pool that @tnqn suggested. The questions are: 1) whether we should go a more generic way and also support allocating egress IPs for egress Nodes; 2) or we just go with your approach 1, which is not very convenient but might work for you (and is quite simple).

leonstack commented 3 years ago

@jianjuns There is one problem with approach 1: we haven't found a way to separate Egresses. If several Egresses use the same egressIP and one of them uses MASQUERADE, the others' action will also change to MASQUERADE, even though they are not marked MASQUERADE.
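A toy model may help show why this happens. The sketch below is a deliberate simplification, not Antrea code: it assumes that the NAT action on the egress Node is keyed by egress IP only, and the field names (`egressIP`, `forceUseMasq`) and the function are hypothetical.

```python
# Simplified, hypothetical model: one NAT action is kept per egress IP,
# so every Egress that shares the IP also shares the action.
def nat_actions(egresses):
    """Map each egress IP to the NAT action applied on the egress Node."""
    actions = {}
    for egress in egresses:
        ip = egress["egressIP"]
        if egress.get("forceUseMasq"):
            # One Egress asking for MASQUERADE changes the action for the IP.
            actions[ip] = "MASQUERADE"
        else:
            actions.setdefault(ip, f"SNAT --to-source {ip}")
    return actions


egresses = [
    {"name": "egress-a", "egressIP": "192.168.12.155"},
    {"name": "egress-b", "egressIP": "192.168.12.155", "forceUseMasq": True},
]

if __name__ == "__main__":
    print(nat_actions(egresses))
```

In this model egress-a ends up with MASQUERADE as well, because nothing in the key distinguishes it from egress-b.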

Jexf commented 3 years ago

> > I also agree approach 2 is the right way to address such use cases, but it is a big change, and in my mind we might still like to keep the current implementation as it is most efficient and can support a very common deployment topology. So, also thinking could we go approach 1 for now, and see if we really have requirements for generic support of "configurable egress IPs on dedicated egress nodes". @tnqn : what you think?
>
> By going approach 1, the egress IP will only be the tunnel IP, instead of its literal meaning, I'm concerned if the API is still understandable, and user will wonder what's the usage of egressIP here. A more friendly way in my mind is to set a special value to indicate the "masquerade" behavior, maybe just "Masquerade", like headless service using "None" as the clusterIP. For implementation,
>
>   1. We don't allocate an IP from the externalIPPool when EgressIP is set to "Masquerade", but still requires an externalIPPool to be provided (can just be an IPPool with empty IP ranges) to select Nodes.
>   2. When selecting egress node, we calculate consistentHash value based on the egress name instead of the "egressIP", assuming distributing egress traffic to multiple Nodes is desired.
>   3. When encapsulating the packet on source side, we use the node IP as the tunnel dest IP, and don't set pkt_mark on egress node side so it will be be masqueraded by default rule.

@tnqn If we don't allocate an IP from the externalIPPool when EgressIP is set to "Masquerade", the egress feature can't set an egress IP in a multi-interface environment where the transport IP and the public IP are isolated; it only supports the default "Masquerade" on the egress Node.

Jexf commented 3 years ago

@tnqn @wenqiq How about allocating a global SNAT mark in the egressgroup, using that SNAT mark to generate a unique virtual MAC in the antrea-agent egress sync function, and then using the virtual MAC as the source MAC? More reasons:

1. Currently each antrea-agent maintains its SNAT mark information on its own. When antrea-agent restarts, it needs to reallocate the SNAT marks, which may conflict with the old marks and may cause a brief conflict in the egress function (maybe not).

2. The global SNAT mark can also be used to generate a unique virtual MAC to mark and identify the egress SNAT packets.
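
One possible way to derive a virtual MAC from a SNAT mark is sketched below. The encoding is an assumption made for illustration (a locally administered unicast prefix `02:00` followed by the mark as four bytes); nothing in the thread specifies the format, and both function names are hypothetical.

```python
# Assumed prefix: 0x02 sets the "locally administered" bit and leaves the
# multicast bit clear, so the address is a local unicast MAC.
_PREFIX = bytes([0x02, 0x00])


def virtual_mac_from_mark(mark: int) -> str:
    """Encode a SNAT mark (assumed to fit in 32 bits) as a virtual MAC."""
    if not 0 <= mark <= 0xFFFFFFFF:
        raise ValueError("mark must fit in 32 bits")
    octets = _PREFIX + mark.to_bytes(4, "big")
    return ":".join(f"{octet:02x}" for octet in octets)


def mark_from_virtual_mac(mac: str) -> int:
    """Recover the SNAT mark from a MAC produced by virtual_mac_from_mark."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6 or octets[:2] != _PREFIX:
        raise ValueError("not a virtual MAC in the assumed format")
    return int.from_bytes(octets[2:], "big")


if __name__ == "__main__":
    mac = virtual_mac_from_mark(1)
    print(mac, mark_from_virtual_mac(mac))
```

Because the mapping is reversible, the egress Node could recover the mark from the source MAC alone, which is the property the proposal relies on.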

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days