antrea-io / antrea

Kubernetes networking based on Open vSwitch
https://antrea.io
Apache License 2.0

Egress use external IP on eno2 #6547

Open yeshl opened 1 month ago

yeshl commented 1 month ago

The Node host subnet is 192.168.3.0/24 on NIC eno1, and the external subnet is 112.1.6.0/24 on NIC eno2. How do I specify the IP (112.1.6.96/24) that traffic from the selected Pods to the external network should use for SNAT?

apiVersion: crd.antrea.io/v1beta1
kind: ExternalIPPool
metadata:
  name: ip-pool-external
spec:
  ipRanges:
    - start: 112.1.6.96
      end: 112.1.6.111
  subnetInfo:
    gateway: 112.1.6.254
    prefixLength: 24
#  interfaces:  # there is no such field
#    - eno2
  nodeSelector:
    matchLabels:
      kubernetes.io/hostname: node51
---
apiVersion: crd.antrea.io/v1beta1
kind: Egress
metadata:
  name: egress-external
spec:
  appliedTo:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: dev
    podSelector:
      matchLabels:
        app: busybox
  egressIP: 112.1.6.99
  externalIPPool: ip-pool-external
antoninbas commented 1 month ago

~We do not control the outgoing interface for you~. It will depend on the routing configuration on your Nodes. If the default route goes through eno2, and the default route is selected for the Egress traffic, then the Egress traffic will go through eno2.

Note that in Antrea we have the concept of transportInterface. It is the interface used for inter-Node traffic (e.g., traffic from Pod A on Node 1 that needs to go to Pod B on Node 2). By default, the transportInterface is determined using the NodeIP reported by kubelet. In your case, it seems that the transportInterface is eno1. When the transportInterface (eno1) is different from the interface used for Egress (eno2), you should ensure that arp_ignore == 0 (https://sysctl-explorer.net/net/ipv4/arp_ignore/) on your Nodes in order for ARP advertisement to work properly for the Egress IPs.

cc @tnqn

Edit: I think my comment is mostly inaccurate :( so I will hide it to avoid confusion

yeshl commented 1 month ago

I want to use High-Availability Egress with IPs in subnet 112.1.6.0/24, which is not the same subnet as the Node IPs. With the configuration above, the Egress IP 112.1.6.99 is on node51, but traffic from the Pods is not redirected to node51. What am I doing wrong?

rs20:~# antctl get featuregates|grep Enabled
AntreaPolicy                    Enabled      BETA
AntreaProxy                     Enabled      GA
Egress                          Enabled      BETA
EgressSeparateSubnet            Enabled      ALPHA
EndpointSlice                   Enabled      GA
Multicast                       Enabled      BETA
NetworkPolicyStats              Enabled      BETA
NodeNetworkPolicy               Enabled      ALPHA
NodePortLocal                   Enabled      GA
ServiceExternalIP               Enabled      ALPHA
TopologyAwareHints              Enabled      BETA
Traceflow                       Enabled      BETA
tnqn commented 1 month ago

@yeshl Does the node that runs the pod have eno2 and an IP from the 112.1.6.96/24 subnet? Could you collect a support bundle via antctl supportbundle from at least these two nodes and share with us to understand how the node network is configured?

yeshl commented 1 month ago

node50 and node51 have two NICs: eno1 (10.0.3.0/24) and eno2 (no IP). The other nodes have only one NIC, eno1 (10.0.3.0/24). I use a MetalLB High-Availability IP (112.1.6.11/24) on node50/node51, and it works very well! But MetalLB can only do DNAT. I thought Antrea Egress could do SNAT, so I used Egress in the same subnet as MetalLB (112.1.6.11/24), but it does not work!

tnqn commented 1 month ago

Is node50 the one running the test Pod? Can other nodes reach eno2's IPs via direct route or default gateway? This is a requirement to make Egress work as mentioned in https://github.com/antrea-io/antrea/blob/main/docs/egress.md#egressip:

The egressIP field specifies the egress (SNAT) IP the traffic from the selected Pods to the external network should use. The IP must be reachable from all Nodes.
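
One quick way to check the reachability requirement quoted above is to ask the kernel, on each Node that runs selected Pods, which route it would pick for the Egress IP. The sketch below is only an illustration: `route_dev` is a hypothetical helper name, not part of Antrea or iproute2, and it assumes the usual `ip route get` output in which the outgoing interface follows the `dev` keyword.

```shell
#!/bin/sh
# route_dev (hypothetical helper): read `ip route get` output on stdin and
# print the interface name that follows the first "dev" keyword.
route_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage on a Node (112.1.6.99 is the Egress IP from this issue):
#   ip route get 112.1.6.99 | route_dev
```

If the command reports an error or the helper prints nothing, the Node probably has no usable route to the Egress IP. If it prints an interface, that is the interface the Node would use to reach the Egress IP, either directly or through a gateway.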

antoninbas commented 1 month ago

@yeshl is this a good representation of your network topology:

[Diagram: egress network topology]

I think you could get it to work if you manually add the routes I wrote in italics in the diagram. Note that you would need to change your Egress definition to remove the subnetInfo entry altogether.

Edit: I added the onlink keyword by mistake in the route for Node 1. There is no gateway and it is not needed.


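@tnqn I feel like we have been getting quite a few queries like this, with similar use cases:

  • use a different network / interface for Egress
  • use a dedicated Node for Egress that's connected to the external router (via a separate interface)

I don't know if the current implementation can really handle these use cases very well. There seems to be some general confusion about how subnetInfo can be used - including from me maybe :P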

yeshl commented 1 month ago

@antoninbas thank you!

# Worker Nodes: node31/node32
node31(work pod running):~# ip r add 112.1.6.0/24 dev eno1
node32(work pod running):~# ip r add 112.1.6.0/24 dev eno1

# Egress Node: node50
node50:~# ip r add 112.1.6.0/24 dev eno2 table 100
node50:~# ip r add default via 112.1.6.254 table 100
node50:~# ip ru add from 112.1.6.0/24 lookup 100

For MetalLB, as long as the above routes are added, it works very well!

Antrea Egress works only after I add the following rule and IP. (But I would rather not have to add them, because doing so disrupts high availability for Egress.)

node50:~# ip ru add from 10.244.8.25 lookup 100  #10.244.8.25 is work pod's IP
node50:~# ip a add 112.1.6.96/24 dev eno2

Please refer to MetalLB's implementation; it handles this very well!

Additionally, the ExternalIPPool CRD appears to use an external IP address such as 112.1.6.99 (not 192.168.*/172.*/10.*), but in reality it requires an IP in the Nodes' subnet, which is really puzzling! Would a name like InternalIPPool or LocalIPPool be better?

antoninbas commented 1 month ago

Additionally, the ExternalIPPool CRD appears to use an external IP address such as 112.1.6.99 (not 192.168.*/172.*/10.*), but in reality it requires an IP in the Nodes' subnet, which is really puzzling! Would a name like InternalIPPool or LocalIPPool be better?

The Egress IP needs to be routable from any Node, which is not the same as "it requires the IP in nodes subnet".

In the example diagram above, I am adding a route for the Egress "subnet" on every Node manually, which is why the "routable from any Node" requirement is met and why it works. The EgressSeparateSubnet feature is meant as a mechanism to easily use a different subnet from the Node subnet, without having to program routes manually, but my personal take so far is that it cannot really work with your use case (different interfaces on Egress Node + only the Egress Node is connected to the upstream router).
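
To make the distinction between "in the Node subnet" and "routable from any Node" concrete, here is a minimal sketch that tests whether an address lies inside a CIDR. The helper names `ip_to_int` and `in_subnet` are hypothetical, and the values are taken from this thread (Node subnet 10.0.3.0/24 as reported for eno1, Egress IP 112.1.6.99). An address outside the Node subnet can still meet the requirement, provided every Node has a route to it.

```shell
#!/bin/sh
# ip_to_int (hypothetical helper): convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  _old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into its four octets
  IFS=$_old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_subnet IP CIDR (hypothetical helper): exit 0 if IP lies inside CIDR.
in_subnet() {
  _ip=$(ip_to_int "$1")
  _net=$(ip_to_int "${2%/*}")
  _len=${2#*/}
  _mask=$(( (4294967295 << (32 - _len)) & 4294967295 ))
  [ $(( _ip & _mask )) -eq $(( _net & _mask )) ]
}

# Values from this thread: Node subnet 10.0.3.0/24, Egress IP 112.1.6.99.
if in_subnet 112.1.6.99 10.0.3.0/24; then
  echo "on-link: the Egress IP is inside the Node subnet"
else
  echo "off-link: Nodes need a route (direct or via a gateway) to reach the Egress IP"
fi
```

With the values above the check takes the second branch, which is the case where routes such as the ones added manually in this thread are needed.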

tnqn commented 1 month ago

@tnqn I feel like we have been getting quite a few queries like this, with similar use cases:

  • use a different network / interface for Egress
  • use a dedicated Node for Egress that's connected to the external router (via a separate interface)

I don't know if the current implementation can really handle these use cases very well. There seems to be some general confusion about how subnetInfo can be used - including from me maybe :P

@antoninbas Sorry for the late reply. I agree it should be improved or clarified. I will take a look at this and think about it tomorrow or next week.