antrea-io / antrea

Kubernetes networking based on Open vSwitch
https://antrea.io
Apache License 2.0

I want to set a SDN controller for br-int, but I can't find br-int #235

Closed magaboomchen closed 3 years ago

magaboomchen commented 4 years ago

Describe what you are trying to do

Hi, I want to use an OvS-based network plugin for my k8s cluster so that I can use an SDN controller to control every OvS instance and implement Service Function Chain forwarding. I have deployed a k8s cluster (v1.16.1, with 1 master node and 1 worker node) and applied Antrea (v0.1.1) on the master node. Both nodes are in Ready status.

My purpose: I want to find the OvS on the worker node and set my SDN controller (i.e. RYU running on the master node) for it. After reading architecture.md (https://github.com/vmware-tanzu/antrea/blob/master/docs/architecture.md), I understand that each worker node has a default OvS switch called br-int. However, I can't find br-int on the worker node when I run the command "ip add". Here is the "ip add" output:

t1@K8s-WorkNode-1:~$ ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:6b:fb:c0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.155/16 brd 192.168.255.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe6b:fbc0/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:32:d8:05:12 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
5: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether da:cf:0a:ea:6a:93 brd ff:ff:ff:ff:ff:ff
6: gw0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 62:84:f8:bd:e9:27 brd ff:ff:ff:ff:ff:ff
    inet 10.221.1.1/24 brd 10.221.1.255 scope global gw0
       valid_lft forever preferred_lft forever
    inet6 fe80::6084:f8ff:febd:e927/64 scope link
       valid_lft forever preferred_lft forever
7: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
    link/ether 5e:9a:05:56:96:6d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5c9a:5ff:fe56:966d/64 scope link
       valid_lft forever preferred_lft forever

All in all, how can I find the OvS (br-int) on the work node and set my SDN controller for it?

tnqn commented 4 years ago

Hi @magaboomchen, thanks for using Antrea! The reason you don't see br-int as an interface is most likely that antrea-agent creates only the bridge, not the internal port with the same name, so no br-int network device shows up on the host. Also, since antrea-agent does not use the default OVS db socket path /var/run/openvswitch/db.sock, you need to specify the db path explicitly to operate OVS, for example:

ovs-vsctl --db unix:/var/run/antrea/openvswitch/db.sock show
ovs-ofctl show unix:/var/run/antrea/openvswitch/br-int.mgmt

Details can be found at https://github.com/vmware-tanzu/antrea/blob/master/docs/troubleshooting.md#debugging-ovs. Please let me know if this resolves your problem.
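If these commands are going to be scripted, a small helper can keep the non-default socket paths in one place. The following is only a sketch: the function names are made up for illustration, and the paths are the ones quoted in the comment above. It builds the argument lists and prints them rather than executing anything.

```python
import shlex

# Run directory quoted in the comment above; Antrea does not use the default OVS one.
ANTREA_OVS_RUN_DIR = "/var/run/antrea/openvswitch"


def ovs_vsctl_cmd(*args, run_dir=ANTREA_OVS_RUN_DIR):
    """Build an ovs-vsctl argv that points at Antrea's OVSDB socket."""
    return ["ovs-vsctl", "--db", f"unix:{run_dir}/db.sock", *args]


def ovs_ofctl_cmd(subcommand, bridge="br-int", *args, run_dir=ANTREA_OVS_RUN_DIR):
    """Build an ovs-ofctl argv that targets the bridge's management socket."""
    return ["ovs-ofctl", subcommand, f"unix:{run_dir}/{bridge}.mgmt", *args]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try anywhere.
    print(shlex.join(ovs_vsctl_cmd("show")))
    print(shlex.join(ovs_ofctl_cmd("show")))
    print(shlex.join(ovs_ofctl_cmd("dump-flows")))
```

The returned lists can be passed to `subprocess.run` on a Node where the OVS tools are installed.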

magaboomchen commented 4 years ago

Hi @tnqn, thank you for your help! Now I can find br-int on the worker node and connect it to the SDN controller (RYU) on the master node successfully. However, it seems that the SDN controller (RYU) and antrea-agent can't coexist. Here is what I observed: I run my app (which listens for packet-in events and does nothing else) in RYU. After br-int connects to RYU, all default flow entries of br-int (set by antrea-agent) disappear, and br-int now also shows up when I run "ip add" on the worker node. As a result, I can't ping gw0 on the worker node from the master node.

This isn't what I expected. I want antrea-agent to control the basic K8s network (i.e. I want to keep all flow entries set by antrea-agent) and let RYU add/delete additional flow entries.

I did some digging and found something that may be useful: it seems that OvS clears its flow table when it connects to a controller after having been in "no controller" mode. (https://mail.openvswitch.org/pipermail/ovs-discuss/2014-May/033712.html)

So it seems that I can't add new flow entries to br-int from RYU, because I also want to keep the original entries set by antrea-agent. Is there some way to add/delete flow entries in br-int from a remote server? I want to implement a centralized controller and realize Service Function Chaining.
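One way to confirm which entries vanish is to capture the flow table before and after the controller connects and compare the two dumps. The sketch below assumes the text format printed by `ovs-ofctl dump-flows`; the helper names are hypothetical, and counters and timers are stripped because they change between dumps even when a flow is untouched.

```python
import re

# Fields that differ between two dumps of the same, unchanged flow.
VOLATILE = re.compile(r"\b(duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]+,?\s*")


def normalize(flow_line):
    """Strip counters and timers so identical flows compare equal."""
    return VOLATILE.sub("", flow_line).strip()


def missing_flows(before, after):
    """Return flows present in `before` but absent from `after`, in dump order."""
    remaining = {normalize(line) for line in after.splitlines() if "actions=" in line}
    return [
        normalize(line)
        for line in before.splitlines()
        if "actions=" in line and normalize(line) not in remaining
    ]
```

Feeding it the output of two `dump-flows` runs returns the list of entries that were lost in between; reply header lines are skipped because they contain no `actions=` field.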

antoninbas commented 4 years ago

I think that's the expected behavior with the current implementation. We do not actually set a primary controller for the br-int bridge; instead, we connect to the mgmt Unix Domain Socket (similar to what ovs-ofctl does). Maybe this is something that we can investigate. @wenyingd do you think we should set the controller for br-int (the equivalent of ovs-vsctl set-controller br-int unix:/var/run/...) instead of using the service socket? What would be the practical implications compared to the current approach?

@magaboomchen While we investigate this, maybe you could try the approach suggested in the email thread. I think you could do that experiment without modifying any Antrea Go code:

1) delete Antrea from the cluster (this should not delete br-int on the Nodes),
2) set the controller for every bridge using ovs-vsctl set-controller br-int <addr>,
3) deploy Antrea in your cluster,
4) start your Ryu controller.

At least according to the email thread, the Antrea flows should not be wiped out in this case, since we put the switch in an "at least one controller" configuration before installing the Antrea flows.
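The controller-setting step could be scripted roughly as follows. This is a sketch under assumptions: the controller address in the usage line is a placeholder, the helper names are hypothetical, and it presumes ovsdb-server is reachable at Antrea's socket path when the command runs. Note that ovs-vsctl expects the bridge name after the subcommand (`set-controller BRIDGE TARGET`).

```python
import re

ANTREA_OVSDB = "unix:/var/run/antrea/openvswitch/db.sock"

# Accept only the simple tcp:IPv4[:PORT] form here; OVS supports other target types too.
_TCP_TARGET = re.compile(r"^tcp:\d{1,3}(?:\.\d{1,3}){3}(?::\d{1,5})?$")


def set_controller_cmd(controller, bridge="br-int", db=ANTREA_OVSDB):
    """Build the ovs-vsctl argv that points `bridge` at a remote controller."""
    if not _TCP_TARGET.match(controller):
        raise ValueError(f"expected tcp:IP[:PORT], got {controller!r}")
    return ["ovs-vsctl", "--db", db, "set-controller", bridge, controller]


def get_controller_cmd(bridge="br-int", db=ANTREA_OVSDB):
    """Build the argv that prints the controllers currently configured on `bridge`."""
    return ["ovs-vsctl", "--db", db, "get-controller", bridge]
```

For example, `set_controller_cmd("tcp:192.0.2.10:6653")` yields the command to run on each Node, with `192.0.2.10` standing in for the address where Ryu listens.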

wenyingd commented 4 years ago

@magaboomchen I'm not sure I fully understand. Do you mean that you want to control br-int with another SDN controller (RYU) and antrea-agent at the same time, but the flow entries installed by antrea-agent are lost? Have you checked whether RYU has logic to delete existing flows when the OpenFlow switch connects? As far as I know, some SDN controllers do this by default.

Actually, antrea-agent is itself a kind of OpenFlow controller, but instead of listening on a TCP port it initiates a Unix Domain Socket (UDS) connection to br-int. On this UDS connection, antrea-agent acts as the client and OVS as the server. The same socket is also used by the ovs-ofctl command, and OVS listens on it by default.
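To make the client/server roles concrete, here is a minimal sketch of what the client side of such a connection sends first: an OpenFlow HELLO, which can consist of just the fixed 8-byte header (version, type, length, transaction id). It is not Antrea's implementation; the function names are made up, OpenFlow 1.3 is assumed as the version, and a local socket pair stands in for the real management socket so the example runs without OVS.

```python
import socket
import struct

OFP_HEADER = struct.Struct("!BBHI")  # version, type, length, xid (network byte order)
OFPT_HELLO = 0
OFP13_VERSION = 0x04  # assumption: OpenFlow 1.3


def build_hello(xid, version=OFP13_VERSION):
    """Return an OpenFlow HELLO made of the 8-byte header only."""
    return OFP_HEADER.pack(version, OFPT_HELLO, OFP_HEADER.size, xid)


def parse_header(data):
    """Decode the fixed OpenFlow header into a dict."""
    version, msg_type, length, xid = OFP_HEADER.unpack(data[:OFP_HEADER.size])
    return {"version": version, "type": msg_type, "length": length, "xid": xid}


def demo_roundtrip(xid=1):
    """Send a HELLO from the 'client' end to the 'switch' end of a Unix socket pair."""
    switch_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with switch_end, client_end:
        client_end.sendall(build_hello(xid))
        data = switch_end.recv(OFP_HEADER.size, socket.MSG_WAITALL)
        return parse_header(data)
```

Against a real bridge, the client would instead create an `AF_UNIX` stream socket and `connect()` to the br-int.mgmt path shown earlier in the thread, while OVS holds the listening side.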

@antoninbas, if we change to ovs-vsctl set-controller br-int unix:/var/run/..., br-int would have to initiate the connection (it must be a separate connection, not the one ovs-ofctl is using), and antrea-agent would need to act as the server waiting for that connection. I don't think this is the expected working mode for Antrea. If the requirement is to ensure that br-int has OpenFlow entries from both antrea-agent and another controller (I am not sure whether this is a valid requirement), maybe we could think about supporting a resync of OpenFlow entries from antrea-agent to OVS. The difficulty would then be how to avoid removing the flows installed by the other controller by mistake.

magaboomchen commented 4 years ago

@antoninbas Thank you for your help! I don't think I can delete Antrea, for the following reason: in the future I will add a lot of servers, and some of them will be turned on/off dynamically to simulate hardware failures. Each time a server rejoins the cluster, I would have to delete Antrea, and all the other servers might get disconnected (which I don't want).

I came up with another approach to control br-int: I can execute commands in the container from the master node:

kubectl exec -n kube-system antrea-agent-8ccjj -c antrea-ovs ovs-vsctl show

I tested this method and it has worked fine so far. Next, I can write a program to do this automatically.
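A program that automates this could start from a wrapper like the one below. It is a sketch only: the function names are hypothetical, the namespace, container, and Pod name are the ones from the command above, and `--` is added to separate kubectl's own flags from the command run in the container. By default it returns the command line instead of executing it.

```python
import shlex
import subprocess


def kubectl_exec_cmd(pod, *command, namespace="kube-system", container="antrea-ovs"):
    """Build a `kubectl exec` argv that runs `command` in the agent Pod's OVS container."""
    if not command:
        raise ValueError("no command given")
    return ["kubectl", "exec", "-n", namespace, pod, "-c", container, "--", *command]


def run_in_agent(pod, *command, dry_run=True):
    """Return the command line (dry run) or execute it and return its stdout."""
    argv = kubectl_exec_cmd(pod, *command)
    if dry_run:
        return shlex.join(argv)
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout
```

Calling `run_in_agent("antrea-agent-8ccjj", "ovs-vsctl", "show", dry_run=False)` from a machine with cluster access would run the same query as the manual command; looping over all agent Pods is left to the caller.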

@wenyingd Thank you for your help! Yes, I want to control br-int with another SDN controller (RYU) and antrea-agent at the same time. I investigated and found that Floodlight clears the flow table when it first connects, but RYU doesn't. Fortunately, I think I have found another way to achieve my goal.

By the way, I think it could be useful to let users set an additional SDN controller to control their customized network. However, it seems that users could then accidentally delete the original flows installed by antrea-agent.

Thanks a lot!

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment, or this will be closed in 180 days