Closed smangelkramer closed 5 years ago
See a small blogpost from my (private, thus as PDF and not as link) cloud blog describing this topic more precisely:
@smangelkramer +1. We have converted our "traditional" Access Network to a BGP / EVPN setup, just like our (IPv6 only) storage network (Ceph). We totally agree with you this is the way to go for "ISP" scale networking. We have a similar PoC as you but with some differences:
We have not worked on the OpenNebula part yet. Most of the "moving" parts are in the "drivers", I guess. The (OpenvSwitch) VXLAN drivers might be a good starting point to hack on. Personally I like OpenvSwitch more than legacy bridging, but the concept stays the same.
@hydro-b I totally agree with you. For "classic" deployments I favor OVS too. As for the Docker part: Cumulus was just for my PoC - of course a Cisco/Juniper will work too ;-). I will have a look at FRR. Thanks.
This looks super-cool! Need to go deeper on the post, but a quick question about the integration.
We currently have the VXLAN drivers that do the ip link add vxla...
You can also pass the additional parameters as configuration options (vtep ...) with ip_link_conf:, as described in the conf file.
So your idea is to develop drivers to take care of the BGP configuration?
@smangelkramer "of course a Cisco/Juniper will work too ;-)" ... Arista in our case ;-). @rsmontero @smangelkramer maybe we can combine forces here? How cool would it be to present this at the upcoming OpenNebula Conf in Amsterdam?
Please, do. It would be a beautiful addition to OpenNebula, and a very interesting talk.
Yep, I'm totally in. @hydro-b if you have a shopping list of what is missing in OpenNebula right now to better support your use case, that would be great. We can start working from there and see if we are covering everything.
@rsmontero Cool! We will go ahead and let you know what it would take to integrate this in OpenNebula.
@hydro-b "yes I do" - this would be really nice. Tino and I tried this over the last years, when I built up a German ISP datacenter/cloud. But now it looks better :-)
And of course: let's combine the forces. Sounds very good!
@hydro-b shall we set up a short WebEx to sync up? I've seen the talk at OpenNebula Conf 2018 - if you are interested we can combine forces here too.
Hi,
Good idea. I'm currently on holiday until August 14. Heads-up: we have a PoC up-and-running. Now it's time to adjust the ONE drivers for it.
Gr. Stefan
Hi Stefan,
I'm on holiday until August 20. I also have two PoCs (one at a customer, in production) up and running.
Enjoy your holidays!
Best, Sebastian
I got pulled away from this. Fortunately my colleague at NOC got time to fine tune the PoC.
We are using the Linux "bridge" instead of openvswitch. FRR does not recognize a port as "connected" when openvswitch bridges are used. We use the "ifupdown2" package (Cumulus) to configure host networking.
When everything is set up on the host, this is what OpenNebula should do during VM spawning, for example with VXLAN 10601 and bridge br-vlan601 (the full command listing is elaborated in a later comment).
@Sebastian: Have you modified scripts in OpenNebula? Or did you set up a couple of VLANs by hand?
Gr. Stefan
Hi,
Just had the opportunity to set up a little PoC with FRR. I've configured the hosts to advertise routes for all the VNIs on the hosts, and a route reflector to distribute them.
The only change needed in the vxlan driver is to drop the multicast group (I think we can do that when the mc base address is 0.0.0.0). The "nolearning" option is already supported in the configuration files via the ip link options.
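The "drop the multicast group when the base address is 0.0.0.0" idea could be sketched as a tiny helper. This is an illustration only, not driver code; the function name and the 0.0.0.0 sentinel convention are assumptions:

```shell
#!/bin/sh
# Sketch: emit the "group <addr>" option for `ip link add ... type vxlan`,
# or nothing when the multicast base address is 0.0.0.0 (treated here as the
# "EVPN, no multicast" sentinel suggested above).
group_opt() {
    if [ "$1" = "0.0.0.0" ]; then
        echo ""
    else
        echo "group $1"
    fi
}

group_opt 239.1.1.100   # -> group 239.1.1.100
group_opt 0.0.0.0       # -> (empty: BGP EVPN handles BUM replication instead)
```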
Hi,
I have teamed up with @hydro-b , to get this PoC going on our production network. I'd like to elaborate some more on the CLI output he posted yesterday.
We now have a full L2+L3 symmetric IRB EVPN VXLAN multi-tenant setup running and it's looking good. Looking forward to putting it in production.
# Create a vxlan interface and set the VTEP source to a loopback so it will always remain up.
# Underlay routing is responsible for reachability. Please take note of the following options:
# "nolearning" (BGP will supply mac-ip information, so disable data-plane learning)
# "proxy" (suppress ARP requests if the mac-ip is locally present and reply to ARP)
# "srcport 49152 65535" and "dstport 4789" as per RFC requirements/advice.
ip link add vxlan10601 type vxlan id 10601 proxy nolearning srcport 49152 65535 dstport 4789 local loopback_ip
ip link set vxlan10601 up
# Create bridge to later attach VTEP and VM net:
brctl addbr br-vlan601
brctl stp br-vlan601 off
brctl setfd br-vlan601 0
# The next 3 steps are only necessary if you want an L3VNI IRB setup. We assume that vrf tenantVrf
# is already active on the hypervisor(s). Not needed for L2VNI.
ip link set br-vlan601 master tenantVrf
ip addr add router_ip_on_the_hosts dev br-vlan601
ip link set dev br-vlan601 address 02:62:69:74:67:77
# attach VTEP to bridge
brctl addif br-vlan601 vxlan10601
# Tune bridge options if needed. With flood set to off, broadcasts are suppressed, which may
# result in undesired behaviour.
bridge link set dev vxlan10601 learning off
bridge link set dev vxlan10601 neigh_suppress on
bridge link set dev vxlan10601 flood off
bridge link set dev vxlan10601 mcast_flood off
# Set the bridge up; if everything is set up correctly, traffic will flow.
ip link set br-vlan601 up
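The per-VM host-side sequence above could be wrapped in a small dry-run helper that prints the commands for a given VNI/VLAN pair instead of executing them, e.g. for review before wiring it into a driver. A sketch: the helper name and the placeholder VTEP IP are made up, and only the core steps are reproduced:

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (not execute) the EVPN VXLAN setup
# commands from the listing above for one network. Illustrative only.
emit_evpn_net() {
    vni=$1; vlan=$2; local_ip=$3
    echo "ip link add vxlan${vni} type vxlan id ${vni} proxy nolearning srcport 49152 65535 dstport 4789 local ${local_ip}"
    echo "ip link set vxlan${vni} up"
    echo "brctl addbr br-vlan${vlan}"
    echo "brctl stp br-vlan${vlan} off"
    echo "brctl setfd br-vlan${vlan} 0"
    echo "brctl addif br-vlan${vlan} vxlan${vni}"
    echo "bridge link set dev vxlan${vni} learning off"
    echo "bridge link set dev vxlan${vni} neigh_suppress on"
    echo "ip link set br-vlan${vlan} up"
}

# Reproduce the vxlan10601 / br-vlan601 example (203.0.113.1 is a placeholder)
emit_evpn_net 10601 601 203.0.113.1
```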
And our frr.conf:
vrf tenantVrf
vni 20003
exit-vrf
!
router bgp 65101
bgp router-id 213.136.24.1
no bgp default ipv4-unicast
neighbor 213.136.2.2 remote-as 65101
neighbor 213.136.2.2 update-source 213.136.24.130
neighbor 213.136.2.2 capability extended-nexthop
!
address-family l2vpn evpn
neighbor 213.136.2.2 activate
vni 20003
rd 213.136.24.1:20003
route-target import 213.136.2.48:20003
route-target export 213.136.2.48:20003
exit-vni
vni 10601
rd 213.136.24.1:10601
route-target import 213.136.2.48:10601
route-target export 213.136.2.48:10601
exit-vni
advertise-all-vni
advertise-default-gw
advertise ipv4 unicast
advertise ipv6 unicast
exit-address-family
!
router bgp 65101 vrf internetVrf
bgp router-id 213.136.24.1
no bgp default ipv4-unicast
!
address-family ipv4 unicast
redistribute connected
exit-address-family
!
address-family ipv6 unicast
redistribute connected
exit-address-family
!
address-family l2vpn evpn
vni 20003
exit-vni
advertise-all-vni
advertise ipv4 unicast
advertise ipv6 unicast
rd 213.136.24.1:20003
route-target import 213.136.2.48:20003
route-target export 213.136.2.48:20003
exit-address-family
!
end
@hydro-b yes, I modified the scripts (OpenNebula.exec_and_log), but I'm just using L2VNI. Also, I only use it with standard Linux bridges and not OVS. At the moment I'm looking at FRR to replace my Cumulus Linux (running inside Docker containers on the hosts for my PoC).
@Adze1502 your L3VNI IRB looks fine - especially for isolating the tenants inside VRFs.
So I'll summarize my setup and the changes needed in the current OpenNebula drivers:
Using stock vxlan drivers with the following change:
OpenNebula.exec_and_log("#{command(:ip)} link add #{@nic[@attr_vlan_dev]}"\
! " #{mtu} type vxlan id #{@nic[@attr_vlan_id]} #{ttl}"\
" dev #{@nic[:phydev]} #{ip_link_conf}")
OpenNebula.exec_and_log("#{command(:ip)} link set #{@nic[@attr_vlan_dev]} up")
--- 50,56 ----
end
OpenNebula.exec_and_log("#{command(:ip)} link add #{@nic[@attr_vlan_dev]}"\
! " #{mtu} type vxlan id #{@nic[@attr_vlan_id]} group #{mcs} #{ttl}"\
" dev #{@nic[:phydev]} #{ip_link_conf}")
/var/lib/one/remotes/etc/vnm/OpenNebulaNetwork.conf
:
:ip_link_conf:
:nolearning:
router bgp 7675
bgp router-id 10.4.4.11
no bgp default ipv4-unicast
neighbor 10.4.4.13 remote-as 7675
neighbor 10.4.4.13 capability extended-nexthop
address-family l2vpn evpn
neighbor 10.4.4.13 activate
advertise-all-vni
exit-address-family
exit
router bgp 7675
bgp router-id 10.4.4.13
bgp cluster-id 10.4.4.13
no bgp default ipv4-unicast
neighbor kvm_hosts peer-group
neighbor kvm_hosts remote-as 7675
neighbor kvm_hosts capability extended-nexthop
neighbor kvm_hosts update-source 10.4.4.13
bgp listen range 10.4.4.0/24 peer-group kvm_hosts
address-family l2vpn evpn
neighbor fabric activate
neighbor fabric route-reflector-client
exit-address-family
exit
10.4.4.11# show bgp evpn route
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 10.4.4.11:2
*> [2]:[0]:[0]:[48]:[02:00:0a:03:03:c9]
10.4.4.11 32768 i
*> [3]:[0]:[32]:[10.4.4.11]
10.4.4.11 32768 i
Route Distinguisher: 10.4.4.12:2
*>i[2]:[0]:[0]:[48]:[02:00:0a:03:03:c8]
10.4.4.12 0 100 0 i
*>i[3]:[0]:[32]:[10.4.4.12]
10.4.4.12 0 100 0 i
Considering the current drivers, I propose as part of this issue:
/var/lib/one/remotes/etc/vnm/OpenNebulaNetwork.conf
(basically, make the patch configurable). What do you think?
@rsmontero :+1: Nice.
One question though. I have added the following to /var/lib/one/remotes/etc/vnm/OpenNebulaNetwork.conf:
:ip_link_conf:
:nolearning:
:proxy:
:srcport: 49152 65535
:dstport: 4789
But how would one add the option ":local: ip_address" to define the VTEP source address, as this differs per hypervisor? In our setup this would be the loopback IP address.
Edit: nevermind... sigh... found it (the physical device option).
@rsmontero +1
What about defining a "best practice" or "recommended" implementation of VXLAN with BGP-EVPN, describing the whole ecosystem with its components: OpenNebula, KVM hosts (and the BGP / networking stack on these hosts)? A part of the OpenNebula documentation?
The northbound BGP part could be left open - just mention the used standards, RFCs, ...
My first PoC is also based on the stock VXLAN drivers. In another PoC (running at a customer) the bridges, VXLAN VNIs and VLANs were created by hand.
Maybe this is not the place to discuss this, but how would one configure this on the "Virtual Network" tab?
I just want a bridge with just one VTEP (vxlan) interface. The options "VLAN ID" and "Physical device" make no sense in our setup, as everything is layer-3 routed out of the hypervisor, but they are required fields. The closest I came is setting "network mode" to Bridged with "Physical device" set to vxlan.
@smangelkramer Totally - out of this I'd plan to add a section to the vxlan documentation to hint at this setup.
@Adze1502 you are right, the ability to include dynamic values in the ip link options is not currently there. We can add that feature as part of this issue, in a template-like form:
:ip_link_conf:
:local: $IP
so that $IP is replaced by the IP of the host. Two comments:
VXLAN drivers:
PHYDEV in OpenNebula BRIDGE in OpenNebula
+------------+ +------------+ +-----------+
| eth0 +--------->+ vxlan link +------+ Linux |
| routable_ip| | vni | | Bridge |
+------------+ +------------+ ++---+---+--+
| | |
Uses dev PHYDEV in | | |
ip link + + +
Virtual Machines in vxlan
Your setup:
+---------------+ +-------------+
| +------+ vxlan link |
| Linux Bridge | | vni |
| routeble_ip | +-------------+
| |
+-+--+---+------+
| | |
| | |
+ + +
Virtual Machines in vxlan
So I'd like to understand whether the original setup (VXLAN drivers) can be used in your deployment (note that you could use any existing link in the host, so maybe you can drop the need for using local in your ip link command), and if not, why? I'm asking this to check whether the vxlan drivers are missing a relevant use case here.
@rsmontero Thnx for this insight.
- What other variables would be nice to have
A configurable VNI ID. It is more than just a nice-to-have. This could be an optional or automatically assigned field, but in our setup we require manual setting of the VNI ID. Our VNI policy is a function: VNI_ID = (Tenant_ID * 10000) + VLAN_ID. For L3VNIs (vrf) the VLAN_ID in this function is set to 1. Hardware VTEPs connect customers' bare-metal servers with VMs on our hypervisors.
- How would we get the "local ip"? I mean, the host could potentially have multiple interfaces; would it be better to have something like an interface name? Or should $IP refer to the IP of the PHYDEV in the network or BRIDGE?
In our hypervisor PoC setup it is a loopback device. This is an exact copy of how our Arista switches do it in production. As a loopback device is always up, lower-layer disruptions (e.g. link failure) have no impact on the overlay networks. We just need redundant layer-3 connectivity between the different hypervisor/switch loopback (VTEP) interfaces. I would suggest the IP of a PHYDEV as "local ip".
- About your L3 requirements: this is, I think, deeper than the Sunstone interface. The vxlan drivers assume that the traffic is encapsulated through an existing interface with an IP configured; in your case you are adding the IP to the bridge.
I have to test some more with the standard VXLAN drivers, so I will let you know. Maybe a dummy interface can serve as a routable_ip PHYDEV and be used as the clients' gateway. The biggest issue I have at the moment is that the standard VXLAN driver requires a vlan_id, which makes no sense in our setup: multiple tenants can have overlapping VLAN IDs. That is why I'd like to create a bridge with an arbitrary name and only attach VMs and the VTEP to this bridge. This setup allows customers to use/keep their own VLAN scheme, which makes migrating from bare-metal to VM very easy for our customers.
Will test some more and keep you updated. Thanks!
Edit1: The routable IP on the Linux bridge is used as an anycast gateway IP for VMs. It is configured in a vrf and is not used as the source address for the VXLAN/VTEP interface.
Edit2: Our setup:
+------------------+ +-----------------+
| Linux Bridge +------+ |
| OPTIONAL route- | | Vxlan link |
| able ip in vrf | | local ip from |
| serves only as | | lo iface as src |
| gateway for VMs | | vtep interface |
+-+--+---+---------+ +-----------------+
| | |
| | |
+ + +
Virtual Machines
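The VNI numbering policy described above can be written down as a one-liner. A sketch (the helper name is made up, not part of any driver):

```shell
#!/bin/sh
# VNI policy from this comment: VNI_ID = (Tenant_ID * 10000) + VLAN_ID,
# with VLAN_ID fixed to 1 for L3VNIs (vrf).
vni_for() {
    tenant=$1; vlan=$2
    echo $(( tenant * 10000 + vlan ))
}

vni_for 1 601   # L2VNI for tenant 1, VLAN 601 -> 10601 (as in vxlan10601 above)
vni_for 3 1     # L3VNI for tenant 3 -> 30001
```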
Did some more research on the standard vxlan driver method and why we can't use it in our PoC EVPN setup. Please bear with me on this one.
The current vxlan driver implementation is based on data-plane MAC learning. BUM traffic is sent through multicast. Hence the "dev" option for the vxlan interface is needed, as hypervisors need to know which interface to use to send out BUM traffic (multicast). The EVPN implementation we are trying to accomplish does not use multicast for BUM traffic, but rather (with the help of route-type 3 BGP advertisements) creates a replication list. Therefore the dev option is meaningless, as is the group option.
A possible (and backwards-compatible) solution could be a drop-down field under the VXLAN driver which lets you select whether you wish to use multicast (= default) or HER/EVPN (= new method). The latter would use "local $PHYDEV_IP" + "nolearning" instead of "dev $PHYDEV" + "group" for the vxlan interface.
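The proposed drop-down could map to the ip link options roughly like this. A sketch only; the function name and the mode strings are assumptions, not existing driver code:

```shell
#!/bin/sh
# Sketch of the proposed mode switch: build the BUM-handling part of the
# `ip link add ... type vxlan` command from a mode selector.
#   multicast -> data-plane learning, BUM via IP multicast out of PHYDEV
#   evpn      -> control-plane learning, BUM via BGP type-3 replication lists
vxlan_bum_opts() {
    mode=$1; phydev_or_ip=$2; mc_group=$3
    case $mode in
        multicast) echo "dev ${phydev_or_ip} group ${mc_group}" ;;
        evpn)      echo "local ${phydev_or_ip} nolearning" ;;
        *)         echo "unknown mode: ${mode}" >&2; return 1 ;;
    esac
}

vxlan_bum_opts multicast eth0 239.0.0.100   # -> dev eth0 group 239.0.0.100
vxlan_bum_opts evpn 10.4.4.11               # -> local 10.4.4.11 nolearning
```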
@Adze1502 I think your proposal is perfect, and we'll update the vxlan drivers to have a selector for these two modes.
Regarding the VNI ID: currently it is obtained from the VLAN_ID of the OpenNebula network. I guess you can set VLAN_ID (in OpenNebula) to your (Tenant_ID * 10000) + VLAN_ID
function. Note that you can reuse the VLAN_ID for different networks in OpenNebula.
I was thinking of the situation where the PHYDEV has more than one IP configured. In that case we could add the network used to route the BUM traffic as a hint to pick the right IP. But this can be added at a later stage if needed.
Thanks for your feedback!!!!!
(I'll update this issue when we have a working driver in case you want to give it a try)
Edit: missing info
Nice! @rsmontero :+1: Looking forward to give it a test run!
Hi @Adze1502
The files are now in master. They can be used directly in a 5.6 installation: just replace the vxlan.rb file on your front-end (/var/lib/one/remotes/vnm/vxlan/vxlan.rb) with the repo file at:
https://github.com/OpenNebula/one/blob/master/src/vnm_mad/remotes/vxlan/vxlan.rb
Then add the following to /var/lib/one/remotes/etc/vnm/OpenNebulaNetwork.conf:
# Multicast protocol for multi destination BUM traffic. Options:
# - multicast, for IP multicast
# - evpn, for BGP EVPN control plane
:vxlan_mode: evpn
# Tunnel endpoint communication type. Only for evpn vxlan_mode.
# - dev, tunnel endpoint communication is sent to PHYDEV
# - local_ip, first ip addr of PHYDEV is used as address for the communication
:vxlan_tep: local_ip
# Additional ip link options, uncomment the following to disable learning for
# EVPN mode
:ip_link_conf:
:nolearning:
After that, do not forget to run onehost sync -f to propagate the changes to the nodes.
With the above configuration file the link created is as shown below:
6: eth0.100: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master onebr.100 state UNKNOWN mode DEFAULT group default qlen 1000
link/ether be:31:5e:c2:0b:af brd ff:ff:ff:ff:ff:ff promiscuity 1
vxlan id 100 local 10.4.4.10 srcport 0 0 dstport 8472 nolearning ttl 16 ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx
Note that in this case local 10.4.4.10 is taken from eth0, as the network includes:
....
PHYDEV="eth0"
VLAN_ID="100"
VN_MAD="vxlan"
....
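For the :vxlan_tep: local_ip case ("first ip addr of PHYDEV"), the address could be derived by parsing `ip -o -4 addr show dev <PHYDEV>` output. A sketch against a canned sample line; the helper name and parsing approach are assumptions, not the driver's actual implementation:

```shell
#!/bin/sh
# Read `ip -o -4 addr show dev <dev>` output on stdin and print the first
# IPv4 address with its prefix length stripped. Sketch only.
first_ipv4() {
    awk 'NR == 1 { sub(/\/.*/, "", $4); print $4 }'
}

# Canned sample of what `ip -o -4 addr show dev eth0` prints:
sample='2: eth0    inet 10.4.4.10/24 brd 10.4.4.255 scope global eth0'
echo "$sample" | first_ipv4   # -> 10.4.4.10
```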
Thanks! Will take it for a test run next week!
We would like to suggest the following changes: https://8n1.org/14016/421c
diff -u vxlan.rb.orig vxlan.rb
--- vxlan.rb.orig 2018-11-15 21:52:53.657420384 +0100
+++ vxlan.rb 2018-11-15 21:25:21.645071227 +0100
@@ -18,6 +18,7 @@
module VXLAN
ATTR_VLAN_ID = :vlan_id
+ ATTR_VNI_ID = :vni_id
ATTR_VLAN_DEV = :vlan_dev
############################################################################
@@ -65,15 +66,15 @@
ip_link_conf << "#{option} #{value} "
end
- OpenNebula.exec_and_log("#{command(:ip)} link add #{@nic[@attr_vlan_dev]}"\
- " #{mtu} type vxlan id #{@nic[@attr_vlan_id]} #{group} #{ttl}"\
+ OpenNebula.exec_and_log("#{command(:ip)} link add vxlan#{@nic[@attr_vni_id]}"\
+ " #{mtu} type vxlan id #{@nic[@attr_vni_id]} #{group} #{ttl}"\
" #{tep} #{ip_link_conf}")
- OpenNebula.exec_and_log("#{command(:ip)} link set #{@nic[@attr_vlan_dev]} up")
+ OpenNebula.exec_and_log("#{command(:ip)} link set vxlan#{@nic[@attr_vni_id]} up")
end
def delete_vlan_dev
- OpenNebula.exec_and_log("#{command(:ip)} link delete #{@nic[@attr_vlan_dev]}")
+ OpenNebula.exec_and_log("#{command(:ip)} link delete vxlan#{@nic[@attr_vni_id]}")
end
def get_interface_vlan(name)
Add attribute "VNI_ID", as that's the VXLAN ID. Use "vxlan$VNI_ID" as the vxlan interface name.
I added the "VNI_ID" attribute to the VNET template, but the attribute value did not make it through. So, this might need some more changes.
Is this whole patch already implemented in 5.8 please? Can you share link to OpenNebula Conf where you described/presented this feature? BR!
Yes, details here https://docs.opennebula.io/5.8/deployment/open_cloud_networking_setup/vxlan.html#using-vxlan-with-bgp-evpn
You can find the presentation here: https://www.youtube.com/watch?v=kpVrEYBFwZ0
Enhancement Request
Description
From my point of view, a modern datacenter fabric has to scale, and if it is based on VXLAN, a control plane is one of the secrets to gaining scalability and manageability for a resilient and performant multi-tenant datacenter and/or cloud network.
Therefore I did a PoC for adding VXLAN with BGP-EVPN support to OpenNebula. For sure, OpenNebula is the ideal platform to extend and integrate with SDN and other tools (pre, post, clean).
My PoC (now also running in a medium-sized cloud environment / datacenter) is based on:
The advantages over VXLAN with Multicast are:
In my setup, BGP EVPN and the whole VXLAN overlay are based on classic Linux bridging (one bridge per VNI) and Quagga (Cumulus Networks) inside a Docker container on the VMM hosts.
On the (mostly redundant) OpenNebula front-ends, BGP route reflectors are installed (also dockerized Quagga).
Example for the RR:
The RR BGP config part:
The VTEP BGP config part:
In the last step, the bridges and the VXLAN interfaces have to be created:
Now you're able to use these bridges as a standard "Bridged network" inside OpenNebula, with VXLAN and BGP-EVPN. Of course you have to take care of the MTU for the vNIC (1450 bytes, for example) because of the VXLAN header and so on. But this works.
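The 1450-byte figure follows from the VXLAN encapsulation overhead over IPv4, assuming a 1500-byte underlay MTU:

```shell
#!/bin/sh
# VXLAN-over-IPv4 overhead per packet:
#   outer IPv4 header  20 bytes
#   outer UDP header    8 bytes
#   VXLAN header        8 bytes
#   inner Ethernet     14 bytes
# 1500 - 50 = 1450 bytes left for the vNIC MTU.
echo $(( 1500 - 20 - 8 - 8 - 14 ))   # -> 1450
```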
It would be really nice to integrate (and optimize) this kind of networking model into OpenNebula - for example, to create the networks directly out of OpenNebula. Therefore new vnm drivers have to be created and the UI (Sunstone) has to be changed for the new values like VNI, bridge name, VTEP IP.
Use case
For the moment it's just an idea. And if I am able to develop it, I'll do it. If you need any more information, access to my lab, etc. - let me know.
Interface Changes
Sunstone: - Wizard for the new network model - VNM drivers
Progress Status