CumulusNetworks / ifupdown2

GNU General Public License v2.0

bridge.py from the Cumulus package has a lot of new features vs. this repo #47

Closed: aderumier closed this issue 6 years ago

aderumier commented 6 years ago

Hi,

I'm looking to implement ebgp-vpn on a Debian host, following this nice presentation: https://www.netdevconf.org/2.2/slides/prabhu-linuxbridge-tutorial.pdf (thanks, Roopa!)

But it seems that bridge.py in this repo is missing a lot of new features, like "arp-nd-suppress". Is it planned to backport bridge.py from the Cumulus package?

julienfortin commented 6 years ago

Hi @aderumier

Yes, we plan to push our Cumulus package upstream and update this repository sometime before summer. As you mentioned we do offer "arp-nd-suppress" via the "bridge-arp-nd-suppress" attribute. Looking at the code now, it seems like it's a "Cumulus Linux specific" feature. We configure this feature via the netlink attribute IFLA_BRPORT_ARP_SUPPRESS, which doesn't seem to be available upstream yet.

@roopa-prabhu should be able to give you more info on that.

Thanks, Julien.

aderumier commented 6 years ago

@julienfortin

Thanks !

It seems that iproute2 4.15 has neigh_suppress on|off, so it should do the trick :)

julienfortin commented 6 years ago

@aderumier, I'll take a look at iproute2 code and consult with @roopa-prabhu!

Thanks!

roopa-prabhu commented 6 years ago

@aderumier, thanks for reaching out. As Julien mentions, he will push arp-nd-suppress support for ifupdown2 to the GitHub repo soon; we are testing a version of the patch. Until then, if you are using an upstream kernel and upstream iproute2, I would recommend configuring it via a post-up command: 'bridge link set dev <port> neigh_suppress on'. Let us know if it does not work for you. Thanks.
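A post-up hook like the one Roopa suggests could look roughly like the stanza rendered below. This is a minimal sketch. The port name `vxlan1212` (taken from later in this thread), the helper name `post_up_stanza`, and the bare `auto`/`iface` layout are assumptions. A real stanza would also carry the port's own vxlan or bridge settings.

```python
def post_up_stanza(port: str) -> str:
    """Render a minimal, hypothetical ifupdown2 stanza that runs the iproute2
    `bridge` command to turn on neigh_suppress once `port` is up."""
    cmd = f"bridge link set dev {port} neigh_suppress on"
    lines = [
        f"auto {port}",
        f"iface {port}",
        f"    post-up {cmd}",  # executed by ifupdown2 after the interface is up
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # One stanza per bridge port that should suppress ARP/ND flooding.
    for name in ("vxlan1212",):
        print(post_up_stanza(name))
```

Generating the text keeps the port list in one place. In practice the `post-up` line would simply be added by hand to the existing stanza in /etc/network/interfaces.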

aderumier commented 6 years ago

@roopa-prabhu Thanks, Roopa. Just tested with kernel 4.15; it works like a charm :)

Going to test lwtunnel with a VLAN-aware bridge now.

Thanks to the Cumulus team for the big work on VXLAN support. (I'm currently testing an SDN implementation with BGP EVPN on a hypervisor.)

aderumier commented 6 years ago

@julienfortin @roopa-prabhu

Do you have a link to the kernel patch for the IFLA_BRPORT_ARP_SUPPRESS netlink attribute? (I don't know if it's public or only Cumulus-specific.)

I'm trying to use the cl3u18 tag on Debian with kernel 4.15, but indeed, netlink link_set_add is not working.

Are there other Cumulus-specific netlink attributes in the Cumulus kernel?

aderumier commented 6 years ago

@julienfortin @roopa-prabhu

OK, I see that in the upstream kernel it's IFLA_BRPORT_NEIGH_SUPPRESS, with attribute number 32 instead of 152. It's working fine with 32 now.

If I understand correctly, these attributes with high numbers:

    IFLA_BRPORT_PEER_LINK         = 150
    IFLA_BRPORT_DUAL_LINK         = 151
    IFLA_BRPORT_ARP_SUPPRESS      = 152
    IFLA_BRPORT_GROUP_FWD_MASKHI  = 153

are custom attributes in the Cumulus kernel until they land in the upstream kernel?
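To see why the attribute number matters, here is a sketch of how a single u8 bridge-port attribute is laid out on the wire. It consists of a `struct nlattr` header (u16 length, u16 type) followed by the payload, padded to a 4-byte boundary. Between the upstream numbering (32) and the Cumulus numbering (152) quoted above, only the type field differs. The surrounding RTM_SETLINK message and the attribute nesting are omitted. The helper names are hypothetical.

```python
import struct

# Attribute numbers as quoted in this thread.
IFLA_BRPORT_NEIGH_SUPPRESS = 32     # upstream kernels
IFLA_BRPORT_ARP_SUPPRESS_CL = 152   # Cumulus Linux kernel (out of tree)

NLA_ALIGNTO = 4
NLA_HDRLEN = 4  # struct nlattr { __u16 nla_len; __u16 nla_type; }


def nla_align(length: int) -> int:
    """Round `length` up to the 4-byte netlink attribute alignment."""
    return (length + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def pack_u8_attr(attr_type: int, value: int) -> bytes:
    """Encode one netlink attribute carrying a u8 payload.

    nla_len covers header + payload but not the trailing padding.
    Netlink uses host byte order, hence the '=' struct prefix.
    """
    payload = struct.pack("=B", value)
    nla_len = NLA_HDRLEN + len(payload)
    header = struct.pack("=HH", nla_len, attr_type)
    padding = b"\x00" * (nla_align(nla_len) - nla_len)
    return header + payload + padding
```

For example, `pack_u8_attr(IFLA_BRPORT_NEIGH_SUPPRESS, 1)` yields 8 bytes: a header with length 5 and type 32, the value byte, then 3 padding bytes. A tool built against the Cumulus numbering would put 152 in that type field instead, which an upstream kernel does not interpret as neigh_suppress.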

julienfortin commented 6 years ago

@aderumier good analysis :) In general, we submit patches upstream, but it takes time for them to be reviewed and applied to the upstream tree. If a patch is posted just after a release, you'll have to wait until the next release cycle to see it included. Here at Cumulus Networks, however, we don't wait: we apply our patches to our own kernel tree (Cumulus Linux) and ship them to our customers in our own release cycle. Once a patch is part of an upstream release, we apply it back to our tree.

I'm focused on ifupdown2 rather than kernel stuff, but you got the idea :)

ljlu1504 commented 5 years ago

Hi guys, I tried to set up a system like the one below to verify the neigh_suppress feature on my Ubuntu host (I tried both kernel 4.15.0-58 and 5.2.8).

(1) Set up vxlan1212 and add it to bridge vbm1212:

    ip link add vxlan1212 type vxlan id 1212 local 192.168.0.2 nolearning
    brctl addbr vbm1212
    brctl addif vbm1212 vxlan1212
    ip link set dev vbm1212 up
    ip link set dev vxlan1212 up

(2) Set up veth pair veth1212/veth1212p and add veth1212 to bridge vbm1212:

    ip link add veth1212 type veth peer name veth1212p
    ip link set veth1212 up
    ip netns add ns1212
    ip link set netns ns1212 dev veth1212p
    ip netns exec ns1212 ip link set address 00:00:00:00:01:02 dev veth1212p
    ip netns exec ns1212 ifconfig veth1212p 192.168.1.1/24
    ip netns exec ns1212 ip link set veth1212p up
    brctl addif vbm1212 veth1212

(3) Set up veth pair veth1213/veth1213p and add veth1213 to bridge vbm1212:

    ip link add veth1213 type veth peer name veth1213p
    ip link set veth1213 up
    ip netns add ns1213
    ip link set netns ns1213 dev veth1213p
    ip netns exec ns1213 ip link set address 00:00:00:00:01:03 dev veth1213p
    ip netns exec ns1213 ifconfig veth1213p 192.168.1.2/24
    ip netns exec ns1213 ip link set veth1213p up
    brctl addif vbm1212 veth1213

(4) Enable neigh_suppress on all bridge ports:

    bridge link set dev veth1212 neigh_suppress on
    bridge link set dev vxlan1212 neigh_suppress on
    bridge link set dev veth1213 neigh_suppress on

(5) Add neigh and fdb entries as below:

    ip neigh add 192.168.1.2 lladdr 00:00:00:00:01:03 dev vbm1212
    ip neigh add 192.168.1.3 lladdr 00:00:00:00:01:04 dev vbm1212
    bridge fdb add 00:00:00:00:01:03 dev veth1213
    bridge fdb add 00:00:00:00:01:04 dev vxlan1212 dst 192.168.0.3 vni 1212

(6) Flush the neighbour entries in ns1212, then ping 192.168.1.2 from ns1212:

    ip netns exec ns1212 ip neigh flush 192.168.1.3
    ip netns exec ns1212 ip neigh flush 192.168.1.2
    ip netns exec ns1212 ping 192.168.1.2

My expectation was that the first ARP request would not be sent to the vxlan1212 interface, but I actually saw it there.

I am not sure whether I have misunderstood this feature. I tried to find the answer in this doc, https://www.netdevconf.org/2.2/slides/prabhu-linuxbridge-tutorial.pdf, but had no luck.

Could you share the script/commands/diagram you use to verify this feature?

Thanks a lot for help!

ljlu1504 commented 5 years ago

I also tried turning neigh_suppress on with the ifupdown2 commands below, but still can't get the expected result:

    ip link set dev vxlan1212 type bridge_slave neigh_suppress on
    ip link set dev veth1212 type bridge_slave neigh_suppress on
    ip link set dev veth1213 type bridge_slave neigh_suppress on

roopa-prabhu commented 5 years ago

@ljlu1504 there is a dynamic debug print in https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/net/bridge/br_arp_nd_proxy.c

You can enable it to see whether it is hitting the neigh proxy code:

    mount -t debugfs none /sys/kernel/debug
    echo -n "file br_arp_nd_proxy.c +p" > /sys/kernel/debug/dynamic_debug/control

roopa-prabhu commented 5 years ago

@ljlu1504 From your configuration, it looks like it is not hitting the proxy code because the input port has neigh suppress on: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/net/bridge/br_arp_nd_proxy.c#n155

What are you trying to test? In this case it seems like you want the bridge to proxy the ARP request for ns1213. Both ns1212 and ns1213 are bridged, so I don't see how it would make it to the vxlan port.

If you unset neigh_suppress on veth1212, the bridge should proxy it, but it depends on what you are trying to test. In VXLAN environments, neigh_suppress is used to suppress ARPs going over the vxlan port. However, the bridge refrains from flooding to vxlan ports only if the proxy conditions are met and it actually generates a proxy reply. In your case that is not happening either.
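The behaviour Roopa describes can be captured in a small Python model. It is based on a reading of br_do_proxy_suppress_arp() in net/bridge/br_arp_nd_proxy.c (linked above) and the bridge's flood check. It is a simplified sketch, not the kernel code. It ignores VLANs, the bridge's own local IPs, non-request ARP opcodes, and ports with plain proxy_arp. The class and function names are hypothetical, and the port, IP, and MAC values come from the first test in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class BridgeModel:
    """Hypothetical, simplified view of one bridge (no VLANs)."""
    ports: Set[str]
    suppress_ports: Set[str] = field(default_factory=set)  # neigh_suppress on
    neigh: Dict[str, str] = field(default_factory=dict)    # valid IP -> MAC entries on the bridge dev
    fdb: Dict[str, str] = field(default_factory=dict)      # MAC -> bridge port


def handle_arp_request(br: BridgeModel, in_port: str,
                       target_ip: str) -> Tuple[Optional[str], List[str]]:
    """Return (MAC used in a proxy reply or None, ports the request is flooded to)."""
    flood = sorted(br.ports - {in_port})        # never flood back out the ingress port
    suppress_enabled = bool(br.suppress_ports)  # any port with neigh_suppress on

    # A request arriving on a neigh_suppress port is not proxied at all.
    if suppress_enabled and in_port in br.suppress_ports:
        return None, flood

    mac = br.neigh.get(target_ip)
    dst_port = br.fdb.get(mac) if mac is not None else None
    if dst_port is None:
        # No neighbour entry or no FDB entry: flooded as usual, VXLAN port included.
        return None, flood

    # Proxy reply only if the port behind the target MAC has neigh_suppress on.
    reply = mac if dst_port in br.suppress_ports else None
    if reply is not None or suppress_enabled:
        # Treated as answered: skip neigh_suppress ports such as the VXLAN port.
        flood = [p for p in flood if p not in br.suppress_ports]
    return reply, flood


if __name__ == "__main__":
    br = BridgeModel(
        ports={"veth1212", "veth1213", "vxlan1212"},
        suppress_ports={"veth1212", "veth1213", "vxlan1212"},
        neigh={"192.168.1.2": "00:00:00:00:01:03", "192.168.1.3": "00:00:00:00:01:04"},
        fdb={"00:00:00:00:01:03": "veth1213", "00:00:00:00:01:04": "vxlan1212"},
    )
    print(handle_arp_request(br, "veth1212", "192.168.1.2"))
    # -> (None, ['veth1213', 'vxlan1212'])
    br.suppress_ports.discard("veth1212")  # unset neigh_suppress on the ingress port
    print(handle_arp_request(br, "veth1212", "192.168.1.2"))
    # -> ('00:00:00:00:01:03', [])
```

With neigh_suppress on the ingress port, the model floods to vxlan1212, matching what was observed. With it unset, the bridge answers locally and skips the vxlan port. With no neighbour entry for the target at all, the model also floods to the vxlan port.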

ljlu1504 commented 5 years ago

@roopa-prabhu , Thanks!

What I want to test is the difference between arp-proxy on VXLAN and neigh suppression on the Linux bridge/VXLAN.

I don't quite understand what the neigh suppression feature does. My initial understanding was that it suppresses incoming ARP/RARP packets on any bridge port that has 'neigh_suppress on', but based on your explanation above, the neigh suppression feature works on the vxlan interface only.

So I have now adjusted my test script on two hosts as below.

Host 1:

    ip link add vxlan1212 type vxlan id 1212 local 192.168.0.2 nolearning

(I don't enable the ARP proxy, since I don't want to test it. Is the 'proxy' option necessary for the neigh suppression feature?)

    brctl addbr vbm1212
    brctl addif vbm1212 vxlan1212
    ip link set dev vbm1212 up
    ip link set dev vxlan1212 up
    ip link add veth1212 type veth peer name veth1212p
    ip link set veth1212 up
    ip netns add ns1212
    ip link set netns ns1212 dev veth1212p
    ip netns exec ns1212 ip link set address 00:00:00:00:01:02 dev veth1212p
    ip netns exec ns1212 ifconfig veth1212p 192.168.1.1/24
    ip netns exec ns1212 ip link set veth1212p up
    brctl addif vbm1212 veth1212

Host 2:

    ip link add vxlan1212 type vxlan id 1212 local 192.168.0.4 nolearning
    brctl addbr vbm1212
    brctl addif vbm1212 vxlan1212
    ip link set dev vbm1212 up
    ip link set dev vxlan1212 up
    ip link add veth1212 type veth peer name veth1212p
    ip link set veth1212 up
    ip netns add ns1212
    ip link set netns ns1212 dev veth1212p
    ip netns exec ns1212 ip link set address 00:00:00:03:01:02 dev veth1212p
    ip netns exec ns1212 ifconfig veth1212p 192.168.1.13/24
    ip netns exec ns1212 ip link set veth1212p up
    brctl addif vbm1212 veth1212

After that, I enabled 'neigh_suppress' on the vxlan1212 interface on host 1:

    bridge link set dev vxlan1212 neigh_suppress on

Then I ran 'ip netns exec ns1212 ping 192.168.1.12' on host 1 to ping a non-existent IP. I expected the ARP request not to appear on the vxlan1212 interface on host 2, but I saw it. I'm afraid I don't quite understand the exact meaning of the 'neigh suppression' feature. Could you kindly explain a bit more?

Regarding the dynamic debug setting you suggested: my control file is already set as shown below, but how should I check the log next to verify whether it is hitting the neigh proxy code? I didn't see any related log in /var/log/syslog.

    root@i-3uxp0h2m:/sys/kernel/debug/dynamic_debug# cat control | grep arp
    net/ipv4/arp.c:376 [arp]arp_solicit =_ "trying to ucast probe in NUD_INVALID\012"
    drivers/infiniband/core/iwpm_msg.c:756 [iw_cm]iwpm_mapping_info_cb =_ "%s: iWarp Port Mapper (pid = %d) is available!\012"
    drivers/infiniband/core/iwpm_msg.c:448 [iw_cm]iwpm_register_pid_cb =_ "%s: iWarp Port Mapper (pid = %d) is available!\012"
    net/bridge/br_arp_nd_proxy.c:345 [bridge]br_nd_send =p "nd send dev %s dst %pI6 dst_hw %pM src %pI6 src_hw %pM\012"
    net/bridge/br_arp_nd_proxy.c:54 [bridge]br_arp_send =p "arp send dev %s dst %pI4 dst_hw %pM src %pI4 src_hw %pM\012"

Thanks a lot for helping!

ljlu1504 commented 5 years ago

I think I figured it out.

VXLAN proxy-ARP works based only on the neighbour entries on the VTEP, for example:

    ip n replace 192.168.1.24 lladdr 00:01:01:01:01:24 dev vxlan1212

(VXLAN proxy-ARP only cares about entries like this one.) So for VXLAN proxy-ARP to work, two conditions are needed:

  1. Enable proxy-ARP when creating the VTEP.
  2. Have the ip neigh entry on the VTEP, as above.
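The two conditions above translate directly into iproute2 commands. The sketch below simply generates them. The helper name is hypothetical. `proxy` is the iproute2 vxlan option that enables the VXLAN driver's ARP proxy, and the remaining values are the ones used in this thread.

```python
from typing import Dict, List


def vxlan_proxy_commands(dev: str, vni: int, local_ip: str,
                         neighbours: Dict[str, str]) -> List[List[str]]:
    """Build argv lists for VXLAN-driver ARP proxying (hypothetical helper):
    1. create the VTEP with `proxy` enabled,
    2. install one neighbour entry per remote IP on the VTEP device itself.
    """
    cmds = [["ip", "link", "add", dev, "type", "vxlan", "id", str(vni),
             "local", local_ip, "nolearning", "proxy"]]
    for ip, mac in sorted(neighbours.items()):
        cmds.append(["ip", "neigh", "replace", ip, "lladdr", mac, "dev", dev])
    return cmds


if __name__ == "__main__":
    for argv in vxlan_proxy_commands("vxlan1212", 1212, "192.168.0.2",
                                     {"192.168.1.24": "00:01:01:01:01:24"}):
        print(" ".join(argv))  # paste into a shell, or pass argv to subprocess.run
```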

Bridge "neigh suppress", on the other hand, works based on the "ip neigh" and "fdb" entries on the bridge, as below:

    ip n replace 192.168.1.24 lladdr 00:01:01:01:01:24 dev vbm1212
    bridge fdb add 00:01:01:01:01:24 master dev vxlan1212

So for bridge "neigh suppress" to work, three conditions are needed:

  1. Enable "neigh suppress" on the VTEP's bridge port.
  2. Have the ip neigh entry on the bridge.
  3. Have the fdb entry on the bridge.
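The three bridge-side conditions can likewise be generated from a table of remote hosts. This is a minimal sketch that only mirrors the commands quoted above. The helper name is hypothetical, and it does not cover the VXLAN driver's own forwarding entries (the `dst ... vni` form used earlier in the thread).

```python
from typing import Dict, List


def neigh_suppress_commands(bridge: str, vtep: str,
                            remotes: Dict[str, str]) -> List[List[str]]:
    """Build argv lists for the bridge-based neigh_suppress setup (hypothetical helper):
    1. neigh_suppress on the VTEP's bridge port,
    2. a neighbour entry for each remote IP on the bridge device,
    3. a bridge (master) FDB entry pointing each remote MAC at the VTEP port.
    """
    cmds = [["bridge", "link", "set", "dev", vtep, "neigh_suppress", "on"]]
    for ip, mac in sorted(remotes.items()):
        cmds.append(["ip", "neigh", "replace", ip, "lladdr", mac, "dev", bridge])
        cmds.append(["bridge", "fdb", "add", mac, "master", "dev", vtep])
    return cmds


if __name__ == "__main__":
    for argv in neigh_suppress_commands("vbm1212", "vxlan1212",
                                        {"192.168.1.24": "00:01:01:01:01:24"}):
        print(" ".join(argv))
```

Keeping the remote table in one place makes the neighbour and FDB entries for each host stay in step, which is exactly what the bridge needs before it can answer an ARP request locally.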

Thanks!