aderumier closed this issue 3 years ago.
Alexandre: can you enable BGP debug logs and report them? Can you also run the same commands without summary?
Any other information is welcome too (zebra, kernel, etc.), even if it does not really matter here.
Here are the debug logs:
frr 4.1.log: https://gist.github.com/aderumier/f916b21c803f0de5c0283ed0d3375b56
frr 5.0.log: https://gist.github.com/aderumier/dbf3d0760fccb9b661057869086ab739
# show bgp l2vpn evpn
Route Distinguisher: ip 10.59.100.231:2
*> [5]:[0]:[24]:[172.16.0.0]
10.59.100.231 0 32768 ?
*> [5]:[0]:[24]:[192.168.0.0]
10.59.100.231 0 32768 ?
*> [5]:[0]:[24]:[192.168.1.0]
10.59.100.231 0 32768 ?
Route Distinguisher: ip 10.59.100.231:3
*> [2]:[0]:[48]:[b2:80:29:35:0e:1c]
10.59.100.231 32768 i
*> [2]:[0]:[48]:[b2:80:29:35:0e:1c]:[32]:[192.168.0.10]
10.59.100.231 32768 i
*> [2]:[0]:[48]:[b2:80:29:35:0e:1c]:[128]:[fe80::b080:29ff:fe35:e1c]
10.59.100.231 32768 i
*> [3]:[0]:[32]:[10.59.100.231]
10.59.100.231 32768 i
Route Distinguisher: ip 10.59.100.231:4
*> [3]:[0]:[32]:[10.59.100.231]
10.59.100.231 32768 i
Displayed 8 out of 8 total prefixes
# show bgp l2vpn evpn
Route Distinguisher: ip 10.59.100.231:2
*> [5]:[0]:[24]:[172.16.0.0]
10.59.100.231 0 32768 ?
*> [5]:[0]:[24]:[192.168.0.0]
10.59.100.231 0 32768 ?
*> [5]:[0]:[24]:[192.168.1.0]
10.59.100.231 0 32768 ?
Route Distinguisher: ip 10.59.100.231:3
*> [2]:[0]:[48]:[b2:80:29:35:0e:1c]
10.59.100.231 32768 i
*> [2]:[0]:[48]:[b2:80:29:35:0e:1c]:[32]:[192.168.0.10]
10.59.100.231 32768 i
*> [2]:[0]:[48]:[b2:80:29:35:0e:1c]:[128]:[fe80::b080:29ff:fe35:e1c]
10.59.100.231 32768 i
*> [3]:[0]:[32]:[10.59.100.231]
10.59.100.231 32768 i
Route Distinguisher: ip 10.59.100.231:4
*> [3]:[0]:[32]:[10.59.100.231]
10.59.100.231 32768 i
Route Distinguisher: ip 10.59.100.232:2
*>i[5]:[0]:[24]:[172.16.0.0]
10.59.100.232 0 100 0 ?
*>i[5]:[0]:[24]:[192.168.0.0]
10.59.100.232 0 100 0 ?
*>i[5]:[0]:[24]:[192.168.1.0]
10.59.100.232 0 100 0 ?
Route Distinguisher: ip 10.59.100.232:3
*>i[3]:[0]:[32]:[10.59.100.232]
10.59.100.232 100 0 i
Route Distinguisher: ip 10.59.100.232:4
*>i[2]:[0]:[48]:[b2:66:43:60:b7:50]
10.59.100.232 100 0 i
*>i[2]:[0]:[48]:[b2:66:43:60:b7:50]:[32]:[192.168.1.11]
10.59.100.232 100 0 i
*>i[2]:[0]:[48]:[b2:66:43:60:b7:50]:[128]:[fe80::b066:43ff:fe60:b750]
10.59.100.232 100 0 i
*>i[3]:[0]:[32]:[10.59.100.232]
10.59.100.232 100 0 i
Displayed 16 out of 16 total prefixes
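The second output contains the *>i routes learned over iBGP from 10.59.100.232, while the first one only has local routes (8 versus 16 prefixes). A minimal Python sketch that tallies local versus iBGP-learned entries per route distinguisher from pasted show bgp l2vpn evpn output, assuming the two-lines-per-route layout above, where an "i" after "*>" marks an internal (iBGP) path. The function names are hypothetical; the route-type labels follow RFC 7432 (types 2 and 3) and RFC 9136 (type 5).

```python
import re

# EVPN route types seen in the output above.
ROUTE_TYPES = {
    2: "MAC/IP advertisement",
    3: "Inclusive multicast Ethernet tag",
    5: "IP prefix",
}

# Status codes ("*>" or "*>i"), then the bracketed route key.
ROUTE_LINE = re.compile(r"^(?P<status>\*>i?)\s*(?P<key>(?:\[[^\]]*\]:?)+)$")


def parse_route_line(line):
    """Return (status, type, type_name, fields) for a route line, else None."""
    m = ROUTE_LINE.match(line.strip())
    if not m:
        return None  # e.g. the next-hop/metric line that follows each route
    fields = re.findall(r"\[([^\]]*)\]", m.group("key"))
    rtype = int(fields[0])
    return m.group("status"), rtype, ROUTE_TYPES.get(rtype, "unknown"), fields[1:]


def summarize(output):
    """Count local vs. iBGP-learned routes for each route distinguisher."""
    counts = {}
    rd = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Route Distinguisher:"):
            rd = line.split(":", 1)[1].strip()
            counts.setdefault(rd, {"local": 0, "ibgp": 0})
            continue
        parsed = parse_route_line(line)
        if parsed and rd is not None:
            kind = "ibgp" if parsed[0].endswith("i") else "local"
            counts[rd][kind] += 1
    return counts
```

Fed the first output, every RD would report zero iBGP routes, which is the symptom described in this issue.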
The OS is Debian 9.0 with kernel 4.15.
sysctl tuning:
net.ipv4.tcp_l3mdev_accept=1
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.all.rp_filter=0
net.ipv4.ip_forward=1
net.ipv6.conf.all.forwarding=1
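Since net.ipv4.tcp_l3mdev_accept comes up repeatedly in this thread, it helps to confirm which values are actually in effect on each host. A small Python sketch that reads the settings above back from /proc/sys, assuming the standard layout where each dot in a key becomes a directory level; the helper names are hypothetical.

```python
from pathlib import Path

# The settings listed above, with the values used in this setup.
EXPECTED = {
    "net.ipv4.tcp_l3mdev_accept": "1",
    "net.ipv4.conf.default.rp_filter": "0",
    "net.ipv4.conf.all.rp_filter": "0",
    "net.ipv4.ip_forward": "1",
    "net.ipv6.conf.all.forwarding": "1",
}


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file below *root*.

    Keys that contain interface names with dots (e.g. eno1.100) cannot be
    split this way; sysctl(8) accepts '/' as the separator for those.
    """
    return Path(root, *key.split("."))


def check_sysctls(expected=EXPECTED, root="/proc/sys"):
    """Return {key: (expected, actual)} for each setting that differs.

    A missing file (unknown key, or no /proc/sys at all) is reported as None.
    """
    mismatches = {}
    for key, want in expected.items():
        path = sysctl_path(key, root)
        have = path.read_text().strip() if path.is_file() else None
        if have != want:
            mismatches[key] = (want, have)
    return mismatches
```

An empty dict from check_sysctls() means every listed value matches.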
Testing with two hosts peering with each other:
host1: /etc/network/interfaces
auto eno1.100
iface eno1.100
address 10.59.100.231
netmask 255.255.255.0
gateway 10.59.100.1
auto eno2.100
iface eno2.100
address 172.16.0.1
netmask 255.255.255.0
vrf vrf1
auto vmbr2
iface vmbr2
address 192.168.0.1/24
bridge_ports vxlan2
bridge_stp off
bridge_fd 0
hwaddress 44:39:39:FF:40:94
vrf vrf1
auto vxlan3
iface vxlan3 inet manual
vxlan-id 3
vxlan-local-tunnelip 10.59.100.231
bridge-learning off
bridge-arp-nd-suppress on
bridge-unicast-flood off
bridge-multicast-flood off
auto vmbr3
iface vmbr3
address 192.168.1.1/24
bridge_ports vxlan3
bridge_stp off
bridge_fd 0
hwaddress 44:39:39:FF:40:94
vrf vrf1
#interconnect vxlan-vrf l3vni
auto vxlan4001
iface vxlan4001
vxlan-id 4001
vxlan-local-tunnelip 10.59.100.231
bridge-learning off
bridge-arp-nd-suppress on
bridge-unicast-flood off
bridge-multicast-flood off
auto vmbr4001
iface vmbr4001
bridge_ports vxlan4001
bridge_stp off
bridge_fd 0
hwaddress 44:39:39:FF:40:90
vrf vrf1
auto vrf1
iface vrf1
vrf-table auto
host2: /etc/network/interfaces
auto eno1.100
iface eno1.100
address 10.59.100.232
netmask 255.255.255.0
gateway 10.59.100.1
auto eno2.100
iface eno2.100
address 172.16.0.2
netmask 255.255.255.0
vrf vrf1
auto vxlan2
iface vxlan2 inet manual
vxlan-id 2
vxlan-local-tunnelip 10.59.100.232
bridge-learning off
bridge-arp-nd-suppress on
bridge-unicast-flood off
bridge-multicast-flood off
auto vmbr2
iface vmbr2
address 192.168.0.1/24
bridge_ports vxlan2
bridge_stp off
bridge_fd 0
hwaddress 44:39:39:FF:40:94
vrf vrf1
auto vxlan3
iface vxlan3 inet manual
vxlan-id 3
vxlan-local-tunnelip 10.59.100.232
bridge-learning off
bridge-arp-nd-suppress on
bridge-unicast-flood off
bridge-multicast-flood off
auto vmbr3
iface vmbr3
address 192.168.1.1/24
bridge_ports vxlan3
bridge_stp off
bridge_fd 0
hwaddress 44:39:39:FF:40:94
vrf vrf1
#interconnect vxlan-vrf l3vni
auto vxlan4001
iface vxlan4001
vxlan-id 4001
vxlan-local-tunnelip 10.59.100.232
bridge-learning off
bridge-arp-nd-suppress on
bridge-unicast-flood off
bridge-multicast-flood off
auto vmbr4001
iface vmbr4001
bridge_ports vxlan4001
bridge_stp off
bridge_fd 0
hwaddress 44:39:39:FF:40:91
vrf vrf1
auto vrf1
iface vrf1
vrf-table auto
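To cross-check the two stanza lists (VNI, bridge and VRF mapping), a minimal Python sketch that parses them and lists bridge ports with no stanza of their own. It assumes a simplified reading of the ifupdown2 format (an iface line followed by keyword/value option lines) and uses hypothetical helper names. Run against the host1 excerpt above, it would report vmbr2's vxlan2, which may simply have been left out of the paste; physical ports without a stanza would also be reported, so treat the result as a hint.

```python
def parse_interfaces(text):
    """Parse ifupdown-style stanzas into {iface: {option: value}}."""
    ifaces = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words = line.split()
        if words[0] == "auto":
            continue  # "auto" only marks the interface for bring-up
        if words[0] == "iface":
            current = ifaces.setdefault(words[1], {})
            continue
        if current is not None:
            current[words[0]] = " ".join(words[1:])
    return ifaces


def missing_bridge_ports(ifaces):
    """Return (bridge, port) pairs whose port has no stanza of its own."""
    missing = []
    for name, opts in ifaces.items():
        # The paste uses underscores; the hyphenated spelling is also checked.
        ports = opts.get("bridge_ports", opts.get("bridge-ports", ""))
        for port in ports.split():
            if port not in ifaces:
                missing.append((name, port))
    return missing
```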
I have tested the frr-5.0-dev branch, and it's working fine. I'll try to bisect, but the regression seems to be recent.
I have found the commit: https://github.com/FRRouting/frr/commit/7e0c80ea1c526903d4b67dabddc9430c3aab8d65
from this pull request: https://github.com/FRRouting/frr/commit/f89270226297ec1f1a8290481d1dc7fb66d71422
Since this commit, it doesn't work anymore.
Please try setting: net.ipv4.tcp_l3mdev_accept=0
Also what kernel rev are you running?
@rwestphal didn't you find that there was a kernel version that didn't work with this change? What happened with that?
@louberger
Setting net.ipv4.tcp_l3mdev_accept=0 doesn't help.
I'm using kernel 4.15.17 (I can test other kernels if you want).
@louberger yes, I had the exact same problem in the past week. bgpd is having issues after commit 7e0c80e, but only when using recent Linux kernels (apparently v4.14+).
I made this topology to illustrate the problem: https://gist.github.com/rwestphal/545473123cd967f73dc52872ed37c2dc
Please see the output below:
# vtysh -c "show ip bgp vrf all summary"
Instance Default:
IPv4 Unicast Summary:
BGP router identifier 10.0.0.1, local AS number 1 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 1, using 21 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.0.0.2 4 1 0 0 0 0 0 never Active
Total number of neighbors 1
Instance rt1-RED:
IPv4 Unicast Summary:
BGP router identifier 10.0.1.1, local AS number 1 vrf-id 2
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 1, using 21 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.0.1.2 4 1 20 20 0 0 0 00:17:21 0
Total number of neighbors 1
Instance rt1-BLUE:
IPv4 Unicast Summary:
BGP router identifier 10.0.2.1, local AS number 1 vrf-id 3
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 1, using 21 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.0.2.2 4 1 20 20 0 0 0 00:17:21 0
Total number of neighbors 1
# sysctl net.ipv4.tcp_l3mdev_accept
net.ipv4.tcp_l3mdev_accept = 0
In short, when BGP is enabled in one or more VRFs, the BGP instance running in the default VRF is affected and can no longer establish a TCP connection to its remote peer.
As we can see below, bgpd opens the expected TCP sockets normally, but for some reason the kernel sends TCP RSTs after the TCP handshake completes:
# ss -tan4 | grep 179
LISTEN 0 128 *%rt1-BLUE:179 *:*
LISTEN 0 128 *%rt1-RED:179 *:*
LISTEN 0 128 *:179 *:*
SYN-RECV 0 0 10.0.0.1%rt1-BLUE:179 10.0.0.2:42230
ESTAB 0 0 10.0.2.1%rt1-BLUE:52138 10.0.2.2:179
ESTAB 0 0 10.0.1.1%rt1-RED:38158 10.0.1.2:179
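The SYN-RECV line is the telling one: 10.0.0.1 appears to be the default instance's address, yet the half-open connection is attached to a socket bound to rt1-BLUE. A small Python sketch that flags such entries in ss -tan4 output, assuming (as the router identifiers in the summary suggest) that 10.0.0.1 lives in the default VRF, 10.0.1.1 in rt1-RED and 10.0.2.1 in rt1-BLUE; the helper names are hypothetical.

```python
def parse_ss_line(line):
    """Split one 'ss -tan4' line into its state, local socket and peer."""
    state, _recv_q, _send_q, local, peer = line.split()[:5]
    host, _, port = local.rpartition(":")
    # A '%dev' suffix means the socket is bound to that device (here, a VRF).
    addr, _, dev = host.partition("%")
    return {"state": state, "addr": addr, "dev": dev or None,
            "port": int(port), "peer": peer}


def misplaced_sockets(lines, addr_to_vrf):
    """Return sockets whose local address belongs to a different VRF.

    addr_to_vrf maps a local address to its VRF device, or None for the
    default VRF. Wildcard listeners ('*') are not in the map and are ignored.
    """
    bad = []
    for line in lines:
        sock = parse_ss_line(line)
        if sock["addr"] in addr_to_vrf and addr_to_vrf[sock["addr"]] != sock["dev"]:
            bad.append(sock)
    return bad
```

Applied to the six lines above with that mapping, only the SYN-RECV entry from 10.0.0.2 is reported.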
The bgpd log file shows lots of this:
2018/06/17 12:43:12 BGP: 10.0.1.2 [FSM] Timer (keepalive timer expire)
2018/06/17 12:43:12 BGP: 10.0.2.2 [FSM] Timer (keepalive timer expire)
2018/06/17 12:43:13 BGP: 10.0.0.2 [FSM] Timer (connect timer expire)
2018/06/17 12:43:13 BGP: 10.0.0.2 [FSM] ConnectRetry_timer_expired (Active->Connect), fd -1
2018/06/17 12:43:13 BGP: 10.0.0.2 [Event] Connect start to 10.0.0.2 fd 28
2018/06/17 12:43:13 BGP: 10.0.0.2 [FSM] Non blocking connect waiting result, fd 28
2018/06/17 12:43:13 BGP: 10.0.0.2 went from Active to Connect
2018/06/17 12:43:13 BGP: 10.0.0.2 [Event] Connect failed 104(Connection reset by peer)
2018/06/17 12:43:13 BGP: 10.0.0.2 [FSM] TCP_connection_open_failed (Connect->Active), fd 28
2018/06/17 12:43:13 BGP: 10.0.0.2 went from Connect to Active
2018/06/17 12:44:12 BGP: 10.0.1.2 [FSM] Timer (keepalive timer expire)
2018/06/17 12:44:12 BGP: 10.0.2.2 [FSM] Timer (keepalive timer expire)
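The default-instance peer keeps cycling from Active to Connect and back, and every attempt fails with error 104 (ECONNRESET on Linux), while the VRF peers just keep exchanging keepalives. A small Python sketch that pulls those transitions and failures out of a bgpd log, which makes runs on different kernels easier to compare. The names are hypothetical and the regular expressions only cover the two line shapes shown above.

```python
import re

# e.g. "... BGP: 10.0.0.2 went from Active to Connect"
TRANSITION = re.compile(r"BGP: (?P<peer>\S+) went from (?P<old>\w+) to (?P<new>\w+)")
# e.g. "... BGP: 10.0.0.2 [Event] Connect failed 104(Connection reset by peer)"
CONNECT_FAILED = re.compile(
    r"BGP: (?P<peer>\S+) \[Event\] Connect failed (?P<err>\d+)\((?P<msg>[^)]*)\)"
)


def summarize_log(text):
    """Return ({peer: [(old, new), ...]}, {peer: [(errno, message), ...]})."""
    transitions, failures = {}, {}
    for line in text.splitlines():
        m = TRANSITION.search(line)
        if m:
            transitions.setdefault(m["peer"], []).append((m["old"], m["new"]))
            continue
        m = CONNECT_FAILED.search(line)
        if m:
            failures.setdefault(m["peer"], []).append((int(m["err"]), m["msg"]))
    return transitions, failures
```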
Using kernel v4.12, the topology above works normally, so I'm afraid this might be a bug introduced recently in the Linux kernel. Once I have some time I'll try to do a git bisect and find the offending commit. For now the workaround is to either a) use an older kernel or b) revert commit 7e0c80e.
I have tested with kernel 4.13.16 and it's working fine, so it must be the same bug.
I have a speculative workaround that I have in mind. Are you willing to try it?
On June 17, 2018 12:07:19 PM alexandre derumier notifications@github.com wrote:
I have tested with kernel 4.13.16, it's working fine. so it must be same bug.
If we are having kernel version issues, we should get a kernel person involved to make sure nothing more serious is going on.
Tested kernel 4.14.0: doesn't work.
Edit: 4.14-rc1 doesn't work. 4.17 doesn't work.
@louberger: I have time to test tomorrow if needed.
Please see if #2475 fixes your issue (with net.ipv4.tcp_l3mdev_accept=1).
@louberger
Thanks, #2475 fixes it for me (kernel >= 4.14 + net.ipv4.tcp_l3mdev_accept=1).
It also works on kernel 4.13, with or without net.ipv4.tcp_l3mdev_accept=1.
Thank you for the test results! While this is a good change to have for the long term, we should also get with the kernel folks to understand what happened in 4.14...
On June 18, 2018 12:03:45 AM alexandre derumier notifications@github.com wrote:
@louberger
Thanks, #2475 fix it for me (kernel >= 4.14 + net.ipv4.tcp_l3mdev_accept=1).
works also on 4.13 kernel, with or without net.ipv4.tcp_l3mdev_accept=1
@aderumier those seem to be good guesses.
So, I've managed to reproduce the same problem using a simple netcat-like program, which confirms this is a kernel issue.
One interesting thing is this: if the TCP socket for the default VRF is the last one to be opened, then everything works perfectly.
However, if you change your config from this:
router bgp 1234
[snip]
!
router bgp 1234 vrf vrf1
[snip]
!
To this:
router bgp 1234 vrf vrf1
[snip]
!
router bgp 1234
[snip]
!
Nothing will change because the VRF sockets are created only after bgpd establishes a connection to zebra. So the workaround would be to configure the main BGP instance using vtysh or telnet after the BGP VRF instances are configured.
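For reference, a rough Python equivalent of the netcat-like reproduction mentioned above; it is not the program that was actually used. It assumes that each listener sets SO_REUSEADDR and SO_REUSEPORT, as bgpd's listener code appears to do, and that the VRF listeners are tied to their VRF device with SO_BINDTODEVICE. The function names, the default_last switch and the device names in the usage note are hypothetical.

```python
import socket

# SO_BINDTODEVICE is Linux-only; older Pythons may lack the constant, so fall
# back to 25, its value on most Linux architectures (asm-generic/socket.h).
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def open_listener(port, device=None, addr="0.0.0.0", backlog=128):
    """Open a TCP listener on *port*, optionally bound to *device* (a VRF)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # An unbound and a device-bound listener can only share the port when
    # both set SO_REUSEPORT (assumption: this mirrors bgpd's listeners).
    if hasattr(socket, "SO_REUSEPORT"):
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    if device is not None:
        # Requires CAP_NET_RAW (run as root); restricts the socket to the VRF.
        sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, device.encode() + b"\0")
    sock.bind((addr, port))
    sock.listen(backlog)
    return sock


def open_bgp_like_listeners(port, vrf_devices, default_last=True):
    """Open one listener per VRF device plus an unbound (default VRF) one."""
    devices = list(vrf_devices)
    order = (devices + [None]) if default_last else ([None] + devices)
    return [open_listener(port, dev) for dev in order]
```

Run as root on a host with rt1-RED and rt1-BLUE VRFs, open_bgp_like_listeners(179, ["rt1-RED", "rt1-BLUE"], default_last=False) approximates bgpd's order (default instance first, VRF sockets after the zebra connection); per the observation above, default_last=True is the order that worked.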
If we are having kernel version issues, we should get a kernel person involved to make sure nothing more serious is going on.
Definitely a good idea :)
Said kernel person is here... I am missing something about the problem: are you saying bgpd has per-VRF sockets (a socket bound to each VRF that bgpd is configured to use) AND a global (not bound to anything) socket?
After discussions on Slack: this is a kernel issue introduced in 4.14, and a fix has been put forward. We now need to get it backported (in progress).
The workaround while we are waiting is to do this:
@donaldsharp @louberger
Hi, is there any news from the kernel developers about this bug? Any reference to the kernel bug?
It should be in a forthcoming version of the kernel:
https://patchwork.ozlabs.org/patch/931179/
4.14.57: https://lkml.org/lkml/2018/7/20/544
4.17.9: https://lore.kernel.org/patchwork/patch/965438/
Lou
On 8/12/2018 7:29 AM, alexandre derumier wrote:
@donaldsharp @louberger
Hi, do we have some news of kernel dev about this bug ? Any reference of the kernel bug ?
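To check a given host against the versions reported in this thread (regression first seen in 4.14-rc1, 4.12 and 4.13.16 working, fixes in 4.14.57 and 4.17.9), a minimal Python sketch with a hypothetical classify_kernel helper. It only encodes the releases named here; any other series, including distribution kernels that backport fixes independently, is reported as no-fix-listed rather than guessed.

```python
import re

# Stable releases named in this thread as carrying the fix.
FIXED_IN = {(4, 14): 57, (4, 17): 9}


def classify_kernel(release):
    """Classify a 'uname -r' style string against the versions reported here."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = int(m[1]), int(m[2]), int(m[3] or 0)
    if (major, minor) < (4, 14):
        return "before-regression"  # 4.12 and 4.13.16 were reported working
    if (major, minor) in FIXED_IN:
        return "fixed" if patch >= FIXED_IN[(major, minor)] else "affected"
    return "no-fix-listed"  # e.g. 4.15.x: no fixed stable release named here
```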
@aderumier Hi, we had a similar issue with an L3VPN setup using Docker containers: BGP sessions were not established even though the neighbors did the bind/listen/accept correctly, because the VRF was not forwarding (without containers it worked OK). In our case, using Ubuntu 16.04 in a test environment, the bug was present up to kernel 4.15.0-45 and fixed in kernel 4.15.0-46 (4.15.0-46 changelog). With 4.15.0-45 we also tried "privileged" and "super privileged" containers, with no luck. So maybe there were several VRF corner cases affecting different things. Hope it helps :-)
Hi, since 5.0 I can't exchange EVPN routes inside a VRF anymore. It was working fine last month in 4.1-dev (I don't remember exactly when).
With this simple config:
frr 4.1-dev (around last month)
frr 5.0 (stable branch or 5.0 tag)
"show bgp evpn route" only displays local routes; routes from the neighbor are not shown.
This only happens inside a VRF; without a VRF it works fine.