FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

BFD state machine transition is not completed in single hop mode #5146

Open dineshkumarkamalakannan opened 4 years ago

dineshkumarkamalakannan commented 4 years ago

Looks like when BFD is configured for a VRF on the latest dev/7.2 code, I don't see the BFD state going to "INIT" on receiving an "INIT" message from the BGP peer. Instead, FRR is just sending "DOWN".


Describe the bug: According to the BFD state machine,

[image: BFD state machine diagram]

FRR should be sending "INIT" on receiving an "INIT" message from the BGP peer.
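
For reference, here is a minimal Python sketch of the reception rules as I read them in RFC 5880, section 6.8.6. It is not FRR's bfdd code; the names `State` and `next_state` are made up for illustration. Under this reading, a Down session that receives Init goes straight to Up, and one that receives Down goes to Init, so in both cases the local side should stop advertising Down once the peer's packets are actually processed.

```python
from enum import Enum


class State(Enum):
    # Numeric values follow the "Sta" field encoding in RFC 5880.
    ADMIN_DOWN = 0
    DOWN = 1
    INIT = 2
    UP = 3


def next_state(local: State, received: State) -> State:
    """Return the local session state after a valid control packet arrives.

    Hypothetical helper: timers, diagnostics and authentication are ignored.
    """
    if local is State.ADMIN_DOWN:
        return local  # packets are discarded while administratively down
    if received is State.ADMIN_DOWN:
        return State.DOWN
    if local is State.DOWN:
        if received is State.DOWN:
            return State.INIT
        if received is State.INIT:
            return State.UP
        return State.DOWN  # received Up: remain Down
    if local is State.INIT:
        if received in (State.INIT, State.UP):
            return State.UP
        return State.INIT  # received Down: remain Init
    # local is Up
    if received is State.DOWN:
        return State.DOWN
    return State.UP
```

With this sketch, `next_state(State.DOWN, State.INIT)` yields `State.UP`, which is the transition that never happens in the capture further down.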

(put "x" in "[ ]" if you already tried following)
[x] Did you check if this is a duplicate issue?
[x] Did you test it on the latest FRRouting/frr master branch?

To Reproduce Steps to reproduce the behavior:

  1. FRR configuration:

    router bgp 81000 vrf VRF0
     bgp router-id 10.163.0.90
     coalesce-time 1000
     bgp bestpath as-path multipath-relax
     neighbor SPINE-10.163.0.0 peer-group
     neighbor SPINE-10.163.0.0 remote-as 81001
     neighbor SPINE-10.163.0.0 bfd
     neighbor SPINE-10.163.0.0 ebgp-multihop 255
     bgp listen range 10.163.0.0/25 peer-group SPINE-10.163.0.0
    !
    line vty
    !
    bfd
     peer 10.163.0.3 local-address 10.163.0.90 vrf VRF0 interface ens192.1100
      no shutdown
     !
     peer 10.163.0.2 local-address 10.163.0.90 vrf VRF0 interface ens192.1100
      no shutdown
     !
    !
  2. BFD peers:

    BFD Peers:
    peer 10.163.0.3 local-address 10.163.0.90 vrf VRF0 interface ens192.1100
        ID: 3460682836
        Remote ID: 0
        Status: down
        Downtime: 15 minute(s), 32 second(s)
        Diagnostics: ok
        Remote diagnostics: ok
        Local timers:
            Receive interval: 300ms
            Transmission interval: 300ms
            Echo transmission interval: 50ms
        Remote timers:
            Receive interval: 1000ms
            Transmission interval: 1000ms
            Echo transmission interval: 0ms
    
    peer 10.163.0.3 multihop local-address 10.163.0.90 vrf VRF0
        ID: 2280017251
        Remote ID: 0
        Status: down
        Downtime: 16 minute(s), 43 second(s)
        Diagnostics: ok
        Remote diagnostics: ok
        Local timers:
            Receive interval: 300ms
            Transmission interval: 300ms
            Echo transmission interval: 50ms
        Remote timers:
            Receive interval: 1000ms
            Transmission interval: 1000ms
            Echo transmission interval: 0ms
    
    peer 10.163.0.2 multihop local-address 10.163.0.90 vrf VRF0
        ID: 1990385910
        Remote ID: 0
        Status: down
        Downtime: 16 minute(s), 48 second(s)
        Diagnostics: ok
        Remote diagnostics: ok
        Local timers:
            Receive interval: 300ms
            Transmission interval: 300ms
            Echo transmission interval: 50ms
        Remote timers:
            Receive interval: 1000ms
            Transmission interval: 1000ms
            Echo transmission interval: 0ms
    
    peer 10.163.0.2 local-address 10.163.0.90 vrf VRF0 interface ens192.1100
        ID: 2534851022
        Remote ID: 0
        Status: down
        Downtime: 15 minute(s), 22 second(s)
        Diagnostics: ok
        Remote diagnostics: ok
        Local timers:
            Receive interval: 300ms
            Transmission interval: 300ms
            Echo transmission interval: 50ms
        Remote timers:
            Receive interval: 1000ms
            Transmission interval: 1000ms
            Echo transmission interval: 0ms
  3. tcpdump on the next-hop BGP peer:

    13:12:44.008079 IP 10.163.0.2.64855 > 10.163.0.90.3784: BFDv1, Control, State Init, Flags: [none], length: 24
    13:12:44.008365 IP 10.163.0.90 > 10.163.0.2: ICMP 10.163.0.90 udp port 3784 unreachable, length 60
    13:12:44.343226 IP 10.163.0.90.49152 > 10.163.0.2.4784: UDP, length 24
    13:12:44.343254 IP 10.163.0.2 > 10.163.0.90: ICMP 10.163.0.2 udp port 4784 unreachable, length 60
    13:12:44.689832 IP 10.163.0.90.49155 > 10.163.0.2.3784: BFDv1, Control, State Down, Flags: [none], length: 24
    13:12:44.993174 IP 10.163.0.2.64855 > 10.163.0.90.3784: BFDv1, Control, State Init, Flags: [none], length: 24
    13:12:44.993488 IP 10.163.0.90 > 10.163.0.2: ICMP 10.163.0.90 udp port 3784 unreachable, length 60
    13:12:45.333270 IP 10.163.0.90.49152 > 10.163.0.2.4784: UDP, length 24
    13:12:45.333316 IP 10.163.0.2 > 10.163.0.90: ICMP 10.163.0.2 udp port 4784 unreachable, length 60
    13:12:45.599835 IP 10.163.0.90.49155 > 10.163.0.2.3784: BFDv1, Control, State Down, Flags: [none], length: 24
    13:12:45.876533 IP 10.163.0.2.64855 > 10.163.0.90.3784: BFDv1, Control, State Init, Flags: [none], length: 24
    13:12:45.876996 IP 10.163.0.90 > 10.163.0.2: ICMP 10.163.0.90 udp port 3784 unreachable, length 60
    13:12:46.093331 IP 10.163.0.90.49152 > 10.163.0.2.4784: UDP, length 24
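
The interesting lines in this capture are the ICMP port-unreachable replies: the FRR host (10.163.0.90) rejects the peer's single-hop packets on UDP 3784, and the peer rejects FRR's packets on UDP 4784 (the multihop port). A rough Python sketch for tallying these from saved tcpdump text follows. The regular expression only targets the output format shown here, and `count_unreachable` is an illustrative name, not an existing tool.

```python
import re
from collections import Counter

# BFD control destination ports: 3784 is single-hop (RFC 5881),
# 4784 is multihop (RFC 5883).
BFD_PORTS = {3784: "single-hop", 4784: "multihop"}

# Matches lines such as:
#   IP 10.163.0.90 > 10.163.0.2: ICMP 10.163.0.90 udp port 3784 unreachable
UNREACHABLE_RE = re.compile(
    r"IP (?P<src>[\d.]+) > (?P<dst>[\d.]+): "
    r"ICMP [\d.]+ udp port (?P<port>\d+) unreachable"
)


def count_unreachable(lines):
    """Count port-unreachable replies per (replying host, BFD mode)."""
    counts = Counter()
    for line in lines:
        match = UNREACHABLE_RE.search(line)
        if match is None:
            continue  # not an ICMP port-unreachable line
        mode = BFD_PORTS.get(int(match.group("port")), "other")
        counts[(match.group("src"), mode)] += 1
    return counts


# First four lines of the capture in this report.
capture = [
    "13:12:44.008079 IP 10.163.0.2.64855 > 10.163.0.90.3784: BFDv1, Control, State Init, Flags: [none], length: 24",
    "13:12:44.008365 IP 10.163.0.90 > 10.163.0.2: ICMP 10.163.0.90 udp port 3784 unreachable, length 60",
    "13:12:44.343226 IP 10.163.0.90.49152 > 10.163.0.2.4784: UDP, length 24",
    "13:12:44.343254 IP 10.163.0.2 > 10.163.0.90: ICMP 10.163.0.2 udp port 4784 unreachable, length 60",
]

for (host, mode), n in count_unreachable(capture).items():
    print(f"{host} rejected {n} {mode} BFD packet(s)")
```

A host answering its own BFD port with port unreachable suggests that no socket accepted the packet in that context, which would explain why the session never leaves Down.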

Versions

This is a git build of frr-7.1-dev-721-g364af5f
Associated branch(es):
    local:dev/7.2
    github/FRRouting/frr.git/dev/7.2

Edited by @rzalamena : changed single "`" (back tick) with "```" (three back ticks) to fix configuration/output display.

rzalamena commented 4 years ago

I can confirm the issue. It seems we need to backport some of the VRF fixes from @pguibert6WIND. I'll try to include them in PR #5149; otherwise, feel free to open another PR with the fixes (if I take too long).

rzalamena commented 4 years ago

I just tried 7.2 with the commits from #4564 and it doesn't fix the issue. The topology tests work normally; the problem seems to happen when you manually type the configuration.

We have to investigate this issue a bit more.

rzalamena commented 4 years ago

@dineshkumarkamalakannan I managed to fix my problem by setting the following sysctl:

    sysctl net.ipv4.udp_l3mdev_accept=1

I would also check for net.ipv4.ip_forward = 1.
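
To check both values without typing each one by hand, something like the following Python sketch could be used. It assumes the usual Linux layout where a dotted sysctl name maps to a path under `/proc/sys`; `read_sysctl` and `check_vrf_sysctls` are hypothetical helper names. The `root` parameter exists only so the logic can be exercised against a scratch directory.

```python
from pathlib import Path


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Read a sysctl by dotted name, e.g. net.ipv4.ip_forward."""
    # Assumption: no path component contains a dot (true for the two
    # names used here, not for per-interface keys such as ens192.1100).
    path = Path(root).joinpath(*name.split("."))
    return path.read_text().strip()


def check_vrf_sysctls(root: str = "/proc/sys") -> dict:
    """Return {name: True/False/None}; None means the key could not be read."""
    wanted = {
        "net.ipv4.udp_l3mdev_accept": "1",
        "net.ipv4.ip_forward": "1",
    }
    report = {}
    for name, expected in wanted.items():
        try:
            report[name] = read_sysctl(name, root) == expected
        except OSError:
            # Key missing (e.g. kernel built without VRF support) or unreadable.
            report[name] = None
    return report
```

Calling `check_vrf_sysctls()` on the affected host returns a small dictionary; any entry that is `False` or `None` is worth a closer look.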

Does this help?

dineshkumarkamalakannan commented 4 years ago

@rzalamena it works with the above workaround, thanks a lot.

rzalamena commented 4 years ago

I got some useful clues from @louberger in today's meeting. We need to set that sysctl under two conditions:

  1. When the daemon doesn't create sockets bound to VRFs (this is the bfdd case, which I'm fixing)
  2. When the kernel version is between 4.14 and 4.18 (there is a bug and upgrading might help)
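
A quick way to test the second condition on a given host is to compare the running kernel release against that range. The Python sketch below treats both ends as inclusive, which is my assumption; the bounds are parameters because other comments in this thread suggest that kernels outside 4.14 to 4.18 may need the sysctl too. `kernel_in_affected_range` is an illustrative name.

```python
import platform
import re


def kernel_in_affected_range(release: str, low=(4, 14), high=(4, 18)) -> bool:
    """Return True if the major.minor of `release` lies within [low, high]."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    # Tuple comparison orders by major first, then minor.
    return low <= version <= high


if __name__ == "__main__":
    release = platform.release()  # e.g. "4.15.0-66-generic"
    print(release, kernel_in_affected_range(release))
```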

Enabling that sysctl causes VRF sockets to receive packets from any VRF, so it is a potential security issue. Disabling it only allows sockets to receive packets from the VRFs they are bound to.
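
For readers unfamiliar with what "bound to a VRF" means at the socket level: on Linux a socket is usually tied to a VRF by setting `SO_BINDTODEVICE` to the VRF device name before calling `bind()`. The Python sketch below shows that general mechanism only. It is not the bfdd fix (bfdd is written in C), and `bind_udp_socket_to_vrf` is a made-up helper. The VRF name `VRF0` and port 3784 come from the configuration and capture in this report.

```python
import socket


def bind_udp_socket_to_vrf(sock, vrf_name: str, port: int) -> None:
    """Tie `sock` to the VRF device `vrf_name`, then bind it to `port`.

    Assumption: Linux only. The setsockopt call typically needs root or
    CAP_NET_RAW and fails with OSError if the device does not exist.
    """
    device = vrf_name.encode() + b"\0"  # NUL-terminated interface name
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, device)
    sock.bind(("0.0.0.0", port))
```

Usage would look like `bind_udp_socket_to_vrf(socket.socket(socket.AF_INET, socket.SOCK_DGRAM), "VRF0", 3784)`. A socket set up this way should only see traffic arriving through that VRF, which is the behaviour described above for the sysctl being disabled.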

Resources for better understanding:

rzalamena commented 4 years ago

I produced a branch with the fix; however, it still doesn't work without that sysctl. I tried it with kernel versions 4.15 and 5.0 (linux-generic and linux-generic-hwe respectively from Ubuntu 18.04.3).

Here is the link for those who want to try: https://github.com/opensourcerouting/frr/commits/72-bfdd-vrf-socket

mjstapp commented 4 years ago

Hi @rzalamena - just a possible data point: I've seen the VRF binding problem with kernel 5.0; we had some conversation about it in the context of some of the VRF topotests. So both 4.15 and 5.0 might be examples of versions that need the extra sysctl. Can you try with the 4.18 kernels that have been around? I thought my Ubuntu 18 and 19 VMs were offering them.