FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

MPLS L3VPN no Route Install #13853

Open Per-Forma opened 1 year ago

Per-Forma commented 1 year ago

Describe the bug

To Reproduce

Configure OSPF with segment routing for label distribution, and iBGP in the default VRF. Enable the IPv4 VPN address family and disable unicast. Create VRF 'inet' on frr-boc and the internal routers, placing an interface in the VRF on each router. Configure the route distinguisher and route target to import/export '394211:42' for the inet VRF. See the network diagram in the screenshots.
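
For readers less familiar with L3VPN import, here is a minimal sketch of the route-target check this setup relies on: a VPN route is a candidate for import into a VRF when it carries at least one route target that the VRF imports. This is a simplified model, not bgpd's actual code path, which applies further checks (next-hop reachability, labels, and so on). The function name and the second RT value are hypothetical.

```python
def should_import(route_targets, vrf_import_targets):
    """Return True if the route carries at least one RT the VRF imports.

    Simplified model of RT-based import; real implementations apply
    additional validity checks before installing a route.
    """
    return not set(route_targets).isdisjoint(vrf_import_targets)


# The inet VRF imports and exports 394211:42 (from the setup described above).
inet_import_targets = {"394211:42"}

# A route tagged with the matching RT is a candidate for import.
print(should_import({"394211:42"}, inet_import_targets))

# A route tagged only with some other RT (hypothetical value) is not.
print(should_import({"394211:99"}, inet_import_targets))
```

In this issue the RTs match on both sides, so the RT check alone would not explain the missing routes.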

When testing, a prefix connected to frr-boc (example: 147.0.0.0/30) is advertised by frr-boc to hosts 5171 and 3928 with the 394211:42 route distinguisher and is installed on those hosts. A prefix originating on the 5171 or 3928 (example: 171.0.0.0/24) and advertised to frr-boc is not installed into the routing table. In the output of show bgp ipv4 vpn rd 394211:42 on frr-boc, the prefixes appear and are marked valid and best. However, they do not appear to be installed into the forwarding table. See the screenshot for an example.

I've included the config of frr-boc below as a txt file. Of note, the 147.0.0.0/30 is being picked up by the 5171 and 3928 and being installed into the correct vrf table. Additionally, the two prefixes being advertised by the 5171 and 3928 are being installed into one-another's inet vrf table.

Expected behavior

Expected behavior would be that the route installs correctly or an error/reason is indicated for it not being installed.

Screenshots

image

image

image

Versions

Additional context

I'm building this in a lab environment with the original intention of using the current master branch to see if I can replicate an issue we are seeing in a production environment, where IPv6 routes in the VRF are being rejected. However, I'm stuck here, unable to see what I have done in this lab that's causing these IPv4 routes to be unable to install into the VRF table. This issue is different from the v6 rejected issue, as I'm not seeing the rejected message. I'm unsure if this is something I've done wrong or not!

FRR-BOC lo: 192.168.62.41
5171 lb1: 192.168.62.1
3928 lb1: 192.168.62.2

frr-boc-config.txt

riw777 commented 1 year ago

I wonder if this is related to disabling unicast ... can you try without that step?

Per-Forma commented 1 year ago

Apologies for not getting back to you sooner on this. I had to take down my lab machines and move them so I got disrupted a bit.

I just removed the unicast disable command and then restarted the frr service.

No change in behavior so far. Prefixes show up when entering neighbor 192.168.62.1 activate but are not installed to the inet routing table. I don't know if it matters to your thinking process here, but although the unicast SAFI is enabled, it's not established in BGP, as I don't have it turned on for the 5171 neighbor.

riw777 commented 10 months ago

this looks like a bug ... clearing myself off in case someone wants to work on a fix

beith12 commented 10 months ago

Can we see the following output from frr-boc please?

#show bgp ipv4 vpn detail-routes

I have seen something similar to this and it was due to the RTs changing when BGP came up - we solved it by manually applying a unique router-id on the BGP vrf process (router bgp 394211 vrf inet).

Per-Forma commented 9 months ago

Sure thing! Here it is:

frr-boc# sh bgp ipv4 vpn detail-routes
BGP table version is 12, local router ID is 192.168.62.41, vrf id 0
Default local pref 100, local AS 394211
Route Distinguisher: 394211:42
BGP routing table entry for 394211:42:147.0.0.0/30, version 11
not allocated
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  192.168.62.1
  11599
    147.0.0.1 from 0.0.0.0 (192.168.62.41) vrf inet(5) announce-nh-self
      Origin incomplete, metric 0, valid, sourced, local, best (First path received)
      Extended Community: RT:394211:42
      Originator: 192.168.62.41
      Remote label: 80
      Last update: Tue Nov 28 01:09:43 2023
BGP routing table entry for 394211:42:150.0.0.1/32, version 12
not allocated
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  192.168.62.1
  11599
    147.0.0.1 from 0.0.0.0 (192.168.62.41) vrf inet(5) announce-nh-self
      Origin incomplete, metric 0, valid, sourced, local, best (First path received)
      Extended Community: RT:394211:42
      Originator: 192.168.62.41
      Remote label: 80
      Last update: Tue Nov 28 01:09:43 2023
BGP routing table entry for 394211:42:158.0.0.0/24, version 1
not allocated
Paths: (1 available, best #1)
  Not advertised to any peer
  Local
    192.168.62.1 (metric 110) from 192.168.62.1 (192.168.62.1)
      Origin incomplete, localpref 100, valid, internal, best (First path received)
      Extended Community: RT:394211:42
      Remote label: 68001
      Last update: Mon Nov 27 23:38:44 2023
BGP routing table entry for 394211:42:171.0.0.0/24, version 2
not allocated
Paths: (1 available, best #1)
  Not advertised to any peer
  Local
    192.168.62.2 (metric 120) from 192.168.62.1 (192.168.62.2)
      Origin incomplete, localpref 100, valid, internal, best (First path received)
      Extended Community: RT:394211:42
      Originator: 192.168.62.2, Cluster list: 192.168.62.1
      Remote label: 68008
      Last update: Mon Nov 27 23:38:44 2023

Displayed  4 routes and 4 total paths

For some reason, I'm not seeing notifications from github or I would have responded sooner. I'll see if I can do something about that so I can respond a little quicker.

beith12 commented 9 months ago

@Per-Forma thanks, from frr-boc can we see the output from the following as well:

#show run bgp
#show ip fib vrf inet

Per-Forma commented 9 months ago

Here they are

frr-boc# sh run bgp
Building configuration...

Current configuration:
!
frr version 9.1-dev
frr defaults traditional
hostname frr-boc
log file /var/log/frr/bgpd.log
log syslog informational
service integrated-vtysh-config
!
router bgp 394211 vrf inet
 neighbor 147.0.0.1 remote-as 11599
 !
 address-family ipv4 unicast
  neighbor 147.0.0.1 soft-reconfiguration inbound
  neighbor 147.0.0.1 route-map bgp-allow-all-map in
  neighbor 147.0.0.1 route-map bgp-allow-all-map out
  label vpn export auto
  rd vpn export 394211:42
  rt vpn both 394211:42
  export vpn
  import vpn
 exit-address-family
exit
!
router bgp 394211
 bgp router-id 192.168.62.41
 neighbor 192.168.62.1 remote-as 394211
 neighbor 192.168.62.1 update-source lo
 !
 address-family ipv4 vpn
  neighbor 192.168.62.1 activate
  neighbor 192.168.62.1 soft-reconfiguration inbound
 exit-address-family
exit
!
ip prefix-list all-v4 seq 5 permit any
!
route-map bgp-allow-all-map permit 5
 match ip address prefix-list all-v4
exit
!
end
frr-boc# sh ip fib vrf inet
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF inet:
C>* 147.0.0.0/30 is directly connected, enp2s0, 16:46:39
B>* 150.0.0.1/32 [20/0] via 147.0.0.1, enp2s0, weight 1, 15:14:39
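
To make the gap explicit, here is a small sketch that compares the prefixes in the show bgp ipv4 vpn detail-routes output with the prefixes in the show ip fib vrf inet output. The embedded text is trimmed from the outputs pasted in this thread; the regular expressions and function names are my own and only assume the line formats seen here.

```python
import re

# Matches e.g. "BGP routing table entry for 394211:42:147.0.0.0/30, version 11"
VPN_ENTRY = re.compile(
    r"BGP routing table entry for (?:\S+:)?(\d+\.\d+\.\d+\.\d+/\d+)"
)
# Matches FIB-installed lines, e.g. "C>* 147.0.0.0/30 is directly connected, ..."
FIB_ENTRY = re.compile(r"^[A-Za-z]>\*\s+(\d+\.\d+\.\d+\.\d+/\d+)", re.MULTILINE)


def vpn_prefixes(text):
    """Prefixes listed in the VPN table output."""
    return set(VPN_ENTRY.findall(text))


def fib_prefixes(text):
    """Prefixes marked selected and installed ('>*') in the FIB output."""
    return set(FIB_ENTRY.findall(text))


def missing_from_fib(vpn_text, fib_text):
    """VPN-table prefixes that have no installed FIB entry."""
    return sorted(vpn_prefixes(vpn_text) - fib_prefixes(fib_text))


vpn_output = """\
BGP routing table entry for 394211:42:147.0.0.0/30, version 11
BGP routing table entry for 394211:42:150.0.0.1/32, version 12
BGP routing table entry for 394211:42:158.0.0.0/24, version 1
BGP routing table entry for 394211:42:171.0.0.0/24, version 2
"""

fib_output = """\
C>* 147.0.0.0/30 is directly connected, enp2s0, 16:46:39
B>* 150.0.0.1/32 [20/0] via 147.0.0.1, enp2s0, weight 1, 15:14:39
"""

print(missing_from_fib(vpn_output, fib_output))
```

The two prefixes it reports are the ones learned from 192.168.62.1, which matches what I'm seeing by eye.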

beith12 commented 9 months ago

@Per-Forma thanks for the output - I can't see anything obvious from it. The only thing I can think to try is manually setting the RID on the VRF AF itself using the following (write & restart FRR after):

router bgp 394211 vrf inet
 bgp router-id x.x.x.x

Per-Forma commented 9 months ago

Hey @beith12, thanks for looking at this with me. I set the RID in the bgp vrf stanza as you indicated. It's now set to bgp router-id 147.0.0.2, but that doesn't appear to have improved anything yet.

beith12 commented 9 months ago

@Per-Forma OK - are any of the working devices with a VRF in this topology running 9.1-dev? If so, it might be worth comparing everything side-by-side. I see that you mentioned you are trying to get IPv6 in an MPLS VPN working - I assume (based on your configs) this is over an IPv4 underlay? If so, I have been testing earlier releases for this feature and have yet to achieve end-to-end.

beith12 commented 9 months ago

@Per-Forma Should also mention that 9.1 was released today so might be worth trying that rather than dev version.

Per-Forma commented 9 months ago

@beith12 - for reference, this environment is a lab topology. I initially configured it to test 6VPE (IPv6 VPN over an IPv4 MPLS backbone). We have a version of this in our production environment, and the IPv4 VRF routes are working correctly there. Our production environment is running the frr 8.1 package that is packaged by Canonical. In that environment, I'm seeing an issue with the v6 routes in the VRF. My original intention here was to build frr from source, replicate the issue I'm seeing in production, and then work on resolving it. However, I've been stuck on this IPv4 route problem, which is working fine in production.

In this lab environment, I am running a 9.1 dev version. I linked the commit in the original issue, which is from June. I haven't changed that, as I know how frustrating it can be when things get changed during a troubleshooting exercise!

I can work on building a new version and testing this on it if you think that would be the best use of time.

beith12 commented 9 months ago

@Per-Forma I would rebuild (the faulty node at least) with the 9.1 version that was released yesterday to compare.

Thanks

Per-Forma commented 9 months ago

Hey @beith12, got this done, but I'm seeing much the same results. However, I did notice a couple of odd warnings in the status output for the frr.service. See below. There are two of these, which matches the number of routes I'm missing in this test environment. Given that the message references the SRGB, I think this must be related. The part I don't understand is that the SID index numbers being reported don't correspond to a valid MPLS label, as they are out of range.

Dec 04 19:45:36 frr-boc frrinit.sh[9698]:  * Started watchfrr
Dec 04 19:45:36 frr-boc watchfrr[9709]: [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
Dec 04 19:45:36 frr-boc systemd[1]: Started FRRouting.
Dec 04 19:45:37 frr-boc zebra[9722]: [V98V0-MTWPF] client 54 says hello and bids fair to announce only bgp routes vrf=0
Dec 04 19:45:48 frr-boc ospfd[9737]: [S5PCG-77H23] Packet[DD]: Neighbor 192.168.62.1 Negotiation done (Master).
Dec 04 19:45:48 frr-boc ospfd[9737]: [XYQCD-TPKQT][EC 134217736] index2label: SID index 7168000 falls outside SRGB range
Dec 04 19:45:48 frr-boc ospfd[9737]: [G51Y1-54QJR][EC 134217744] Type-10 Opaque-LSA (opaque_type=8): Common origination for AREA(0.0.0.0) has already started
Dec 04 19:45:50 frr-boc ospfd[9737]: [XYQCD-TPKQT][EC 134217736] index2label: SID index 7168256 falls outside SRGB range
Dec 04 19:45:51 frr-boc zebra[9722]: [WPPMZ-G9797] if_zebra_speed_update: enp2s0 old speed: 0 new speed: 1000
Dec 04 19:45:59 frr-boc bgpd[9730]: [JG0WZ-7X009][EC 33554504] 192.168.62.1 unrecognized capability code: 128 - ignored
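
As a quick sanity check on the "out of range" point, here is a small sketch that pulls the SID index values out of the two ospfd warnings above and compares them with the largest value a 20-bit MPLS label field can hold (1048575). The regular expression and names are my own and only assume the log format shown here.

```python
import re

# An MPLS label is a 20-bit field, so the largest possible value is 2**20 - 1.
MAX_MPLS_LABEL = 2**20 - 1

SID_WARNING = re.compile(r"index2label: SID index (\d+) falls outside SRGB range")


def logged_sid_indexes(log_text):
    """Return the SID index values reported in index2label warnings."""
    return [int(value) for value in SID_WARNING.findall(log_text)]


log_excerpt = """\
Dec 04 19:45:48 frr-boc ospfd[9737]: [XYQCD-TPKQT][EC 134217736] index2label: SID index 7168000 falls outside SRGB range
Dec 04 19:45:50 frr-boc ospfd[9737]: [XYQCD-TPKQT][EC 134217736] index2label: SID index 7168256 falls outside SRGB range
"""

for index in logged_sid_indexes(log_excerpt):
    quotient, remainder = divmod(index, 256)
    print(index, "exceeds 20-bit label space:", index > MAX_MPLS_LABEL,
          "| divmod by 256:", (quotient, remainder))
```

Both values are larger than any label could be, so no SRGB could contain them. They are also exact multiples of 256 (28000 and 28001 times 256); that may be a coincidence, and I can't say from the logs alone whether it means anything.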

beith12 commented 9 months ago

@Per-Forma This may relate to the next hop of the BGP route (announced by OSPF). Have you set segment-routing global-block xxxxxxxx anywhere? If so, it is best for this label range to match on all devices. The SID index (defined by segment-routing prefix x.x.x.x/32 index) is added to the SRGB base to create the transport label, so double-check that the index is not a large number and that net.mpls.platform_labels is set high enough in Linux (use sudo sysctl -a --pattern mpls to see the labels).
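
A rough sketch of that arithmetic, in case it helps: the transport label is the SRGB lower bound plus the SID index, and it is only usable if it stays inside the SRGB and below net.mpls.platform_labels (Linux cannot forward on label values equal to or greater than that setting). The SRGB bounds of 16000 to 23999 and the platform_labels value of 100000 below are assumptions for illustration, not values taken from this lab; the function name is hypothetical.

```python
def index_to_label(index, srgb_lower, srgb_upper, platform_labels):
    """Return the transport label for a prefix-SID index, or None if unusable.

    label = SRGB lower bound + SID index. The label must fall inside the
    SRGB and be strictly less than net.mpls.platform_labels.
    """
    label = srgb_lower + index
    if label > srgb_upper:
        return None  # index falls outside the SRGB range
    if label >= platform_labels:
        return None  # kernel label table is too small for this label
    return label


# Assumed values for illustration only.
SRGB_LOWER, SRGB_UPPER = 16000, 23999
PLATFORM_LABELS = 100000

# A small index maps to a label inside the SRGB.
print(index_to_label(1, SRGB_LOWER, SRGB_UPPER, PLATFORM_LABELS))

# The index from the ospfd warning cannot map to any label in this range.
print(index_to_label(7168000, SRGB_LOWER, SRGB_UPPER, PLATFORM_LABELS))
```

If the SRGB differs between devices, the same index resolves to different labels on each router, which is why matching the range everywhere keeps things simple.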