Closed matofeder closed 3 weeks ago
Using the unified FRR mgmt interface to eliminate SONiC's split configuration has the following main issues:
1. The available Edge-core SONiC images (SONiC.202211, SONiC.202111, SONiC.202012, SONiC.202006, SONiC.201911) do not include bug fix #13109 for frrcfgd.
This fix is available only in the Edge-core SONiC branches 202305, 202311, 202311.0, 202311.X, master, and pre_202305.
Without this fix, the FRR management framework does not behave as expected. Specifically, frrcfgd fails to interpret the Config DB BGP entries correctly, leading to errors such as:
Sep 11 09:53:08.188862 st01-sw1g-r01-u42 INFO bgp#frrcfgd: value for table BGP_PEER_GROUP prefix default key LEAF changed to {'admin_status': (true, ADD), 'asn': (65501, ADD), 'peer_type': (external, ADD)}
Sep 11 09:53:08.190100 st01-sw1g-r01-u42 DEBUG bgp#frrcfgd: execute command vtysh -c 'configure terminal' -c 'router bgp 65000 vrf default' -c 'neighbor LEAF remote-as 65501' for table BGP_PEER_GROUP.
Sep 11 09:53:08.190100 st01-sw1g-r01-u42 DEBUG bgp#frrcfgd: VTYSH CMD: configure terminal daemons: ['bgpd']
Sep 11 09:53:08.190100 st01-sw1g-r01-u42 DEBUG bgp#frrcfgd: VTYSH CMD: router bgp 65000 vrf default daemons: ['bgpd']
Sep 11 09:53:08.190100 st01-sw1g-r01-u42 DEBUG bgp#frrcfgd: VTYSH CMD: neighbor LEAF remote-as 65501 daemons: ['bgpd']
Sep 11 09:53:08.190132 st01-sw1g-r01-u42 DEBUG bgp#frrcfgd: [bgpd] command return code: 13
Sep 11 09:53:08.190132 st01-sw1g-r01-u42 DEBUG bgp#frrcfgd: % Create the peer-group or interface first
Sep 11 09:53:08.190147 st01-sw1g-r01-u42 DEBUG bgp#frrcfgd: VTYSH CMD: end daemons: ['bgpd']
Sep 11 09:53:08.190174 st01-sw1g-r01-u42 ERR bgp#frrcfgd: command execution failure. Command: "vtysh -c 'configure terminal' -c 'router bgp 65000 vrf default' -c 'neighbor LEAF remote-as 65501'"
Sep 11 09:53:08.190174 st01-sw1g-r01-u42 ERR bgp#frrcfgd: failed running FRR command: neighbor LEAF remote-as 65501
In this case, frrcfgd recognizes the BGP_PEER_GROUP, but it fails to translate it into proper FRR-BGP CLI commands.
After manually applying the fix to the SONiC.202211 build, the FRR management framework functioned as expected, enabling the unified FRR configuration to work properly.
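The error "% Create the peer-group or interface first" suggests that bgpd expects the peer group to be declared before any attribute is set on it, which matches the running configuration shown further down in this thread (neighbor LEAF peer-group precedes neighbor LEAF remote-as). A minimal sketch of that ordering follows. The helper name peer_group_commands is hypothetical, and this is neither the content of fix #13109 nor code taken from frrcfgd; it only illustrates the command order inferred from the log.

```python
import shlex


def peer_group_commands(local_asn, vrf, group, remote_as):
    """Build a vtysh argument list that declares a peer group before using it.

    Hypothetical helper for illustration only; not taken from frrcfgd.
    """
    return [
        "vtysh",
        "-c", "configure terminal",
        "-c", f"router bgp {local_asn} vrf {vrf}",
        # Declare the peer group first; the log above shows bgpd rejecting
        # "remote-as" when the group does not exist yet.
        "-c", f"neighbor {group} peer-group",
        "-c", f"neighbor {group} remote-as {remote_as}",
    ]


# Values taken from the log excerpt above.
cmd = peer_group_commands(65000, "default", "LEAF", 65501)
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run without a shell; shlex.join is used here only to print a copy-pastable command line.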
What should we do next with the Edge-core SONiC images?
An Edge-core ticket has been opened to ask whether builds that contain the fix (202305, 202311, 202311.0, 202311.X, master, and pre_202305) are available for download somewhere. (https://support.edge-core.com/hc/en-us/requests/38305 - probably accessible only from my Edge-core account)
Alternatively, we could build the Edge-core SONiC image ourselves.
2. FRR mgmt interface (frrcfgd) does not support all FRR configuration options
a) The FRR route map option set src ADDRESS can be configured using FRR's vtysh but appears to be missing in the unified FRR management interface.
- BGP routing works without the route-map set src option, but the router cannot reach remote addresses because the local routes lack a source IP address (more than suboptimal). See details [here](https://github.com/SovereignCloudStack/hardware-landscape/pull/43/files).
b) The show ip interface and show ipv6 interface commands do not work, because local_addr is mandatory in the SONiC Config DB. See details here.
After investigating whether FRR unified management is compatible with the required BGP configuration of the SCS hardware landscape, and whether it is suitable for configuring L3 BGP underlay networking with features like BGP unnumbered on Edge-Core enterprise SONiC, I can conclude the following:
The known issues 2a and 2b described above affect both Edge-Core SONiC and community SONiC. Issue 1 affects only Edge-Core enterprise SONiC.
For Edge-Core enterprise SONiC, issue 1 is blocking: we have to wait until Edge-Core releases a SONiC build with the fix, or build it ourselves.
Issues 2a and 2b affect both distributions. IMO they are not blocking, but they introduce unexpected system behavior, which could significantly hamper debugging of further issues and complicate overall network maintenance.
Based on the above, my recommendation is not to use the SONiC unified FRR configuration for now, and instead to focus on contributing upstream so that this becomes feasible soon (though it’s uncertain how or whether we can influence the enterprise SONiC distribution).
@scoopex fyi
Test with Stordis SONiC release 4.4.0 (sonic-broadcom-enterprise-base-4-4-0.bin)
1. TL;DR: frrcfgd works (bug #13109 does not seem to be an issue)
It seems that Stordis SONiC release 4.4.0 does not include bug fix #13109:
$ docker exec -it bgp sh -c "cat /usr/local/lib/python3.9/dist-packages/frrcfgd/frrcfgd.py | grep listen_thread | wc -l"
0
But the frrcfgd.py code differs from the community and Edge-core versions. The community and Edge-core frrcfgd.py script contains 3832 LOC, whereas the Stordis one contains 5553 LOC (so evidently some logic has been added to the Stordis version).
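The grep check above can also be written in Python, which is handy when comparing several images. This is a sketch only: treating the presence of listen_thread as an indicator of fix #13109 is the same heuristic as the grep above, not a guaranteed test, and the function names are hypothetical.

```python
def count_matching_lines(source_text, marker):
    """Count the lines that contain marker, like `grep marker | wc -l`."""
    return sum(1 for line in source_text.splitlines() if marker in line)


def has_marker(source_text, marker="listen_thread"):
    """Return True if at least one line of the source mentions the marker."""
    return count_matching_lines(source_text, marker) > 0


# Made-up sample text, standing in for the contents of frrcfgd.py.
sample = "import threading\nself.listen_thread = None\nstart(listen_thread)\n"
print(count_matching_lines(sample, "listen_thread"))
print(has_marker("def main():\n    pass\n"))
```

On a switch, the text would come from reading the frrcfgd.py file inside the bgp container, for example with pathlib.Path(path).read_text().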
Test with the following FRR unified config:
{
    "BGP_GLOBALS": {
        "default": {
            "local_asn": "65000",
            "router_id": "10.0.1.2"
        }
    },
    "BGP_PEER_GROUP": {
        "default|LEAF": {
            "admin_status": "true",
            "asn": "65501",
            "peer_type": "external"
        }
    }
}
Apply it:
$ config load frr.conf -y
Running command: /usr/local/bin/db_migrator.py -o check_version -f frr.conf
Running command: /usr/local/bin/sonic-cfggen -j frr.conf --write-to-db
Check the result:
$ show runningconfiguration bgp
Building configuration...
Current configuration:
!
frr version 8.2.2
frr defaults traditional
hostname st01-sw1g-r01-u42
log syslog informational
log facility local4
agentx
service integrated-vtysh-config
!
password zebra
enable password zebra
!
router bgp 65000
 bgp router-id 10.0.1.2
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 neighbor LEAF peer-group
 neighbor LEAF remote-as external
 !
 address-family ipv4 unicast
  maximum-paths 1
  maximum-paths ibgp 1
 exit-address-family
 !
 address-family ipv6 unicast
  maximum-paths 1
  maximum-paths ibgp 1
 exit-address-family
exit
!
end
It works!
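A check like the one above can be automated by comparing the running configuration against the lines we expect frrcfgd to have rendered. The following is a minimal sketch; missing_config_lines is a hypothetical helper, and the sample text is a shortened copy of the output above rather than a live capture.

```python
def missing_config_lines(running_config, expected_lines):
    """Return the expected lines that do not appear in the running config.

    Lines are compared after stripping whitespace, so FRR's indentation
    inside the "router bgp" block does not matter.
    """
    present = {line.strip() for line in running_config.splitlines()}
    return [line for line in expected_lines if line.strip() not in present]


# Shortened copy of the running configuration shown above.
sample = """\
router bgp 65000
 bgp router-id 10.0.1.2
 neighbor LEAF peer-group
 neighbor LEAF remote-as external
"""

expected = [
    "router bgp 65000",
    "bgp router-id 10.0.1.2",
    "neighbor LEAF peer-group",
    "neighbor LEAF remote-as external",
]

missing = missing_config_lines(sample, expected)
print(missing)
```

An empty list means every expected line was rendered; anything else names the commands that frrcfgd did not apply.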
2a. It seems that Stordis SONiC release 4.4.0 contains some set src directives:
$ docker exec -it bgp sh -c "cat /usr/local/lib/python3.9/dist-packages/frrcfgd/frrcfgd.py | grep 'set src'"
cmds = ["vtysh -c 'configure terminal' -c 'route-map %s permit 10' -c 'set src %s'" % (rm_name, addr_list[0]),
but the route_map_key_map does not include a set src option (at least in the main frrcfgd.py script).
Maybe there is some magic for configuring the route map set src option, but it is not clear from the source code how.
@scoopex are you aware of any documentation of frrcfgd for Stordis SONiC?
2b. Testing the show ip interface and show ipv6 interface commands shows that they work with Stordis SONiC as expected.
See the following FRR config:
{
    "BGP_GLOBALS": {
        "default": {
            "local_asn": "65000",
            "router_id": "10.0.1.2"
        }
    },
    "BGP_PEER_GROUP": {
        "default|LEAF": {
            "admin_status": "true",
            "asn": "65501",
            "peer_type": "external"
        }
    },
    "BGP_NEIGHBOR": {
        "default|Ethernet32": {
            "peer_group_name": "LEAF"
        }
    }
}
The community and Edge-core SONiC failed with the error described here, but Stordis SONiC works like a charm.
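For more than one unnumbered neighbor, the Config DB fragment above can be generated rather than written by hand. This is a sketch under stated assumptions: the table names and the "vrf|name" key format are copied from the JSON above, while the function build_unified_bgp_config and its parameters are hypothetical.

```python
import json


def build_unified_bgp_config(local_asn, router_id, group, peer_asn,
                             interfaces, vrf="default"):
    """Build BGP_GLOBALS, BGP_PEER_GROUP and BGP_NEIGHBOR tables.

    Every interface becomes a neighbor that joins the given peer group.
    """
    return {
        "BGP_GLOBALS": {
            vrf: {"local_asn": str(local_asn), "router_id": router_id}
        },
        "BGP_PEER_GROUP": {
            f"{vrf}|{group}": {
                "admin_status": "true",
                "asn": str(peer_asn),
                "peer_type": "external",
            }
        },
        "BGP_NEIGHBOR": {
            f"{vrf}|{ifname}": {"peer_group_name": group}
            for ifname in interfaces
        },
    }


# Reproduces the fragment tested above.
config = build_unified_bgp_config(65000, "10.0.1.2", "LEAF", 65501, ["Ethernet32"])
print(json.dumps(config, indent=4))
```

Writing the printed JSON to frr.conf gives a file that can be applied with config load as in the earlier test.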
It appears that Stordis SONiC is in better shape compared to the Community or Edge-core versions, particularly in terms of supporting the FRR unified configuration. Issues 1 and 2b don't appear to apply to Stordis SONiC. Issue 2a may or may not apply, so having documentation for Stordis SONiC would be helpful.
Following our investigation (refer to https://github.com/SovereignCloudStack/issues/issues/719#issuecomment-2349084594), we aimed to contribute upstream to address the issues preventing us from utilizing integrated FRR configuration.
Our upstream contributions are detailed in https://github.com/SovereignCloudStack/sonic-buildimage/pull/4, where we have ported fixes that should enable a functional integrated FRR configuration and more.
A SONiC image has been built from the above branch (https://github.com/SovereignCloudStack/sonic-buildimage/pull/4) and successfully tested in the SCS LAB environment; see https://github.com/SovereignCloudStack/hardware-landscape/pull/56
SONiC currently uses a split-mode-like configuration, where SONiC settings are managed in config_db.json and routing is configured separately in frr.conf.
The split is often needed because the default bgpcfgd configuration tool is limited.
The frr-mgmt-framework offers a more comprehensive solution by supporting BGP, OSPF, STATIC, IGMP, PIM, VRFs, and BGP EVPN within a single integrated configuration mechanism. To enable this, add "frr_mgmt_framework_config": "true" to DEVICE_METADATA.
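As a rough illustration of the enabling step, the flag can be set on a loaded config_db.json structure as sketched below. The helper name enable_frr_mgmt_framework is hypothetical, and placing the flag under the "localhost" entry of DEVICE_METADATA is an assumption based on the usual config_db.json layout; adjust the key if your image differs.

```python
import copy


def enable_frr_mgmt_framework(config_db, metadata_key="localhost"):
    """Return a copy of config_db with frr_mgmt_framework_config enabled.

    metadata_key is an assumption: DEVICE_METADATA entries are commonly
    stored under "localhost" in config_db.json.
    """
    updated = copy.deepcopy(config_db)
    metadata = updated.setdefault("DEVICE_METADATA", {}).setdefault(metadata_key, {})
    # Config DB values are strings, so the flag is "true", not a boolean.
    metadata["frr_mgmt_framework_config"] = "true"
    return updated


# Hostname taken from the logs in this thread; the rest is made up.
original = {"DEVICE_METADATA": {"localhost": {"hostname": "st01-sw1g-r01-u42"}}}
updated = enable_frr_mgmt_framework(original)
print(updated["DEVICE_METADATA"]["localhost"])
```

The function works on a copy so that the loaded configuration can still be compared with the modified one before anything is written back.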
Clarify whether the frr-mgmt-framework supports all necessary configuration options required to achieve the desired L3 underlay configuration for SCS.