FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/
Other
3.38k stars 1.26k forks source link

BGP PIC HLD Review #14703

Open eddieruan-alibaba opened 1 year ago

eddieruan-alibaba commented 1 year ago

Alibaba team is working with other folks in SONiC routing WG and wants to bring in BGP PIC support to FRR and SONiC. The current HLD is documented in https://github.com/eddieruan-alibaba/SONiC/blob/eruan-pic/doc/bgp_pic/bgp_pic.md and the SONiC PR for tracking the discussions for this HLD is at https://github.com/sonic-net/SONiC/pull/1493

Based on the suggestion from today's FRR weekly meeting, I open this issue to track the discussion on this HLD in FRR. I would like to get the consensus on this approach for enabling BGP PIC in FRR.

I add some background about this feature here since I was in the group of people to enable this feature almost 20 years ago. This feature was first introduced to Cisco GSR IOS platform back to 2006 / 2007 ish. The main target was to reduce traffic loss when a remote BGP PE went down. The recursive route was supported in Cisco OS long before this feature. The design idea was to reorganize forwarding objects. So we could reduce the number of forwarding objects' updates from O(N) to O(1) which would stoping traffic loss first. This feature later became a standard RIB/FIB approach in Cisco IOS and IOX platforms. The draft was raised first in 2016 by my Cisco friends, which gave a clear description on how to handle these forwarding objects for achieving PIC.

mjstapp commented 1 year ago

In the discussions we've had, I've asked several questions about what's been proposed here; I'll try to capture them concisely.

First, it seems (if I'm following what sonic is doing) that you are concerned about just one specific situation: the case where multiple nexthops are being used by some routes, and one of those nexthops stops being useful. Sonic wants to react to that event quickly, using only the viable remaining links' nexthops.

What exactly are the triggers for that event? Is BFD in use, for example? Or is an IGP change or a BGP TCP session failure the trigger? Or is there a failure of a local link?

It also seems that sonic is able to take the flattened list of nexthops that it learns about from FRR's FPM messages and render that list into the two-part division that its hardware supports. If you have the group of nexthops, and you learn that one of them is not useful, why not just ... react, using the hardware you have available to you?

What I don't follow is why re-designing the internals of zebra's RIB processing data structures is the right solution. What alternatives to that sort of overhaul might be viable? Is there a missing event that sonic would like to have delivered through FPM, for example?

pguibert6WIND commented 1 year ago

My concern is the limitations brought by the design:

Is there a proposal (an extension to your current design) to have a compatible design to cover

I think you talked about a separate team that would handle the IGP failures implementation. I think this would be great to have the design proposed by this team.

note: the link to the proposal I made: https://docs.google.com/presentation/d/11MGkW7MmAXl-dE9G28ezNCYvvsISB1zrfQzWSAoKBBE/edit#slide=id.p

eddieruan-alibaba commented 1 year ago

In the discussions we've had, I've asked several questions about what's been proposed here; I'll try to capture them concisely.

First, it seems (if I'm following what sonic is doing) that you are concerned about just one specific situation: the case where multiple nexthops are being used by some routes, and one of those nexthops stops being useful. Sonic wants to react to that event quickly, using only the viable remaining links' nexthops.

EDDIE: PIC has two enhancements described in RFC. One is "It prevents BGP load balancing updates from being triggered by IGP load balancing updates", which is basically recursive route support. The other is to achieve fast convergence in the event of a hardware forwarding failure related to a remote BGP PE becoming unreachable. This is what we mainly discuss here and derive to change the data structure organization. When a remote BGP PE goes down, we want to minimize the traffic loss. This was required by DT back to 2006 on Cisco GSR platform. It is a basic SP feature now.

What exactly are the triggers for that event? Is BFD in use, for example? Or is an IGP change or a BGP TCP session failure the trigger? Or is there a failure of a local link?

EDDIE: normally it would be triggered via IGP detect BGP remote PE is down. It could be triggered via multi hop BFD since remote PE is normally far away.

It also seems that sonic is able to take the flattened list of nexthops that it learns about from FRR's FPM messages and render that list into the two-part division that its hardware supports. If you have the group of nexthops, and you learn that one of them is not useful, why not just ... react, using the hardware you have available to you?

EDDIE: fpm update is triggered from zebra when IGP detects a remote PE is not reachable. With current zebra data structure, zebra needs to trigger an update to all related NHG, which would be an O(N) operation. That's the main reason we want to change this behavior in zebra to O(1) operation.

What I don't follow is why re-designing the internals of zebra's RIB processing data structures is the right solution. What alternatives to that sort of overhaul might be viable? Is there a missing event that sonic would like to have delivered through FPM, for example?

EDDIE: The main reason is to let zebra provide O(1) update instead of O(N) update via fpm. This way, bottom subscribers could handle the event properly. By the way, the routing decision logic is not changed, the new pic_nhe is derived only when the routing decision is made.

I don't know if any Cisco friends wants to chime in here.

eddieruan-alibaba commented 1 year ago

My concern is the limitations brought by the design:

  • this design only deals with FPM and a two-part division that the logical entity behind FPM (hardware) is
  • the design only covers SRv6 case : note that in SRv6, you dont' have IGP label.

EDDIE: current design could be applied to MPLS VPN, EVPN as well. The same approach could be used as is since pic_nhe is the forwarding only portion. From testing point of view, we will only test SRv6 VPN case.

For IGP label handling for PIC, current approach is applicable, as long as we decide to apply IGP PIC to it. But from my point of view, the IGP PIC's value is limited.

Is there a proposal (an extension to your current design) to have a compatible design to cover

  • the kernel support EDDIE: The main missing piece is kernel data plane support. Most FRR changes could be applied if kernel could support PIC. We just need to adjust the message once new netlink message is defined for PIC.

  • the MPLS case EDDIE: MPLS VPN case would be supported as is. Only FPM part needs to be updated based on MPLS VPN's message format.

I think you talked about a separate team that would handle the IGP failures implementation. I think this would be great to have the design proposed by this team.

EDDIE: local link down event would be down independent from PIC. There is a separate HLD would be raised.

note: the link to the proposal I made: https://docs.google.com/presentation/d/11MGkW7MmAXl-dE9G28ezNCYvvsISB1zrfQzWSAoKBBE/edit#slide=id.p EDDIE: as I mentioned in the meeting, what you proposed is not for PIC, since you didn't plan to change kernel forwarding.

riw777 commented 10 months ago

It looks like the HLD doc above is no longer available ... did it move someplace?

eddieruan-alibaba commented 9 months ago

It looks like the HLD doc above is no longer available ... did it move someplace?

Sorry. This HLD was moved under doc/pic recently due to we were consolidate all PIC related tasks. The new link is at

https://github.com/eddieruan-alibaba/SONiC/blob/eruan-pic/doc/pic/bgp_pic_edge.md

We have PIC architecture doc published at the same location. https://github.com/sonic-net/SONiC/blob/master/doc/pic/bgp_pic_arch_doc.md

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

frrbot[bot] commented 2 months ago

This issue will be automatically closed in the specified period unless there is further activity.

eddieruan-alibaba commented 2 months ago

Keep it active,

frrbot[bot] commented 2 months ago

This issue will no longer be automatically closed.

eddieruan-alibaba commented 2 months ago

This would be part of SONiC Routing Working Group's Phoenix Wing Initiative. The detail information could be found at

http://phoenixwing.com.cn/.

Code PR will be raised soon.