FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/
Other
3.27k stars 1.24k forks source link

Is OSPF route delete logging incomplete or am I missing it in another debug log? #16847

Open Cerebus opened 2 days ago

Cerebus commented 2 days ago

Discussed in https://github.com/FRRouting/frr/discussions/16760

Originally posted by **Cerebus** September 6, 2024 Version 9.1 currently but I can upgrade if necessary. I have `debug ospf zebra` enabled b/c I want to see changes to the routing table. When a distant link that's part of an ECMP set dies and OSPF processes the LSA, it does not log the deletion of the ECMP route. E.g., given the starting point: ``` 192.168.0.6/31 nhid 33 via 192.168.0.1 dev eth1 proto ospf metric 20 192.168.0.12/31 nhid 34 proto ospf metric 20 nexthop via 192.168.0.3 dev eth2 weight 1 nexthop via 192.168.0.1 dev eth1 weight 1 ``` The `0.6/31` link is traversed by `0.12/31` destinations over the eth1 link. If that distant link dies, OSPF logs: ``` 2024/09/06 20:33:41.726 OSPF: [GEBDT-G1CP1] Zebra: Route delete 192.168.0.6/31 2024/09/06 20:33:41.726 OSPF: [RDXQE-Z6A3B] Zebra: Route add 192.168.0.12/31 nexthop 192.168.0.3, ifindex=115 eth2 ``` And the resulting table: ``` 192.168.0.12/31 nhid 26 via 192.168.0.3 dev eth2 proto ospf metric 20 ``` The resulting table is correct, but the log omits the deletion of `0.12/31 nexthop via eth1`. Is this info in another debug log, or is this a bug?
donaldsharp commented 2 days ago

The code is telling you the route is being replaced w/ 1 nexthop.

Cerebus commented 2 days ago

You're saying the delete is implicit, but if the link is later restored, both are logged:

2024/09/06 20:40:33.211 OSPF: [RDXQE-Z6A3B] Zebra: Route add 192.168.0.12/31 nexthop 192.168.0.1, ifindex=96 eth1
2024/09/06 20:40:33.211 OSPF: [RDXQE-Z6A3B] Zebra: Route add 192.168.0.12/31 nexthop 192.168.0.3, ifindex=115 eth2

In this case the nexthop is cumulative, not a replacement. If there's an implicit route delete in one case but not in the other, how would an observer know the difference?

ETA: I omitted the route additions for the 0.6/31 prefix as they're not really relevant.

mjstapp commented 1 day ago

ospf debugs each nexthop one-by-one - that's just ... how that particular debug is written

donaldsharp commented 1 day ago

There is no delete on a route replacement. I believe that is your confusion. All the code is saying replace the route with this new one and here is the list of nexthops I am sending down.

Cerebus commented 1 day ago

I'm tracking, but the confusion is in the logging, not in me.

Say, frex, that I'm parsing the events and reconstructing the FIB. On any given Route add event, I don't know if this event is a replacement or an addition. These are very different events. I can't assume one or the other b/c the resulting FIB state will be wrong. The router has it correct, but I can't determine that from the logging.

If I have to look into the future to determine if a given Route add should be a replacement -- e.g., looking for a block of closely-timed Route add for the same prefix and treating the set as a replacement -- then the logging is at fault. Logs should fully reflect the events taking place at the moment of the log.

donaldsharp commented 1 day ago

Mark is correct that is just how these debugs are written, nor am I going to sanction a change in debug logs to stuff a bunch of nexthops on one line. Debug logs are meant to be readable by a human. I disagree with the notion that all debugs should reflect a single event as well, debug logs are meant to help the developer debug the code when something has gone wrong, that is it. Informational/Warning and Error I would agree with you, though. We also MAKE no guarantee about debug logs. They can and do change from release to release as we improve things. Hell one of the major problems with debugs that I am still fixing are debugs that don't include the vrf in them( thanks for pointing this out to me for these debugs, I'll address that).

In any event reading debug logs is not a way that I would ever recommend for doing what I think you are trying to do, although in all honesty I am not really sure what you are trying to do so I cannot really point you in a direction. ospfd has a way to peak into it's database. Zebra has some basic scripting with lua that has a way to hook into rib processing. What is the actual end goal of your problem?