Open sysoleg opened 12 months ago
@riw777 any PRs you know of?
The problem is still reproducible in the latest master. Before running the script (see issue description above):
# ip next show dev dum0
id 7 dev dum0 scope link proto zebra
id 12 via 192.168.0.2 dev dum0 scope link proto zebra
# vtysh -c 'show nexthop rib 7'
ID: 7 (zebra)
RefCnt: 2
Uptime: 00:00:47
VRF: default
Valid, Installed
Interface Index: 9
is directly connected, dum0 (vrf default)
# vtysh -c 'show nexthop rib 12'
ID: 12 (zebra)
RefCnt: 1
Uptime: 00:00:47
VRF: default
Valid, Installed
Interface Index: 9
via 192.168.0.2, dum0 (vrf default), weight 1
# ip ro show dev dum0
192.168.0.0/30 proto kernel scope link src 192.168.0.1
192.168.55.55 nhid 12 via 192.168.0.2 proto static metric 20
# vtysh -c 'show ip route'
<skip>
C>* 192.168.0.0/30 is directly connected, dum0, 00:00:09
S>* 192.168.55.55/32 [1/0] via 192.168.0.2, dum0, weight 1, 00:00:08
After running the script:
# ip next show dev dum0
<empty output>
# vtysh -c 'show nexthop rib'
ID: 16 (zebra)
RefCnt: 1
Uptime: 00:00:15
VRF: default
Valid
Interface Index: 9
via 192.168.0.2, dum0 (vrf default), weight 1
<skip>
ID: 14 (zebra)
RefCnt: 2
Uptime: 00:00:15
VRF: default
Valid
Interface Index: 9
is directly connected, dum0 (vrf default)
# ip ro show dev dum0
192.168.0.0/30 proto kernel scope link src 192.168.0.1
# vtysh -c 'show ip route'
<skip>
C>* 192.168.0.0/30 is directly connected, dum0, 00:00:10
S>r 192.168.55.55/32 [1/0] via 192.168.0.2, dum0, weight 1, 00:00:10
Note the absence of Installed
in the nexthop entry and the "r" (ROUTE_ENTRY_FAILED
) flag for static route.
Relevant log messages:
Nov 22 20:06:59 cl1 zebra[160819]: [YXCJP-0WZWV] netlink_nexthop_msg_encode: ID (14): directly connected, dum0(9) vrf default(0)
Nov 22 20:06:59 cl1 zebra[160819]: [R43C6-KYHWT] netlink_nexthop_msg_encode: RTM_NEWNEXTHOP, id=14
Nov 22 20:06:59 cl1 zebra[160819]: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=40, msg cnt=1
Nov 22 20:06:59 cl1 zebra[160819]: [TJ327-ET8HE] netlink_send_msg: >> netlink message dump [sent]
Nov 22 20:06:59 cl1 zebra[160819]: [JAS4D-NCWGP] nlmsghdr [len=40 type=(104) NEWNEXTHOP flags=(0x0501) {REQUEST,DUMP,(ROOT|REPLACE|CAPPED),(ATOMIC|CREATE)} seq=14 pid=3231857309]
Nov 22 20:06:59 cl1 zebra[160819]: [WCX94-SW894] nhm [family=(2) AF_INET scope=(0) UNIVERSE protocol=(11) ZEBRA flags=0x00000000 {}]
Nov 22 20:06:59 cl1 zebra[160819]: [KFBSR-XYJV1] rta [len=8 (payload=4) type=(1) ID]
Nov 22 20:06:59 cl1 zebra[160819]: [Z4E9C-GD9EP] 14
Nov 22 20:06:59 cl1 zebra[160819]: [KFBSR-XYJV1] rta [len=8 (payload=4) type=(5) OIF]
Nov 22 20:06:59 cl1 zebra[160819]: [JR4EA-BKPTA] 9
Nov 22 20:06:59 cl1 zebra[160819]: [V8KNF-8EXH8] netlink_recv_msg: << netlink message dump [recv]
Nov 22 20:06:59 cl1 zebra[160819]: [JAS4D-NCWGP] nlmsghdr [len=76 type=(2) ERROR flags=(0x0300) {DUMP,(ROOT|REPLACE|CAPPED),(MATCH|EXCLUDE|ACK_TLVS)} seq=14 pid=3231857309]
Nov 22 20:06:59 cl1 zebra[160819]: [KWP1C-6CSXF] nlmsgerr [error=(-100) Network is down]
Nov 22 20:06:59 cl1 zebra[160819]: [HSYZM-HV7HF] Extended Error: Carrier for nexthop device is down
Nov 22 20:06:59 cl1 zebra[160819]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Network is down, type=RTM_NEWNEXTHOP(104), seq=14, pid=3231857309
Nov 22 20:06:59 cl1 zebra[160819]: [QTT8V-3ZQ34] nl_batch_read_resp: netlink error message seq=14
Nov 22 20:06:59 cl1 zebra[160819]: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (14[if 9 vrfid 0]) into the kernel
Nov 22 20:06:59 cl1 zebra[160819]: [YXCJP-0WZWV] netlink_nexthop_msg_encode: ID (16): 192.168.0.2, via dum0(9) vrf default(0)
Nov 22 20:06:59 cl1 zebra[160819]: [R43C6-KYHWT] netlink_nexthop_msg_encode: RTM_NEWNEXTHOP, id=16
Nov 22 20:06:59 cl1 zebra[160819]: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=48, msg cnt=1
Nov 22 20:06:59 cl1 zebra[160819]: [TJ327-ET8HE] netlink_send_msg: >> netlink message dump [sent]
Nov 22 20:06:59 cl1 zebra[160819]: [JAS4D-NCWGP] nlmsghdr [len=48 type=(104) NEWNEXTHOP flags=(0x0501) {REQUEST,DUMP,(ROOT|REPLACE|CAPPED),(ATOMIC|CREATE)} seq=16 pid=3231857309]
Nov 22 20:06:59 cl1 zebra[160819]: [WCX94-SW894] nhm [family=(2) AF_INET scope=(0) UNIVERSE protocol=(11) ZEBRA flags=0x00000000 {}]
Nov 22 20:06:59 cl1 zebra[160819]: [KFBSR-XYJV1] rta [len=8 (payload=4) type=(1) ID]
Nov 22 20:06:59 cl1 zebra[160819]: [Z4E9C-GD9EP] 16
Nov 22 20:06:59 cl1 zebra[160819]: [KFBSR-XYJV1] rta [len=8 (payload=4) type=(6) GATEWAY]
Nov 22 20:06:59 cl1 zebra[160819]: [M8QV4-KY9C0] 192.168.0.2
Nov 22 20:06:59 cl1 zebra[160819]: [KFBSR-XYJV1] rta [len=8 (payload=4) type=(5) OIF]
Nov 22 20:06:59 cl1 zebra[160819]: [JR4EA-BKPTA] 9
Nov 22 20:06:59 cl1 zebra[160819]: [V8KNF-8EXH8] netlink_recv_msg: << netlink message dump [recv]
Nov 22 20:06:59 cl1 zebra[160819]: [JAS4D-NCWGP] nlmsghdr [len=76 type=(2) ERROR flags=(0x0300) {DUMP,(ROOT|REPLACE|CAPPED),(MATCH|EXCLUDE|ACK_TLVS)} seq=16 pid=3231857309]
Nov 22 20:06:59 cl1 zebra[160819]: [KWP1C-6CSXF] nlmsgerr [error=(-100) Network is down]
Nov 22 20:06:59 cl1 zebra[160819]: [HSYZM-HV7HF] Extended Error: Carrier for nexthop device is down
Nov 22 20:06:59 cl1 zebra[160819]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Network is down, type=RTM_NEWNEXTHOP(104), seq=16, pid=3231857309
Nov 22 20:06:59 cl1 zebra[160819]: [QTT8V-3ZQ34] nl_batch_read_resp: netlink error message seq=16
Nov 22 20:06:59 cl1 zebra[160819]: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_NEWROUTE 192.168.55.55/32 vrf 0(254)
Nov 22 20:06:59 cl1 zebra[160819]: [J87BH-XW5PP] netlink_route_multipath_msg_encode: 192.168.55.55/32 nhg_id is 16
Nov 22 20:06:59 cl1 zebra[160819]: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=52, msg cnt=1
Nov 22 20:06:59 cl1 zebra[160819]: [TJ327-ET8HE] netlink_send_msg: >> netlink message dump [sent]
Nov 22 20:06:59 cl1 zebra[160819]: [JAS4D-NCWGP] nlmsghdr [len=52 type=(24) NEWROUTE flags=(0x0501) {REQUEST,DUMP,(ROOT|REPLACE|CAPPED),(ATOMIC|CREATE)} seq=17 pid=3231857309]
Nov 22 20:06:59 cl1 zebra[160819]: [GCEGC-W8YBF] rtmsg [family=(2) AF_INET dstlen=32 srclen=0 tos=0 table=254 protocol=(196) UNKNOWN scope=(0) UNIVERSE type=(1) UNICAST flags=0x0000 {}]
Nov 22 20:06:59 cl1 zebra[160819]: [KFBSR-XYJV1] rta [len=8 (payload=4) type=(1) DST]
Nov 22 20:06:59 cl1 zebra[160819]: [M8QV4-KY9C0] 192.168.55.55
Nov 22 20:06:59 cl1 zebra[160819]: [KFBSR-XYJV1] rta [len=8 (payload=4) type=(6) PRIORITY]
Nov 22 20:06:59 cl1 zebra[160819]: [Z4E9C-GD9EP] 20
Nov 22 20:06:59 cl1 zebra[160819]: [KFBSR-XYJV1] rta [len=8 (payload=4) type=(30) NH_ID]
Nov 22 20:06:59 cl1 zebra[160819]: [Z4E9C-GD9EP] 16
Nov 22 20:06:59 cl1 zebra[160819]: [V8KNF-8EXH8] netlink_recv_msg: << netlink message dump [recv]
Nov 22 20:06:59 cl1 zebra[160819]: [JAS4D-NCWGP] nlmsghdr [len=68 type=(2) ERROR flags=(0x0300) {DUMP,(ROOT|REPLACE|CAPPED),(MATCH|EXCLUDE|ACK_TLVS)} seq=17 pid=3231857309]
Nov 22 20:06:59 cl1 zebra[160819]: [KWP1C-6CSXF] nlmsgerr [error=(-22) Invalid argument]
Nov 22 20:06:59 cl1 zebra[160819]: [HSYZM-HV7HF] Extended Error: Nexthop id does not exist
Nov 22 20:06:59 cl1 zebra[160819]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=17, pid=3231857309
Nov 22 20:06:59 cl1 zebra[160819]: [QTT8V-3ZQ34] nl_batch_read_resp: netlink error message seq=17
Nov 22 20:06:59 cl1 zebra[160819]: [MFYWV-KH3MC] process_subq_early_route_add: (0:?):192.168.55.55/32: Inserting route rn 0x5628b3007fc0, re 0x5628b30088f0 (static) existing 0x0, same_count 0
Nov 22 20:06:59 cl1 zebra[160819]: [Q4T2G-E2SQF] process_subq_early_route_add: dumping RE entry 0x5628b30088f0 for 192.168.55.55/32 vrf default(0)
Nov 22 20:06:59 cl1 zebra[160819]: [M5M58-9PD2R] 192.168.55.55/32: uptime == 10393009, type == 3, instance == 0, table == 254
Nov 22 20:06:59 cl1 zebra[160819]: [RVZMM-N7DME] 192.168.55.55/32: metric == 0, mtu == 0, distance == 1, flags == Recursion RR Distance status == None
Nov 22 20:06:59 cl1 zebra[160819]: [KC40Q-0DH3A] 192.168.55.55/32: tag == 0, nexthop_num == 1, nexthop_active_num == 0
Nov 22 20:06:59 cl1 zebra[160819]: [ZSB1Z-XM2V3] 192.168.55.55/32: NH 192.168.0.2[0] vrf default(0) wgt 1, with flags
Nov 22 20:06:59 cl1 zebra[160819]: [SCETK-GQ9E4] 192.168.55.55/32: dump complete
Nov 22 20:06:59 cl1 zebra[160819]: [GCGMT-SQR82] rib_link: (0:?):192.168.55.55/32: rn 0x5628b3007fc0 adding dest
Nov 22 20:06:59 cl1 zebra[160819]: [JF0K0-DVHWH] rib_meta_queue_add: (0:254):192.168.55.55/32: queued rn 0x5628b3007fc0 into sub-queue Static Routes
Nov 22 20:06:59 cl1 zebra[160819]: [NZNZ4-7P54Y] default(0:254):192.168.55.55/32: Processing rn 0x5628b3007fc0
Nov 22 20:06:59 cl1 zebra[160819]: [ZJVZ4-XEGPF] default(0:254):192.168.55.55/32: Examine re 0x5628b30088f0 (static) status: Changed flags: Recursion RR Distance dist 1 metric 0
Nov 22 20:06:59 cl1 zebra[160819]: [JPJF4-TGCY5] default(0:254):192.168.55.55/32: After processing: old_selected 0x0 new_selected 0x5628b30088f0 old_fib 0x0 new_fib 0x5628b30088f0
Nov 22 20:06:59 cl1 zebra[160819]: [ZFRN1-EZNJZ] default(0:254):192.168.55.55/32: Adding route rn 0x5628b3007fc0, re 0x5628b30088f0 (static)
Nov 22 20:06:59 cl1 zebra[160819]: [HH6N2-PDCJS] default(0:254):192.168.55.55/32 rn 0x5628b3007fc0 dequeued from sub-queue Static Routes
Nov 22 20:06:59 cl1 zebra[160819]: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (16[192.168.0.2 if 9 vrfid 0]) into the kernel
Nov 22 20:06:59 cl1 staticd[160826]: [S4MGP-4WQTA] route_notify_owner: Route 192.168.55.55/32 failed to install for table: 254
Nov 22 20:06:59 cl1 zebra[160819]: [HZ1TW-92EY6] Notifying Owner: static about prefix 192.168.55.55/32(254) 0 vrf: 0
Nov 22 20:06:59 cl1 zebra[160819]: [VYKYC-709DP] default(0:254):192.168.55.55/32: Route install failed
The problem is that the link carrier goes down while zebra
is installing nexthops into the kernel. Running zebra
with no zebra nexthop kernel enable
seems to be a workaround. The problem doesn't reproduce then.
I have the similar issue, when interface flaps sometimes I get such errors and no routes installed. any plans to solve it? I can give it a try if smb will tell me the direction.
@zstas Have you seen this PR: https://github.com/FRRouting/frr/pull/15082?
well, this is good news. I hope it will be merged soon, so I can backport it to my branch. thank you!
Hi, I've possibly ran in the same condition, interface was considered as invalid, YET the BGP peerings are up over it, routes imported but marked as rejected. As this happens in a production env, it's hard to reproduce. I've added some logging and alerting to catch more precise logs the next time it happens.
I can reliably reproduce the situation where zebra won't automatically recover after a failed
RTM_NEWNEXTHOP
due to aCarrier for nexthop device is down
error. I'm using a virtual device here, but we've seen the same problem on a production system after a link flap on a physical NIC.Here is the latest FRR configuration sufficient to reproduce the behaviour:
Here is the script to reproduce the bug:
After running the script I don't see the 192.168.55.55/32 route in the kernel table.
Log from zebra with debugging enabled: https://gist.github.com/sysoleg/c171e06b7cde67c2c4d06810fcae300e
The reason why
RTM_NEWNEXTHOP
failed is becausenetif_carrier_ok(dev)
is false (seertm_to_nh_config()
in the Linux kernel source). The NIC driver callsnetif_carrier_off(dev)
and this immediately sets the__LINK_STATE_NOCARRIER
device state flag. TheRTM_NEWLINK
message is not even sent at this stage, only possibly scheduled (see below). Since there is almost no delay between theecho 1
andecho 0
, the flag is set (again) before zebra starts to install nexthops into the kernel. At that moment, zebra thinks that it's OK to install the nexthop object, but it's not, and it doesn't know it.Now to the reason why automatic recovery fails.
Running the script without the last sleep (
sleep 1
) gives us threeRTM_NEWLINK
events (you can monitor these with theip monitor
coomand):So only three events (DOWN, UP, UP). If there were four (DOWN, UP, DOWN, UP) - automatic recovery would be possible after the last (fourth) UP. But in the three-event scenario zebra won't try to reinstall nexthops after the third UP because it thinks the interface is already UP (having already processed the second UP with problems installing objects into the kernel).
The reason why we only get three events is in the Linux kernel. Looking into
linkwatch_fire_event()
, which is called bynetif_carrier_off(dev)
ornetif_carrier_on(dev)
, we can see that the work, that leads to the operstate update and eventually theRTM_NETLINK
being sent, is only scheduled immediately if the event is "urgent" (link is up in our case) or if there's no linkwatch event currently pending and we're "wrapped around" (seelinkwatch_schedule_work()
). Otherwise, work is delayed by up to HZ jiffies (1 second) to (quote) "runaway driver does not cause a storm of messages on the netlink socket".If you uncomment the
sleep 1
line in the mentioned script, you'll get four events, because in this case you're delaying the last carrier "on", which allows delayed work that was created after the previous carrier "off" to update the device link status and eventually send an event while the carrier is still "off". No nexthop problems after the last UP in this case.So the kernel is working as it should: if not urgent - no more than one event per second, but as we have two consecutive UPs with no DOWN in between, the last UP is ignored by the FRR.
Any thoughts on this?