FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

zebra may lose interface IPs when kernel interfaces are deleted, recreated, and assigned IPs within a short time #13630

Closed ExplorerNo9 closed 1 year ago

ExplorerNo9 commented 1 year ago

Describe the bug

If a kernel interface is deleted, then recreated and assigned an IP address within a short time, zebra occasionally loses the interface's IP address. This can be confirmed with the vtysh `show interface brief` command. The loss causes abnormal behavior in other protocol daemons; for example, bgpd does not announce the route for the interface IP even when it is specified by a `network` command.

Versions

To Reproduce

1. Prepare the script below for the test:

   run_test_intf_ip.sh

       #!/bin/bash

       # The problem occurs only rarely, so use many interfaces to increase the chance of reproducing it.
       num=180
       for((i=1; i<=num; i++))
       do
           ip link del dev test$i
       done

       for((i=1; i<=num; i++))
       do
           # The problem was observed on dummy interfaces. Other types have not been tested.
           ip link add dev test$i type dummy && ip link set dev test$i up
           ip addr add 133.0.$i.1/24 dev test$i
       done


2. Enable zebra kernel debug logging with `debug zebra kernel` and `log stdout debugging`.
3. Run `sudo ./run_test_intf_ip.sh`.
4. Watch the log and wait for zebra to finish processing. Then use `show interface brief` to check whether zebra lost the IP of any test interface. If not, repeat step 3.
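
The check in step 4 can be scripted rather than done by eye. The following is only a sketch: `find_lost_addrs` is a hypothetical helper, not part of FRR, and it assumes that `show interface brief` prints each interface's IPv4 address verbatim (for example `133.0.1.1/24`) somewhere in its output.

```shell
#!/bin/bash
# Sketch only: find_lost_addrs is a hypothetical helper, not part of FRR.
#
# $1: file holding `ip -o -4 addr show` output (kernel view)
# $2: file holding `vtysh -c 'show interface brief'` output (zebra view)
# Prints "IFNAME ADDR" for every testN address that the kernel has but that
# does not appear anywhere in zebra's output.
find_lost_addrs() {
    awk '$2 ~ /^test[0-9]+$/ && $3 == "inet" { print $2, $4 }' "$1" |
    while read -r ifname addr; do
        # -F treats the address as a fixed string, -w requires a whole-word match.
        grep -qwF -- "$addr" "$2" || echo "$ifname $addr"
    done
}

# Example use on the router under test (the file paths are arbitrary):
#   ip -o -4 addr show > /tmp/kernel.txt
#   vtysh -c 'show interface brief' > /tmp/zebra.txt
#   find_lost_addrs /tmp/kernel.txt /tmp/zebra.txt
```

Empty output means zebra still knows every test address, so step 3 should be repeated.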

**Analysis**
Here is part of the zebra log from when the problem occurred on interface Loopback0.
<details><summary>Click to see zebra log</summary>
<div><pre><code>
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1344, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWADDR(20), len=72, seq=0, pid=0
ZEBRA: [RGWF1-EHXT1] netlink_interface_addr_dplane: RTM_NEWADDR nsid 0 ifindex 256 flags 0x80:
ZEBRA: [ME3M2-X6YT9]   IFA_ADDRESS   fe80::bc89:d6ff:fe37:77af/64
ZEBRA: [P2VPT-508WP]   IFA_CACHEINFO pref -1, valid -1
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1344, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWADDR(20), len=84, seq=1684932606, pid=34938553
ZEBRA: [RGWF1-EHXT1] netlink_interface_addr_dplane: RTM_NEWADDR nsid 0 ifindex 256 flags 0x80:
ZEBRA: [XMC8C-4ZFJ9]   IFA_LOCAL     10.1.0.228/32
ZEBRA: [ME3M2-X6YT9]   IFA_ADDRESS   10.1.0.228/32
ZEBRA: [Y9HR3-XD5TG]   IFA_LABEL     Loopback0
ZEBRA: [P2VPT-508WP]   IFA_CACHEINFO pref -1, valid -1
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=60, seq=0, pid=0
ZEBRA: [Q9CEC-J9KWY] zebra_if_addr_update_ctx: can't find ifp at nsid 0 index 256
---
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWLINK(16), len=1344, seq=0, pid=0
ZEBRA: [NAV05-EY6FH] RTM_NEWLINK ADD for Loopback0(256) vrf_id 0 type 0 sl_type 0 master 0 flags 0x82
ZEBRA: [ZAG0W-VSNSD] interface Loopback0 vrf default(0) index 256 becomes active.
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWLINK(16), len=1344, seq=0, pid=0
ZEBRA: [W6BZR-YZPAB] RTM_NEWLINK update for Loopback0(256) sl_type 0 master 0 flags 0x100c3
ZEBRA: [N7FN2-J93A7] Intf Loopback0(256) has come UP
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv6 multicast proto kernel NS 0
ZEBRA: [Q3MY3-G3YNJ] MCAST VRF: default(0) RTM_NEWROUTE (0.0.0.0,255.0.0.0) IIF: Unknown(0) OIF:  jiffies: 0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv6 unicast proto kernel NS 0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWADDR(20), len=72, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv6 local proto kernel NS 0
ZEBRA: [J3J81-V75NW] Route rtm_type: local(2) intentionally ignoring
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv6 anycast proto kernel NS 0
ZEBRA: [J3J81-V75NW] Route rtm_type: anycast(4) intentionally ignoring
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWLINK(16), len=1344, seq=0, pid=0
ZEBRA: [W6BZR-YZPAB] RTM_NEWLINK update for Loopback0(256) sl_type 0 master 0 flags 0x102c3
ZEBRA: [P48K1-574RY] Intf Loopback0(256) PTM up, notifying clients
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWADDR(20), len=84, seq=1684932606, pid=3493855318
</code></pre></div>
</details>
The log shows the cause: zebra processes the RTM_NEWADDR from netlink-dp-in **earlier than** the RTM_NEWLINK from netlink-listen, so the address update cannot find the interface (`can't find ifp at nsid 0 index 256`).
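
To find this ordering in a long log without reading it line by line, a small filter helps. This is only a sketch: both helper names are hypothetical, and they rely on the message formats seen in the log above, which require `debug zebra kernel`. It also assumes that the order of lines in the log reflects the order in which zebra handled the messages.

```shell
#!/bin/bash
# Sketch only: both helpers are hypothetical and depend on the log message
# formats shown in the zebra log from this report.

# Reads a zebra log on stdin. Prints "IFINDEX IFNAME" for each interface whose
# RTM_NEWADDR was logged by the dplane parser before its RTM_NEWLINK ADD.
addr_before_link() {
    awk '
        /netlink_interface_addr_dplane: RTM_NEWADDR/ {
            for (i = 1; i < NF; i++) {
                if ($i == "ifindex") {
                    idx = $(i + 1)
                    # Remember addresses seen before the interface exists.
                    if (!(idx in linked)) early[idx] = 1
                }
            }
        }
        match($0, /RTM_NEWLINK ADD for [^ ]+\([0-9]+\)/) {
            # Skip the 20-character prefix "RTM_NEWLINK ADD for ".
            tok = substr($0, RSTART + 20, RLENGTH - 20)   # e.g. Loopback0(256)
            n = index(tok, "(")
            name = substr(tok, 1, n - 1)
            idx = substr(tok, n + 1, length(tok) - n - 1)
            linked[idx] = 1
            if (idx in early) print idx, name
        }
    '
}

# Reads a zebra log on stdin. Prints each ifindex for which zebra logged
# "can't find ifp", once per ifindex.
lost_addr_ifindexes() {
    sed -n "s/.*zebra_if_addr_update_ctx: can't find ifp at nsid [0-9]* index \([0-9][0-9]*\).*/\1/p" |
    sort -un
}
```

Running `addr_before_link < zebra.log` on a saved copy of zebra's output (the file name is arbitrary) lists the interfaces affected by the reordering, and `lost_addr_ifindexes < zebra.log` lists the ifindexes whose address update could not find an interface.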

However, I don't know why this happens only occasionally rather than every time. Most importantly, is there any way to prevent it?
This problem comes from [PR#9052](https://github.com/FRRouting/frr/pull/9052), so may I ask for your help, @mjstapp?

donaldsharp commented 1 year ago

See #13396

ExplorerNo9 commented 1 year ago

OK, thank you donald!