If we delete a kernel interface, then recreate it and assign its IP address within a short time, zebra occasionally loses the interface IP. This can be confirmed with the vtysh `show interface brief` command. The lost address causes abnormal behavior in other protocol daemons; for example, bgpd does not announce the route for the interface IP even when it is specified with a `network` command.
[x] Did you check if this is a duplicate issue?
[ ] Did you test it on the latest FRRouting/frr master branch?
Versions
OS Version: Debian 11
Kernel: Linux 5.10
FRR Version: 8.2
To Reproduce
1. Prepare the script below for the test:
run_test_intf_ip.sh
#!/bin/bash
# The problem occurs only rarely, so we create many interfaces to increase the chance of reproducing it.
num=180
for((i=1; i<=num; i++))
do
ip link del dev test$i
done
for((i=1; i<=num; i++))
do
# The problem was observed on dummy interfaces. Other types have not been tested.
ip link add dev test$i type dummy && ip link set dev test$i up
ip addr add 133.0.$i.1/24 dev test$i
done
2. Enable zebra kernel debug logging with `debug zebra kernel` and `log stdout debugging`
3. Execute `sudo ./run_test_intf_ip.sh`
4. Watch the log and wait for zebra to finish processing. Then check whether zebra lost the IP of any test interface with `show interface brief`. If none was lost, repeat step 3
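
With 180 interfaces, checking step 4 by eye is tedious. The following is a minimal sketch of an automated check, not something from FRR itself: `check_lost_addrs`, `show_intf_vtysh` and `SHOW_INTF_CMD` are hypothetical names introduced here. It assumes `vtysh` is on the PATH and queries each interface with `show interface testN` instead of the brief view, so that the match does not depend on how the brief table is laid out.

```shell
#!/bin/bash
# Hypothetical helper, not part of FRR: report test interfaces whose
# address is absent from zebra's view.

# Default query command; assumes vtysh is installed and zebra is running.
show_intf_vtysh() {
    vtysh -c "show interface $1"
}

# Name of the function used to query zebra. Overridable so the logic can be
# exercised without a running FRR instance.
: "${SHOW_INTF_CMD:=show_intf_vtysh}"

# check_lost_addrs NUM
# For test1..testNUM, look for 133.0.<i>.1/24 (the addressing scheme used in
# run_test_intf_ip.sh) in the per-interface output. Prints one line per
# missing address and returns 1 if any is missing, 0 otherwise.
check_lost_addrs() {
    local num=$1
    local lost=0
    local i=1
    local addr
    while [ "$i" -le "$num" ]; do
        addr="133.0.$i.1/24"
        # -F: fixed string; -w: whole-word match, so 133.0.1.1/24 cannot
        # match inside a longer address.
        if ! "$SHOW_INTF_CMD" "test$i" 2>/dev/null | grep -qwF -- "$addr"; then
            echo "test$i: $addr missing in zebra"
            lost=1
        fi
        i=$((i + 1))
    done
    return "$lost"
}
```

After the test script finishes and zebra has settled, `check_lost_addrs 180 || echo "address lost, inspect the zebra log"` reports any affected interface, so step 3 can be repeated in a loop until the problem shows up.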
**Analysis**
Here is part of the zebra log captured when the problem occurred on interface Loopback0.
<details><summary>Click to see zebra log</summary>
<div><pre><code>
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1344, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWADDR(20), len=72, seq=0, pid=0
ZEBRA: [RGWF1-EHXT1] netlink_interface_addr_dplane: RTM_NEWADDR nsid 0 ifindex 256 flags 0x80:
ZEBRA: [ME3M2-X6YT9] IFA_ADDRESS fe80::bc89:d6ff:fe37:77af/64
ZEBRA: [P2VPT-508WP] IFA_CACHEINFO pref -1, valid -1
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1344, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWADDR(20), len=84, seq=1684932606, pid=34938553
ZEBRA: [RGWF1-EHXT1] netlink_interface_addr_dplane: RTM_NEWADDR nsid 0 ifindex 256 flags 0x80:
ZEBRA: [XMC8C-4ZFJ9] IFA_LOCAL 10.1.0.228/32
ZEBRA: [ME3M2-X6YT9] IFA_ADDRESS 10.1.0.228/32
ZEBRA: [Y9HR3-XD5TG] IFA_LABEL Loopback0
ZEBRA: [P2VPT-508WP] IFA_CACHEINFO pref -1, valid -1
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=60, seq=0, pid=0
ZEBRA: [Q9CEC-J9KWY] zebra_if_addr_update_ctx: can't find ifp at nsid 0 index 256
---
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWLINK(16), len=1344, seq=0, pid=0
ZEBRA: [NAV05-EY6FH] RTM_NEWLINK ADD for Loopback0(256) vrf_id 0 type 0 sl_type 0 master 0 flags 0x82
ZEBRA: [ZAG0W-VSNSD] interface Loopback0 vrf default(0) index 256 becomes active.
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWLINK(16), len=1344, seq=0, pid=0
ZEBRA: [W6BZR-YZPAB] RTM_NEWLINK update for Loopback0(256) sl_type 0 master 0 flags 0x100c3
ZEBRA: [N7FN2-J93A7] Intf Loopback0(256) has come UP
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv6 multicast proto kernel NS 0
ZEBRA: [Q3MY3-G3YNJ] MCAST VRF: default(0) RTM_NEWROUTE (0.0.0.0,255.0.0.0) IIF: Unknown(0) OIF: jiffies: 0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv6 unicast proto kernel NS 0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWADDR(20), len=72, seq=0, pid=0
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv6 local proto kernel NS 0
ZEBRA: [J3J81-V75NW] Route rtm_type: local(2) intentionally ignoring
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=116, seq=0, pid=0
ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv6 anycast proto kernel NS 0
ZEBRA: [J3J81-V75NW] Route rtm_type: anycast(4) intentionally ignoring
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWLINK(16), len=1344, seq=0, pid=0
ZEBRA: [W6BZR-YZPAB] RTM_NEWLINK update for Loopback0(256) sl_type 0 master 0 flags 0x102c3
ZEBRA: [P48K1-574RY] Intf Loopback0(256) PTM up, notifying clients
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWADDR(20), len=84, seq=1684932606, pid=3493855318
</code></pre></div>
</details>
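We can see that this happens because zebra processes the RTM_NEWADDR from netlink-dp-in **earlier than** the RTM_NEWLINK from netlink-listen, so the interface (ifindex 256) is not yet known when the address update is handled (`can't find ifp at nsid 0 index 256`).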
However, I don't know why this happens only occasionally and not in the normal case. Most importantly, is there any way to prevent it?
This problem comes from [PR#9052](https://github.com/FRRouting/frr/pull/9052), so may I ask for your help? @mjstapp
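
To find occurrences of this ordering in a longer log, the two messages involved can be correlated by ifindex. The sketch below is only an illustration: `find_early_addr_updates` is a hypothetical name, and the patterns are taken from the log excerpt above (`zebra_if_addr_update_ctx: can't find ifp ... index N` and `RTM_NEWLINK ADD for NAME(N)`), so they may need adjusting for other FRR versions.

```shell
#!/bin/bash
# Hypothetical helper, not part of FRR: list interfaces for which zebra
# handled an address update before it had seen the link being added.
# Reads a zebra debug log from stdin or from the files given as arguments.
find_early_addr_updates() {
    awk '
    # Address update for an ifindex zebra does not know yet.
    # "can.t" avoids a literal single quote inside this awk program.
    /zebra_if_addr_update_ctx: can.t find ifp/ {
        idx = $NF
        if (!(idx in missed)) order[++n] = idx
        missed[idx] = 1
        next
    }
    # A later link-add message for the same ifindex tells us the name,
    # e.g. "RTM_NEWLINK ADD for Loopback0(256) vrf_id 0 ...".
    /RTM_NEWLINK ADD for / {
        if (match($0, /ADD for [^ (]+\([0-9]+\)/)) {
            s = substr($0, RSTART + 8, RLENGTH - 8)   # "Loopback0(256)"
            name = s; sub(/\(.*/, "", name)
            idx = s;  sub(/^[^(]*\(/, "", idx); sub(/\)$/, "", idx)
            if ((idx in missed) && !(idx in name_of)) name_of[idx] = name
        }
    }
    END {
        for (k = 1; k <= n; k++) {
            idx = order[k]
            printf "ifindex %s (%s): RTM_NEWADDR handled before RTM_NEWLINK ADD\n",
                   idx, (idx in name_of) ? name_of[idx] : "unknown"
        }
    }' "$@"
}
```

For example, `find_early_addr_updates zebra.log` (with `zebra.log` standing in for wherever the stdout log was saved) prints one line per affected ifindex, which can then be compared against the interfaces that are missing their address.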