Open crosser opened 4 days ago
I don't understand what does this failure mean
Jobs Failure: 1/49
TopoTests Ubuntu 22.04 amd64 Part 9 -> https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO9U2204AMD64-4041
🚫 ospf_metric_propagation.test_ospf_metric_propagation test_link_1_4_down
It is just on one architecture? Is it related to the change?...
I don't understand what does this failure mean
Jobs Failure: 1/49 TopoTests Ubuntu 22.04 amd64 Part 9 -> https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO9U2204AMD64-4041 🚫 ospf_metric_propagation.test_ospf_metric_propagation test_link_1_4_down
It is just on one architecture? Is it related to the change?...
Ignore it, I rerun failed tests only.
But still, could you fix frrbot check (styling)?
But still, could you fix frrbot check (styling)?
Done.
I did not do it intially because it asked to change some adjacent code that my change did not touch. Did now.
LGTM, but IMO this should also be covered by a topotest, what do you think?
I agree that it would be useful, but as I am not acquainted with the testing framework that this project is using, I am not sure that I will be able to do it in reasonable time...
Is this the same case encountered in #14808?
Is this the same case encountered in #14808?
Looks like, and there is a topotest already. Anyway, it would be great if @chiragshah6 could help here reviewing.
Is this the same case encountered in #14808?
Looks like so! And such a shame that I did not find it and had to redo all the work myself.
But even bigger shame that your PR was not merged back then. Would have saved us quite some trouble that we had in production...
edit: @leonshaw , do you think that my change looks a bit cleaner? But, I am no judge, you guys decide which to take! The main thing is to have the bug fixed.
edit: @leonshaw , do you think that my change looks a bit cleaner?
I think you are correct. It's cleaner to have both IPv4/IPv6 nexthops explicitly. BTW, you can cherry-pick the topotest part if you like.
In L3 BGP-EVPN, if there are both IPv4 and IPv6 routes in the VPN, zebra maintains two instances of
struct zebra_neigh
object: one with IPv4 address of the nexthop, and another with IPv6 address that is an IPv4 mapped to IPv6, but only one intance ofstruct zebra_mac
object, that contains a list of nexthop addresses that use this mac.The code in
zebra_vxlan
module uses the fact that the list is empty as the indication that thezebra_mac
object is unused, and needs to be dropped. However, preexisting code used nexthop address converted to IPv4 notation for the element of this list. As a result, when twozebra_neigh
objects, one IPv4 and one IPv6-mapped-IPv4 were linked to thezebra_mac
object, only one element was added to the list. Consequently, when one of the twozebra_neigh
objects was dropped, the only element in the list was removed, making it empty, andzebra_mac
object was dropped, and neigbrour cache elements uninstalled from the kernel.As a result, after the last route in one family was removed from a remote vtep, all remaining routes in the other family became unreachable, because RMAC of the vtep was removed.
This commit makes
zebra_mac
use uncoerced IP address of thezebra_neigh
object for the entries in thenh_list
. This way,zebra_mac
object no longer loses track ofzebra_neigh
objects that need it.Bug-ID: 16340