FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

bgp_always_compare_med failing 35% in micronet runs #13779

Closed eqvinox closed 11 months ago

eqvinox commented 1 year ago

about 35% of micronet runs (⇒ #13716) fail in bgp_always_compare_med:

AssertionError: Testcase test_verify_bgp_always_compare_med_functionality_by_restarting_daemons_clear_bgp_shut_neighbors_p1 : Failed 
   Error: [DUT: r1] VRF: default, BGP is not converged
assert '[DUT: r1] VRF: default, BGP is not converged' is True

test was added a few days ago in #13622

example run: https://ci1.netdef.org/browse/TESTING-MICRONET-651

@kuldeepkash

donaldsharp commented 1 year ago

If I had to guess this crash:

BGP: Received signal 11 at 1686141212 (si_addr 0x0, PC 0x7f1a879abdec); aborting...
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x6d) [0x7f1a87b3aa2d]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0xf3) [0x7f1a87b3ac33]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0xf0091) [0x7f1a87b6e091]
BGP: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7f1a877db730]
BGP: /lib/x86_64-linux-gnu/libyang.so.2(+0x1cdec) [0x7f1a879abdec]
BGP: /lib/x86_64-linux-gnu/libyang.so.2(+0x13ee2) [0x7f1a879a2ee2]
BGP: /lib/x86_64-linux-gnu/libyang.so.2(+0x168ff) [0x7f1a879a58ff]
BGP: /lib/x86_64-linux-gnu/libyang.so.2(+0x1687f) [0x7f1a879a587f]
BGP: /lib/x86_64-linux-gnu/libyang.so.2(+0x1687f) [0x7f1a879a587f]
BGP: /lib/x86_64-linux-gnu/libyang.so.2(+0x1687f) [0x7f1a879a587f]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(nb_config_diff+0x47) [0x7f1a87b4dc27]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(nb_candidate_commit_prepare+0x7c) [0x7f1a87b4fb5c]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(nb_candidate_commit+0x4b) [0x7f1a87b4fefb]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0xd245f) [0x7f1a87b5045f]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0xd4fae) [0x7f1a87b52fae]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(nb_cli_apply_changes_clear_pending+0x180) [0x7f1a87b53520]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0xab031) [0x7f1a87b29031]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x8ec10) [0x7f1a87b0cc10]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(cmd_execute_command+0xda) [0x7f1a87b0cd7a]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(cmd_execute+0xd0) [0x7f1a87b0cf50]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x10750f) [0x7f1a87b8550f]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x107730) [0x7f1a87b85730]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x10ae4b) [0x7f1a87b88e4b]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(event_call+0x7d) [0x7f1a87b8015d]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xc0) [0x7f1a87b32830]
BGP: /usr/lib/frr/bgpd(main+0x3e1) [0x559f7184b581]
BGP: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7f1a8762d09b]
BGP: /usr/lib/frr/bgpd(_start+0x2a) [0x559f7184d2da]
BGP: in thread vtysh_read scheduled from ../lib/vty.c:2953 vty_event()
2023/06/07 12:37:57 BGP: [T83RR-8SM5G] bgpd 9.1-dev starting: vty@2605, bgp@:179

From this log file:

https://ci1.netdef.org/artifact/TESTING-MICRONET/TOPOD10AMD64/build-651/TestExecutionLogs/bgp_always_compare_med.test_bgp_always_compare_med_topo1/r3/bgpd.log

@mwinter-osr can we get a better decode or look at the dump file?
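
For anyone attempting that decode without the dump file, here is a rough Python sketch of one way to turn the unnamed frames into addr2line invocations. It assumes the frames follow the glibc backtrace_symbols layout, where "(+0x1cdec)" is an offset from the start of the mapped object, and that debug symbols for the exact libyang build on the CI host are available. parse_frames and addr2line_commands are hypothetical helper names, not part of FRR, and SAMPLE is a subset of the frames from the log above.

```python
import re

# One frame: "<path>(<symbol>+<offset>) [<address>]"; symbol may be empty.
FRAME_RE = re.compile(
    r"(?P<path>/\S+?)\((?P<symbol>[^()+]*)\+(?P<offset>0x[0-9a-f]+)\)"
    r"\s+\[(?P<addr>0x[0-9a-f]+)\]"
)

# Subset of the frames from r3/bgpd.log quoted in this issue.
SAMPLE = """\
BGP: Received signal 11 at 1686141212 (si_addr 0x0, PC 0x7f1a879abdec); aborting...
BGP: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7f1a877db730]
BGP: /lib/x86_64-linux-gnu/libyang.so.2(+0x1cdec) [0x7f1a879abdec]
BGP: /lib/x86_64-linux-gnu/libyang.so.2(+0x13ee2) [0x7f1a879a2ee2]
BGP: /lib/x86_64-linux-gnu/libyang.so.2(+0x168ff) [0x7f1a879a58ff]
BGP: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(nb_config_diff+0x47) [0x7f1a87b4dc27]
"""


def parse_frames(log_text):
    """Return the frames found in log_text, in stack order (innermost first)."""
    frames = []
    for m in FRAME_RE.finditer(log_text):
        frames.append({
            "path": m.group("path"),
            "symbol": m.group("symbol") or None,  # None when no name was printed
            "offset": int(m.group("offset"), 16),
            "addr": int(m.group("addr"), 16),
        })
    return frames


def addr2line_commands(frames):
    """Build one addr2line argv per frame that has no symbol name.

    Assumption: for unnamed frames the offset is relative to the object's
    load base, so it can be passed to addr2line for that shared object.
    """
    return [
        ["addr2line", "-f", "-C", "-e", f["path"], hex(f["offset"])]
        for f in frames
        if f["symbol"] is None
    ]


frames = parse_frames(SAMPLE)
for argv in addr2line_commands(frames):
    print(" ".join(argv))

# Sanity check: every unnamed libyang frame should imply the same load base.
bases = {hex(f["addr"] - f["offset"]) for f in frames
         if f["symbol"] is None and "libyang" in f["path"]}
print("libyang load base(s):", bases)
```

The libyang frames in the sample all imply the same load base, and the first libyang frame address equals the PC in the signal line. That is consistent with the fault sitting inside libyang rather than in libfrr itself.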

donaldsharp commented 1 year ago

r3 is applying this config:

no log commands
service integrated-vtysh-config
no route-map RMAP_MED_R3 permit 70
no route-map RMAP_MED_R3 permit 80
interface lo
no ipv6 address 2001:db8:f::3:17/128
router bgp 300
address-family ipv4 unicast
no neighbor 192.168.1.1 route-map RMAP_MED_R3 out
router bgp 300
address-family ipv6 unicast
no neighbor fd00:0:0:1::1 route-map RMAP_MED_R3 out
interface lo
ipv6 address 2001:DB8:F::3:17/128
no ip prefix-list pf_ls_r3_ipv4 seq 30 permit 192.168.20.1/32
no ipv6 prefix-list pf_ls_r3_ipv6 seq 30 permit 192:168:20::1/128

From the "log commands" output, it really looks like one of the last two "no" commands (the prefix-list removals) caused the crash in bgpd.
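
One way to test that guess is to replay the two suspect commands one at a time against a router that still has the prefix-lists configured. The Python sketch below only builds the vtysh command lines. It assumes vtysh is run inside r3's namespace and that repeated -c options are used to enter configure mode first. vtysh_argv is a hypothetical helper, not part of the topotest library.

```python
import shlex

# The last two commands from the config r3 was applying when bgpd crashed.
SUSPECTS = [
    "no ip prefix-list pf_ls_r3_ipv4 seq 30 permit 192.168.20.1/32",
    "no ipv6 prefix-list pf_ls_r3_ipv6 seq 30 permit 192:168:20::1/128",
]


def vtysh_argv(command):
    """Return an argv that applies a single top-level config command."""
    return ["vtysh", "-c", "configure terminal", "-c", command]


for command in SUSPECTS:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(vtysh_argv(command)))
```

Running each printed line separately and checking bgpd.log after each would show whether either removal alone is enough to trigger the crash.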

donaldsharp commented 1 year ago

IMO this looks like a problem that needs an upgrade to a newer libyang2 version.

eqvinox commented 1 year ago

consensus is to upgrade libyang

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

frrbot[bot] commented 11 months ago

This issue will be automatically closed in the specified period unless there is further activity.