FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

BMPServer in topotests should not `pkill -f bmpserver` #17465

Open donaldsharp opened 1 week ago

donaldsharp commented 1 week ago

Description

The BMP server shutdown in topotests runs `pkill -f bmpserver`, which kills every process whose command line matches `bmpserver`, not just the one started by the current test. FRR runs multiple topotests at the same time, and two topotests currently use the bmpserver. If both happen to be running at the same time and one finishes before the other, the first test kills the second test's bmpserver, so the second test cannot finish properly.

lib/topogen.py:        self.run("pkill -f bmpserver")
sharpd@eva ~/f/t/topotests (more_found_connection_conversion_issues)> git grep add_bmp_server
bgp_bmp/test_bgp_bmp.py:    tgen.add_bmp_server("bmp1", ip="192.0.2.10", defaultRoute="via 192.0.2.1")
bgp_bmp_vrf/test_bgp_bmp_vrf.py:    tgen.add_bmp_server("bmp1", ip="192.0.2.10", defaultRoute="via 192.0.2.1")
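One possible direction, shown only as a sketch: remember the PID of the server a test started and signal that PID alone, instead of matching by name. The `ScopedServer` class and its methods below are hypothetical names, not part of `lib/topogen.py`, and the sketch uses `subprocess` directly rather than the way topogen launches commands inside a node, so a real fix would likely need to record the PID differently (for example in a per-test PID file).

```python
import signal
import subprocess
import sys


class ScopedServer:
    """Hypothetical helper: owns one child process and stops only that one."""

    def __init__(self, argv):
        self.argv = argv
        self.proc = None

    def start(self):
        # Keep the Popen handle so shutdown can target this exact process.
        self.proc = subprocess.Popen(self.argv)
        return self.proc.pid

    def stop(self, timeout=5.0):
        # Nothing to do if the server was never started or already exited.
        if self.proc is None or self.proc.poll() is not None:
            return
        # Signal only our own PID; other servers with the same name are untouched.
        self.proc.send_signal(signal.SIGTERM)
        try:
            self.proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # Escalate if the process ignores SIGTERM.
            self.proc.kill()
            self.proc.wait()


if __name__ == "__main__":
    # Two stand-in "servers" with identical command lines, as in two parallel tests.
    argv = [sys.executable, "-c", "import time; time.sleep(60)"]
    first = ScopedServer(argv)
    second = ScopedServer(argv)
    first.start()
    second.start()

    first.stop()
    # The second server is still running after the first one is stopped.
    print("first exited:", first.proc.poll() is not None)
    print("second alive:", second.proc.poll() is None)
    second.stop()
```

With a name-based `pkill -f`, stopping the first stand-in would also have terminated the second, since both share the same command line.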

I repeatedly see bgp_bmp failing to run properly locally.

The failing test has this in its bmp1/bmpserver.log:

[2024-11-19 14:43:59] Got message type: <class 'bmp.BMPRouteMonitoring'> 84
[2024-11-19 14:43:59] Got message type: <class 'bmp.BMPRouteMonitoring'> 85
[2024-11-19 14:43:59] Finished dissecting data from ('192.0.2.1', 51660)
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.
[2024-11-19 14:43:59] Received signal 15, shutting down.

The exec.log has this at that time:

2024-11-19 14:43:59,059 DEBUG: r1: vtysh result:
        {
         "vrfId": 0,
         "vrfName": "default",
         "tableVersion": 5,
         "routerId": "192.168.0.1",
         "defaultLocPrf": 100,
         "localAS": 65501,
         "routes": {  "routeDistinguishers" : { "444:2" : { "172.31.0.15/32": [{"valid":true,"bestpath":true,"selectionReason":"First path received","pathFrom":"external","prefix":"172.31.0.15","prefixLen":32,"network":"172.31.0.15/32","version":5,"metric":0,"weight":0,"peerId":"192.168.0.2","path":"65502","origin":"IGP","nexthops":[{"ip":"192.168.0.2","hostname":"r2","afi":"ipv4","used":true}]}]
         }  }  }  }
2024-11-19 14:43:59,059 DEBUG: topo: 'router_json_cmp' succeeded after 0.01 seconds
2024-11-19 14:43:59,059 DEBUG: topo: 'router_json_cmp' polling started (interval 1 secs, maximum 30 tries)
2024-11-19 14:43:59,059 DEBUG: r1: vtysh command => 'show bgp ipv6 vpn json'
2024-11-19 14:43:59,059 DEBUG: r1: cmd_status("/bin/bash -c 'vtysh  -c '"'"'show bgp ipv6 vpn json'"'"' 2>/dev/null'")
2024-11-19 14:43:59,073 DEBUG: r1:
        stdout: ...

In the same exec.log, this test's own pkill is not run until 14:44:31, after the signals above were received:

2024-11-19 14:44:31,563 DEBUG: bmp1: cmd_status("/bin/bash -c 'pkill -f bmpserver'")

Version

latest master

How to reproduce

Run multiple bmp tests at the same time.

Expected behavior

One bmp test should not kill another bmp test's mojo.

Actual behavior

mojo killed

Additional context

No response
