SNAS / openbmp

OpenBMP Server Collector
www.openbmp.org
Eclipse Public License 1.0
230 stars 76 forks source link

Handling peer down/up events seems harder to parse in scaling tests bouncing bgp peers #74

Open 3fr61n opened 6 years ago

3fr61n commented 6 years ago

Hi @TimEvens

Before anything, excellent project! quite interesting.

I was doing some scaling tests, I noticed that for a router handling ~100 peers with ~3000 routes per peer, when I bounced all bgp sessions (or restart the bmp collector), it takes a lot of time (~40-50min) to the collector to dump all information on kafka. (bgp peer down/up event and bgp route updates)

Checking the logs on the openbmp collector it seems that bgp tear down/up events takes a lot longer to process than the bgp route updates. (is this expected?)

For instance, the following logs shows that it takes 10 seconds to process a peer down event...

2018-07-20T11:07:24.445246 | NOTICE | parsePeerDownEventHdr | sock=16 : 10.0.0.5: BGP peer down notification with reason code: 1 2018-07-20T11:07:34.456889 | NOTICE | parsePeerDownEventHdr | sock=16 : 10.0.0.7: BGP peer down notification with reason code: 1 2018-07-20T11:07:44.464163 | NOTICE | parsePeerDownEventHdr | sock=16 : 10.0.0.37: BGP peer down notification with reason code: 1

Meanwhile route updates goes quite fast...

Checking the router side using logs and counters, the router dump all bmp events in just ~3-4 min, however until all peer down/up are being processes the collector does not begin with any route update processing. (this is also expected?)

Thanks in advance and Regard

TimEvens commented 5 years ago

The 10 second gap between peer down events must be on the router side. The collector does not cache or store (eg. maintain a rib). The collector is just a real time pass though of bmp/bgp messages, The delay that we would see is on the consumer side (eg. DB such as Postgres or MySQL). The openbmp log messages indicating a 10 second gap must be some router/sender causing that. Which router/version are you using? Can you send me a pcap trace at tim@openbmp.org?