BGP's usage of streams is very large:
Stream : 1295083 (variably sized) 264771096
And I see a bunch of data associated with a clear * command. What are you doing on this machine? Maybe we have a bug somewhere in this area.
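To put the Stream line quoted above in perspective, here is a minimal sketch that turns it into a total and an average size. It assumes the columns mean "type : current object count, size, total bytes"; the function name parse_memory_line is made up for this illustration and is not part of FRR.

```python
def parse_memory_line(line: str) -> tuple[str, int, int]:
    """Split one `show memory` row into (type name, object count, total bytes).

    Assumption: the first number after the colon is the object count and
    the last number on the line is the total number of bytes.
    """
    name, rest = line.split(":", 1)
    fields = rest.split()
    return name.strip(), int(fields[0]), int(fields[-1])


name, count, total_bytes = parse_memory_line(
    "Stream : 1295083 (variably sized) 264771096"
)
total_mib = total_bytes / (1024 * 1024)
avg_bytes = total_bytes / count

print(f"{name}: {count} objects, {total_mib:.1f} MiB, {avg_bytes:.1f} bytes on average")
```

Under that assumption the line works out to about 252 MiB held in roughly 1.3 million stream objects, around 204 bytes each, which is a large share of a 2GB box.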
@donaldsharp I found that very weird too. I'm receiving 3 full tables from 3 different routers and then sending the routes back out again: this machine is a route reflector in Frankfurt, and my Juniper-based backbone routers are meshed to it.
The data associated with the clear command could be there because the machine has been sitting in this state for a while:
10.42.42.16 4 49004 943243 958567 0 0 0 00:16:07 Clearing
There's genuinely no reason why that session would flap at all, though. I believe it terminated the session because the whole box is close to running out of memory.
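For readers not used to the summary row quoted above, the following sketch splits it into named fields. It assumes the usual column order of show bgp summary (Neighbor, V, AS, MsgRcvd, MsgSent, TblVer, InQ, OutQ, Up/Down, State/PfxRcd); the PeerSummary type and both helper functions are hypothetical names for this example only.

```python
from typing import NamedTuple


class PeerSummary(NamedTuple):
    neighbor: str
    version: int
    asn: int
    msg_rcvd: int
    msg_sent: int
    tbl_ver: int
    in_q: int
    out_q: int
    up_down: str
    state_or_pfx: str


def parse_summary_row(row: str) -> PeerSummary:
    """Split one summary row on whitespace (assumed column order, see above)."""
    f = row.split()
    return PeerSummary(f[0], *(int(x) for x in f[1:8]), f[8], f[9])


def is_established(peer: PeerSummary) -> bool:
    # An established session shows a prefix count in the last column;
    # any other session shows a state name such as "Clearing".
    return peer.state_or_pfx.isdigit()


peer = parse_summary_row("10.42.42.16 4 49004 943243 958567 0 0 0 00:16:07 Clearing")
print(peer.neighbor, peer.up_down, peer.state_or_pfx, is_established(peer))
```

Read this way, the row says the peer has exchanged close to a million messages in each direction and has been stuck in the Clearing state for about 16 minutes rather than carrying routes.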
If we were running out of memory bgp would just be killed and we would restart. There is nothing more complicated here.
Let's get some data gathered from perf about bgp:
https://github.com/FRRouting/frr/wiki/Perf-Recording
Can we gather a flame graph for bgp? I am not sure what is going on here and would like to see some data gathered. Can we get some idea of how you are configuring your peers?
I'll have a look into perf recording in a second. The peers are set up in a completely trivial way:
!
router bgp 49004
 coalesce-time 20000
 neighbor igp peer-group
 neighbor igp remote-as 49004
 neighbor igp update-source 10.42.42.40
 neighbor 10.42.42.9 peer-group igp
 neighbor 10.42.42.16 peer-group igp
 neighbor 10.42.42.34 peer-group igp
 !
 address-family ipv4 unicast
  neighbor igp route-reflector-client
  neighbor igp route-map igp-block out
  neighbor 10.42.42.34 addpath-tx-all-paths
 exit-address-family
 !
 address-family ipv6 unicast
  neighbor igp route-reflector-client
  neighbor igp route-map igp-block out
 exit-address-family
!
The addpath-tx-all-paths option is in there for "weird" internal reasons: that peer needs to receive all available routes, not just the preferred ones, due to special network preferences on that router. I don't believe it is relevant here, though.
[Edit]: the igp-block route-map literally does nothing at the moment:
route-map igp-block permit 1
!
I've tried taking the flame graph. I hope I didn't mess it up, as I have never used it before. Please let me know if it's useless. https://ufile.io/9bywe
Here's the same taken from the zebra pid. I'm not sure it helps, though, as it literally shows bgpd and looks almost the same as the other flame graph. https://ufile.io/7cb6d
Let's just attach the flamegraphs to this issue? We'll lose them otherwise, right?
BGP is taking so long because it is swapping heavily.
@donaldsharp It won't let me attach an .svg right here where I reply. Do you see any alternative way?
Either way, I entirely understand it's taking ages for sessions to clear and converge, but my point merely is that the box has 2GB of RAM, of which 1.9GB are used by FRR just to handle 3 BGP full feeds, which seems to be a bit too much, making me suspect a memory leak.
@donaldsharp I've tried enabling bgpd debugging, in case there's something obvious that would explode the memory. I've seen large amounts of this:
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.122.0/24 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.120.0/23 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.116.0/22 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.115.0/24 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.114.0/24 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.112.0/23 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.108.0/22 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.106.0/23 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.100.0/23 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.96.0/22 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.94.0/23 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.93.0/24 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.90.0/24 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.88.0/23 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.84.0/22 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.82.0/23 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.81.0/24 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.80.0/24 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.76.0/22 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.74.0/23 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.68.0/23 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.66.0/23 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.65.0/24 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.64.0/24 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.62.0/23 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.61.0/24 IPv4 unicast -- DENIED due to: reflected from the same cluster;
Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.60.0/24 IPv4 unicast -- DENIED due to: reflected from the same cluster;
and that:
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 206.201.74.0/24 with addpath ID 10966111 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 206.201.77.0/24 with addpath ID 10966110 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 206.201.79.0/24 with addpath ID 10966109 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 206.201.80.0/21 with addpath ID 11263788 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 206.201.88.0/24 with addpath ID 11347202 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 206.201.89.0/24 with addpath ID 11347203 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 206.201.92.0/24 with addpath ID 11347204 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 206.201.93.0/24 with addpath ID 11347205 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 104.153.189.0/24 with addpath ID 12812985 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 104.193.184.0/24 with addpath ID 12812814 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 104.193.187.0/24 with addpath ID 12812812 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 191.96.36.0/24 with addpath ID 12812808 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 191.96.16.0/24 with addpath ID 12812809 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 179.61.219.0/24 with addpath ID 12812810 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 173.239.198.0/24 with addpath ID 12812811 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 104.193.188.0/24 with addpath ID 12812807 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 104.193.189.0/24 with addpath ID 12812806 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 104.193.201.0/24 with addpath ID 12812805 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 104.193.216.0/21 with addpath ID 12812804 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 104.193.222.0/24 with addpath ID 12812803 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 104.194.4.0/24 with addpath ID 12812802 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 116.93.128.0/24 with addpath ID 12812801 IPv4 unicast -- unreachable
Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 104.194.16.0/22 with addpath ID 12812800 IPv4 unicast -- unreachable
and
Nov 4 16:43:18 bb1 bgpd[2221]: subgroup_process_announce_selected: p=199.202.145.0/24, selected=0x55db48a4fb90
Nov 4 16:43:18 bb1 bgpd[2221]: group_announce_route_walkcb: afi=IPv4, safi=unicast, p=104.194.31.0/24
Nov 4 16:43:18 bb1 bgpd[2221]: subgroup_process_announce_selected: p=104.194.31.0/24, selected=(nil)
Nov 4 16:43:18 bb1 bgpd[2221]: group_announce_route_walkcb: afi=IPv4, safi=unicast, p=104.194.31.0/24
Nov 4 16:43:18 bb1 bgpd[2221]: subgroup_process_announce_selected: p=104.194.31.0/24, selected=0x55db145ed900
Nov 4 16:43:18 bb1 bgpd[2221]: group_announce_route_walkcb: afi=IPv4, safi=unicast, p=104.194.30.0/24
Nov 4 16:43:18 bb1 bgpd[2221]: subgroup_process_announce_selected: p=104.194.30.0/24, selected=(nil)
Nov 4 16:43:18 bb1 bgpd[2221]: group_announce_route_walkcb: afi=IPv4, safi=unicast, p=104.194.30.0/24
Nov 4 16:43:18 bb1 bgpd[2221]: subgroup_process_announce_selected: p=104.194.30.0/24, selected=0x55db145e7870
Nov 4 16:43:18 bb1 bgpd[2221]: group_announce_route_walkcb: afi=IPv4, safi=unicast, p=104.194.29.0/24
Nov 4 16:43:18 bb1 bgpd[2221]: subgroup_process_announce_selected: p=104.194.29.0/24, selected=(nil)
Nov 4 16:43:18 bb1 bgpd[2221]: group_announce_route_walkcb: afi=IPv4, safi=unicast, p=104.194.29.0/24
Nov 4 16:43:18 bb1 bgpd[2221]: subgroup_process_announce_selected: p=104.194.29.0/24, selected=0x55db145e15b0
Nov 4 16:43:18 bb1 bgpd[2221]: group_announce_route_walkcb: afi=IPv4, safi=unicast, p=104.194.28.0/24
Nov 4 16:43:18 bb1 bgpd[2221]: subgroup_process_announce_selected: p=104.194.28.0/24, selected=(nil)
Nov 4 16:43:18 bb1 bgpd[2221]: group_announce_route_walkcb: afi=IPv4, safi=unicast, p=104.194.28.0/24
Nov 4 16:43:18 bb1 bgpd[2221]: subgroup_process_announce_selected: p=104.194.28.0/24, selected=0x55db145dfde0
Nov 4 16:43:18 bb1 bgpd[2221]: group_announce_route_walkcb: afi=IPv4, safi=unicast, p=104.194.28.0/22
Nov 4 16:43:18 bb1 bgpd[2221]: subgroup_process_announce_selected: p=104.194.28.0/22, selected=(nil)
Nov 4 16:43:18 bb1 bgpd[2221]: group_announce_route_walkcb: afi=IPv4, safi=unicast, p=104.194.28.0/22
Nov 4 16:43:18 bb1 bgpd[2221]: subgroup_process_announce_selected: p=104.194.28.0/22, selected=0x55db5d02c630
Nov 4 16:43:18 bb1 bgpd[2221]: group_announce_route_walkcb: afi=IPv4, safi=unicast, p=104.194.19.0/24
Nov 4 16:43:18 bb1 bgpd[2221]: subgroup_process_announce_selected: p=104.194.19.0/24, selected=(nil)
Nov 4 16:43:18 bb1 bgpd[2221]: group_announce_route_walkcb: afi=IPv4, safi=unicast, p=104.194.19.0/24
The latter alone produces hundreds of lines per second. If that's weird, what would cause it?
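To compare how often each kind of debug message appears, the log could be tallied with a small script along these lines. This is only a sketch: the regular expressions are written against the lines quoted above, the labels are made up for this example, and other FRR versions may word these messages differently.

```python
import re
from collections import Counter

# Hypothetical labels mapped to patterns taken from the excerpts in this thread.
PATTERNS = [
    ("denied_same_cluster",
     re.compile(r"rcvd UPDATE about \S+ .*DENIED due to: reflected from the same cluster")),
    ("send_unreachable",
     re.compile(r"send UPDATE \S+ with addpath ID \d+ .*unreachable")),
    ("announce_walk", re.compile(r"group_announce_route_walkcb:")),
    ("announce_selected", re.compile(r"subgroup_process_announce_selected:")),
]


def classify(line: str) -> str:
    """Return the label of the first matching pattern, or "other"."""
    for label, pattern in PATTERNS:
        if pattern.search(line):
            return label
    return "other"


def tally(lines) -> Counter:
    """Count non-empty log lines per message kind."""
    return Counter(classify(line) for line in lines if line.strip())


sample = [
    "Nov 4 16:43:19 bb1 bgpd[2221]: 10.42.42.9 rcvd UPDATE about 147.194.122.0/24 IPv4 unicast -- DENIED due to: reflected from the same cluster;",
    "Nov 4 16:43:19 bb1 bgpd[2221]: u12:s15 send UPDATE 206.201.74.0/24 with addpath ID 10966111 IPv4 unicast -- unreachable",
    "Nov 4 16:43:18 bb1 bgpd[2221]: group_announce_route_walkcb: afi=IPv4, safi=unicast, p=104.194.31.0/24",
    "Nov 4 16:43:18 bb1 bgpd[2221]: subgroup_process_announce_selected: p=104.194.31.0/24, selected=(nil)",
]

for label, n in tally(sample).most_common():
    print(label, n)
```

Feeding it the real log file instead of the four sample lines (for example by passing an open file object to tally) would show which of the three message kinds dominates.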
I also encountered a very large memory footprint: when I receive a full IPv4 table from upstream, FRR consumes 1.5G of memory. The large memory consumption really needs to be solved.
My config is:
bgp bestpath as-path confed
bgp bestpath med confed
neighbor 10.88.45.149 remote-as 108
neighbor 10.88.45.149 enforce-first-as
neighbor 10.88.45.149 update-source 10.88.45.150
neighbor 10.88.45.149 activate
neighbor 10.88.45.149 next-hop-self
neighbor 10.88.45.149 remove-private-AS all
neighbor 10.88.45.149 prefix-list ipv4in in
neighbor 10.88.45.149 prefix-list myout out
How about the latest (master) version?
Already had a similar issue a while ago: #2527
On a system working as a route reflector that is part of an IS-IS area (the reflector is deployed to connect a bunch of Junipers), there are currently 3 BGP full feeds:
Memory usage for the RIB according to show bgp summary:
Now, taking a look at the system's memory:
This is a bit far off sanity. The same full tables barely take a GB of memory on Juniper.
Now checking the show memory command:
I don't have a great understanding of FRR, as I have never contributed as a developer, so I'm kindly asking: do any of you have an idea, based on the above, of what could be leaking memory here?