Interface events (e.g. when reconfigured) in the operating system in certain cases cause ospf6d to consume all available RAM and swap. Eventually the process is killed when it hits its limit or resources run out. Once ospf6d restarts, it returns to normal.
It doesn't seem to happen with bgpd or ospfd as far as I can tell. Thus far I've only been able to make it happen with ospf6d.
There doesn't appear to be a backtrace since ospf6d itself does not crash, the OS kills the process when it fails to allocate more memory.
[X] Did you check if this is a duplicate issue? -- Yes, I saw a couple other issues for memory leaks such as #1725 but none that looked this extreme or fit the other symptoms
[ ] Did you test it on the latest FRRouting/frr master branch? -- No, only 7.5.1
Logs from around one occurrence
May 20 09:14:02 hostname watchfrr[38574]: ospf6d state -> up : connect succeeded
May 20 09:14:03 hostname ospf6d[80507]: SPF: Scheduled in 0 msec
May 20 09:14:03 hostname ospf6d[80507]: SPF processing: # Areas: 1, SPF runtime: 0 sec 86 usec, Reason: R+, L+
May 20 09:14:07 hostname ospf6d[80507]: SPF: Scheduled in 0 msec
May 20 09:14:07 hostname ospf6d[80507]: SPF processing: # Areas: 1, SPF runtime: 0 sec 45 usec, Reason: R+, R-, L+
May 20 09:14:07 hostname ospf6d[80507]: SPF: Scheduled in 50 msec
May 20 09:14:07 hostname ospf6d[80507]: SPF processing: # Areas: 1, SPF runtime: 0 sec 71 usec, Reason: R+, R-
May 20 09:14:12 hostname ospf6d[80507]: SPF: Scheduled in 0 msec
May 20 09:14:12 hostname ospf6d[80507]: SPF processing: # Areas: 1, SPF runtime: 0 sec 121 usec, Reason: R+, R-
May 20 09:14:33 hostname ospf6d[80507]: SPF: Scheduled in 0 msec
May 20 09:14:33 hostname ospf6d[80507]: SPF processing: # Areas: 1, SPF runtime: 0 sec 125 usec, Reason: R+, R-
May 20 09:14:33 hostname ospf6d[80507]: Scheduling ipsec2000 for sso
May 20 09:14:33 hostname ospf6d[80507]: SPF: Scheduled in 43 msec
May 20 09:14:36 hostname ospf6d[80507]: SPF processing: # Areas: 1, SPF runtime: 0 sec 47 usec, Reason: L+, L-
May 20 09:15:02 hostname ospf6d[80507]: [EC 100663313] SLOW THREAD: task ospf6_receive (23c6c0) ran for 25260ms (cpu time 3779ms)
May 20 09:15:07 hostname watchfrr[38574]: [EC 268435457] ospf6d state -> unresponsive : no response yet to ping sent 30 seconds ago
May 20 09:15:36 hostname watchfrr[38574]: [EC 268435457] ospf6d state -> down : read returned EOF
May 20 09:15:40 hostname ospf6d[69265]: area_range: for prefix 2001:db8:1:ee30::/64, flag = 0
May 20 09:15:40 hostname ospf6d[69265]: area_range: for prefix 2001:db8:f15:110::/64, flag = 0
May 20 09:15:40 hostname ospf6d[69265]: Scheduling lagg0.4091 for sso
May 20 09:15:40 hostname ospf6d[69265]: Scheduling ipsec2000 for sso
May 20 09:15:40 hostname watchfrr[38574]: ospf6d state -> up : connect succeeded
May 20 09:15:41 hostname ospf6d[69265]: SPF: Scheduled in 0 msec
May 20 09:15:41 hostname ospf6d[69265]: SPF processing: # Areas: 1, SPF runtime: 0 sec 86 usec, Reason: R+, L+
May 20 09:15:47 hostname ospf6d[69265]: SPF: Scheduled in 0 msec
May 20 09:15:47 hostname ospf6d[69265]: SPF processing: # Areas: 1, SPF runtime: 0 sec 44 usec, Reason: R+, R-, L+
May 20 09:15:47 hostname ospf6d[69265]: SPF: Scheduled in 50 msec
May 20 09:15:47 hostname ospf6d[69265]: SPF processing: # Areas: 1, SPF runtime: 0 sec 54 usec, Reason: R+, R-
May 20 09:15:52 hostname ospf6d[69265]: SPF: Scheduled in 0 msec
May 20 09:15:52 hostname ospf6d[69265]: SPF processing: # Areas: 1, SPF runtime: 0 sec 80 usec, Reason: R+, R-
FRR configuration
frr defaults traditional
hostname xxxxxxxxx
password yyyyyyyyy
log syslog
service integrated-vtysh-config
!
ip router-id 10.15.0.1
!
interface lagg0.4091
ip ospf area 0.0.0.15
ipv6 ospf6 passive
interface ipsec2000
ip ospf mtu-ignore
ip ospf area 0.0.0.0
ipv6 ospf6 cost 10
ipv6 ospf6 mtu-ignore
!
router ospf
ospf router-id 10.15.0.1
log-adjacency-changes detail
passive-interface lagg0.4091
area 0.0.0.0 shortcut default
area 0.0.0.15 stub
area 0.0.0.15 shortcut default
area 0.0.0.15 range 10.15.0.0/16
!
router ospf6
ospf6 router-id 10.15.0.1
area 0.0.0.0 range 2001:db8:1:ee30::/64 cost 1
area 0.0.0.0 range 2001:db8:f15:110::/64 cost 10
interface lagg0.4091 area 0.0.0.0
interface ipsec2000 area 0.0.0.0
!
line vty
!
end
In this instance it happens when ipsec2000 has an event (e.g. the IPsec daemon is stopped and started), but the same condition occurs without IPsec when using ospf6d on regular interfaces.
To Reproduce
Configure OSPF6 on an interface, then reconfigure that interface in the operating system in various ways. It doesn't seem to happen for everything, e.g. a simple down/up doesn't trigger it, but restarting IPsec (if bound to an IPsec VTI interface) or reconfiguring the interface addresses seems to do it.
Expected behavior
Expected behavior is for it to recover from the condition without consuming all available memory, as ospfd and bgpd do.
Versions
OS Version:
FreeBSD 12.2-STABLE. Specifically current pfSense development snapshots of either CE 2.6.0 or Plus 21.05. It is not specific to those, but that's the easiest way I've found to reproduce it.
During the event, watching top -a -o res shows ospf6d consuming all the memory and swap on the system, even on a system with 8G RAM and 2G swap. At the start of the event it briefly uses 100% of a CPU core but then the CPU usage settles down and the RAM usage increases. This was copied from the console just before it ran out of memory:
last pid: 47737; load averages: 0.61, 0.59, 0.47 up 0+00:26:48 09:15:22
72 processes: 2 running, 70 sleeping
CPU: 0.3% user, 0.0% nice, 8.8% system, 1.7% interrupt, 89.3% idle
Mem: 5551M Active, 644K Inact, 1214M Laundry, 878M Wired, 348K Buf, 216M Free
ARC: 545M Total, 182M MFU, 349M MRU, 118K Anon, 2732K Header, 12M Other
299M Compressed, 635M Uncompressed, 2.13:1 Ratio
Swap: 2048M Total, 2024M Used, 24M Free, 98% Inuse, 52M In, 61M Out
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
80507 root 1 23 0 8222M 5227M CPU1 1 0:08 9.60% /usr/local/sbin/ospf6d -d
On systems with less RAM, the process dies much sooner since it doesn't take as long for it to consume all of the available resources.
So far I've been able to reproduce the issue on several different systems of varying size, on amd64 and ARM, hardware of various vintages, as well as VMs (ESX and KVM). It doesn't seem specific to anything in the OS or platform that I can tell.
Interface events (e.g. when reconfigured) in the operating system in certain cases cause
ospf6d
to consume all available RAM and swap. Eventually the process is killed when it hits its limit or resources run out. Onceospf6d
restarts, it returns to normal.It doesn't seem to happen with
bgpd
orospfd
as far as I can tell. Thus far I've only been able to make it happen withospf6d
.There doesn't appear to be a backtrace since
ospf6d
itself does not crash, the OS kills the process when it fails to allocate more memory.[X] Did you check if this is a duplicate issue? -- Yes, I saw a couple other issues for memory leaks such as #1725 but none that looked this extreme or fit the other symptoms [ ] Did you test it on the latest FRRouting/frr master branch? -- No, only 7.5.1
Logs from around one occurrence
FRR configuration
In this instance it happens when
ipsec2000
has an event (e.g. the IPsec daemon is stopped and started), but the same condition occurs without IPsec when usingospf6d
on regular interfaces.To Reproduce
Configure OSPF6 on an interface, then reconfigure that interface in the operating system in various ways. It doesn't seem to happen for everything, e.g. a simple down/up doesn't trigger it, but restarting IPsec (if bound to an IPsec VTI interface) or reconfiguring the interface addresses seems to do it.
Expected behavior
Expected behavior is for it to recover from the condition without consuming all available memory, as
ospfd
andbgpd
do.Versions
FreeBSD 12.2-STABLE. Specifically current pfSense development snapshots of either CE 2.6.0 or Plus 21.05. It is not specific to those, but that's the easiest way I've found to reproduce it.
FreeBSD xxxxxxx 12.2-STABLE FreeBSD 12.2-STABLE plus-RELENG_21_05-n202546-902df674b682 pfSense amd64
See above, FreeBSD 12.2-STABLE
FRR version 7.5.1
Additional context
During the event, watching
top -a -o res
showsospf6d
consuming all the memory and swap on the system, even on a system with 8G RAM and 2G swap. At the start of the event it briefly uses 100% of a CPU core but then the CPU usage settles down and the RAM usage increases. This was copied from the console just before it ran out of memory:On systems with less RAM, the process dies much sooner since it doesn't take as long for it to consume all of the available resources.
So far I've been able to reproduce the issue on several different systems of varying size, on amd64 and ARM, hardware of various vintages, as well as VMs (ESX and KVM). It doesn't seem specific to anything in the OS or platform that I can tell.