Closed: thisabstractmind closed this issue 2 months ago.
I have the same problem. When an RTSP camera is not reachable, zmc consumes all the memory on the server. This seems to happen during the zmc restart process and has been reported many times in the forum. Setting the max buffer does nothing because the stream is never started.
https://forums.zoneminder.com/viewtopic.php?t=32739 https://forums.zoneminder.com/viewtopic.php?t=32756 https://forums.zoneminder.com/viewtopic.php?t=32633
This is also reproducible in 1.37.x
I am running into the same issue, I believe. After the server went down a few times due to OOM, I set up cgroups to limit memory consumption.
www-data 483061 0.0 0.0 40064 2816 ? S Sep26 0:16 /usr/bin/perl -wT /usr/bin/zmdc.pl startup
www-data 483093 37.3 36.0 17501112 2934360 ? SLl Sep26 1541:12 /usr/bin/zmc -m 7
www-data 483100 0.0 0.1 52400 10052 ? S Sep26 0:00 /usr/bin/zmcontrol.pl --id 7
It hit the memory cap; however, it was still running.
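For reference, a memory cap like the one mentioned can be set with a systemd drop-in. This is only a sketch under the assumption that the service unit is named zoneminder.service and the host uses cgroup v2; the drop-in is created with e.g. `sudo systemctl edit zoneminder.service`:

```ini
# Hypothetical drop-in path: /etc/systemd/system/zoneminder.service.d/memory.conf
[Service]
# Hard cap for the whole service cgroup; when it is hit, the kernel
# OOM-kills inside the cgroup instead of taking down the rest of the host.
MemoryMax=8G
MemorySwapMax=1G
```

After editing, run `systemctl daemon-reload` and restart the service. This only contains the damage; a leaking zmc is still killed when it hits the cap.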
Here's what it looks like normally:
www-data 705258 0.0 0.2 40064 18400 ? S 12:01 0:00 /usr/bin/perl -wT /usr/bin/zmdc.pl startup
www-data 705288 33.7 7.2 923832 593144 ? SLl 12:01 5:57 /usr/bin/zmc -m 7
www-data 705293 0.0 0.3 46836 31684 ? S 12:01 0:00 /usr/bin/zmcontrol.pl --id 7
This was on 1.37.45~20230926121048-focal from the ppa.
Experiencing the same thing - currently running v1.37.43. Swap fills after a few days, mostly related to zmc processes.
@jrtaylor71 is correct. There is a direct correlation to the camera not being able to be accessed and the memory consumption.
I have not been able to recreate this here. Are the monitors FFmpeg type, Remote type, or something else? I need a debug level 3 log from one of the zmc processes.
@connortechnology ffmpeg in my case. I agree with @thisabstractmind that this might be related to the camera not being accessible - I have 5 monitors accessed through a VPN tunnel over the internet, so their streams aren't as stable as the local network, and memory seems to climb after short network outages. I'm away at the moment, so I can't get you the debug logs.
The same issue. I'm using 4 cheap Yi WiFi cameras with YiHack to add an RTSP server. The cameras are cheap and resource-constrained, so the RTSP stream is not stable (a parallel background stream to the manufacturer's server is still running). All cameras are added as an Ffmpeg source, and the method is TCP due to some random artifacts when UDP was used.
I'm using an unprivileged LXC container in Proxmox. The system is Debian. ZoneMinder was working correctly for months (with high CPU usage due to video analysis, but stability was OK).
I think the problem began after one of the system updates.
I'm now on the latest Debian Bookworm, but today I added the deb-multimedia repo, so ffmpeg and its libraries were updated. Let's see what happens.
Debug from 3 cameras. Camera 5 is dead right now; for cameras 7 and 12 I simulated a switch failure by rebooting the switch. All 3 cameras are powered from an Avaya ERS5520 and run on VLAN 555.
Before test
root@nvr:~# free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi       9.8Gi        19Gi       309Mi       1.7Gi        20Gi
Swap:          8.0Gi          0B       8.0Gi
After test
root@nvr:~# free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi        10Gi       1.4Gi       310Mi        19Gi        19Gi
Swap:          8.0Gi       2.0Mi       8.0Gi
After a larger failure like this I have had to reboot the VM to release the memory. Restarting the ZoneMinder service does not help.
Setup: ZoneMinder 1.36.33-jammy1; HPE DL360p G8 running VMware 6.5; Ubuntu 22 VM with 16 CPUs, 32 GB RAM, a 100 GB OS disk, and 10 TB for video; 15 cameras total.
If restarting ZM doesn't free up the memory... then it's not ZM that is consuming it.
Anyway, those debug logs show ZM connecting to and disconnecting from the offline camera... unfortunately no information as to why memory would be used.
In my own testing, I set up top, hit M to sort by memory use... and then power-cycled a camera and watched what happened. What happened is that ZM had a read failure, shut everything down, freed up all RAM, and then retried the connection as it should. When the connection came back, it resumed and didn't use any excess RAM.
What your logs do NOT show, which I expected to see, is a resource shortage. They seem to be happy.
So I'm stumped.
I have always tested with more than one camera failure and when it has happened it has always been more than one. In fact the more cameras that go offline the faster the memory gets used up.
As for freeing the memory, I should have pointed out that cache and swap are never freed, but I have seen ZM crash like the above output without a reboot. Just a service restart frees the active memory.
I have included some screenshots from Munin, which I have had running to track usage.
The latest build has a fix for a small memory leak when trying to reconnect to a dead camera, so it might fix this. It shouldn't leak too quickly though, so maybe not.
On what version? I think most of us that reported this are still on 1.36.
I updated last week and tried it again, but I still seem to be having an issue with the memory use growing significantly. I let it go for a while and generally it stays pretty consistent in memory usage, but sometimes it starts consuming more than the normal amount of memory. It gets oom-killed by the cgroup limit before it causes problems for the rest of the machine.
I do have multiple cameras in use, and the memory in one may increase significantly while the others are still fine. Due to backups, sometimes the CPU load on the machine is high, and other times it's nearly Zoneminder alone using the cpu cycles for mocap. They are all ffmpeg streams, and I have tried both tcp and udp rtsp streams with no significant difference.
I set up a script (taking 1 sample per min) to record the memory usage over time, along with the absolute time (to use in correlation with the logs) and the process time (in seconds) and the rss/vsz to show when it starts going up. Here's an excerpt of the memory usage log where it nears the end, right before it gets killed:
pid,systime,process_time,vsz,rss,camera
2355727, 1707058441, 0, 254232, 38952, 9
2355727, 1707058501, 20, 1036344, 637328, 9
2355727, 1707058561, 48, 1110076, 612356, 9
. . .
2355727, 1707095161, 13482, 1522260, 978784, 9
2355727, 1707095221, 13513, 1587796, 1052752, 9
2355727, 1707095281, 13556, 2177620, 1671004, 9
2355727, 1707095341, 13582, 3554388, 2681368, 9
2355727, 1707095401, 13601, 4799572, 2949904, 9
2355727, 1707095461, 13616, 6507888, 3383784, 9
The memory log and debug log are attached, along with the crash data. I trimmed the debug log to within the last ~11.5 minutes so it wouldn't be too large. I do have several full logs along with corresponding memory usage logs if I should send anything else. zmc_memlog_2355727.txt zm_debug.log.2355727.end.txt.gz zmc_crash_2355727.txt
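For anyone wanting to collect the same data, here is a minimal sketch of such a sampler (not the exact script attached above; it assumes Linux procps `ps` and emits the same CSV columns):

```shell
#!/bin/sh
# Hypothetical sketch of a per-interval memory sampler for a zmc process.
# Emits rows in the CSV format above: pid,systime,process_time,vsz,rss,camera

sample() {  # sample <pid> <camera-id>  -> prints one CSV row
  # ps: etimes = elapsed seconds since process start; vsz/rss are in KiB
  ps -o etimes= -o vsz= -o rss= -p "$1" | {
    read etimes vsz rss
    echo "$1, $(date +%s), $etimes, $vsz, $rss, $2"
  }
}

echo "pid,systime,process_time,vsz,rss,camera"
sample $$ 9   # demo: sample this shell itself; camera id 9 is arbitrary
# Real use: while kill -0 "$ZMC_PID" 2>/dev/null; do sample "$ZMC_PID" 9; sleep 60; done
```

Logging once per minute keeps the overhead negligible while still catching the runaway growth in the last few samples before the OOM kill.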
This is on 1.37.50~20240202093203-focal - I updated to current shortly before starting tests. I am not sure, but it's definitely possible there may be more than one issue at play as I haven't noticed significant increases in memory usage when I do try disconnecting/rebooting the cameras, and I am not on 1.36 like some of the others.
@Middim Thanks for the quality info. So at first glance, I see everything falling behind around 5:11. I see a ton of warnings from ffmpeg which could be bad network, but tend to more often be that your cpu is not keeping up. I also see that you are saving jpegs. Encoding jpegs is surprisingly cpu intensive. If you want the analysis images, then only save those. Saving jpegs is not feasible past 720p. Certainly not at 20fps.
@connortechnology I am the OP. I have not had an issue since changing video writer to pass through and disabling "save JPEGs". I think it was a bottleneck problem that led to delay and extra memory consumption. This was about 3 months ago.
Yup, that's what I think this issue is. Saving jpegs is just not feasible at today's resolutions. We could use hwaccel for it, but why bother when we could encode an mp4.
Long term, we will likely get rid of the analysis jpegs as well, in favour of storing an SVG of the info and just overlaying that.
I'm not saving jpegs, but I am saving audio and decoding keyframes+ondemand. I upgraded to 1.37.50 a few days ago and swap isn't growing as fast, but it still seems to be growing. I need to give it a few more days though.
Thanks for the info!
I changed it so it only saves analysis image - and an SVG option for the analysis info sounds great too!
It is likely that there would be cpu slowdown around that time as I have some backup processes that run through the night. In the past I have had the problem occur at random times. It would even occasionally retain large amounts of memory all day - but the recent changes that were made may already have resolved that as I didn't see that part occur again in the recent testing.
I am retesting and will let you know what the results show after I collect some data in the new configuration. I hadn't considered the jpegs would cause a problem, but that would make an easy solution, and I am not attached to having them.
One note, though: I have been saving things in this manner for a few years, but I only had the problem after I updated from the older version of ZoneMinder I was using last year, so I had attributed it to some process change related to that update. After that I also started using passthrough in the hope that it would resolve some of the issue. While it didn't resolve the issue, it did significantly reduce the CPU load on the system.
I'll post back if this configuration change fixes it, but it may take a few days if things are successful, due to the random nature of it occurring.
Thanks again!
@connortechnology thanks heaps for the fix! 1.37.50 seems to have resolved the memory leak i was experiencing. Swap is no longer filling up :pray:
@connortechnology Thanks for your help and for everything you do with this! My system is in a running state now.
Just updating here as well, after running several days without hitting the memory limit.
After setting analysis-only jpegs, it has yet to crash, and since the patch the memory no longer grows steadily in one direction. The runtime is now greater than in my post above with no crashes, and it seems to be managing memory: although it sometimes holds a decent amount of memory in use, it does clear it out eventually and does not exceed the cap I have set.
I have attached the memory logs of the two cameras I have running with data collection. zmc_memlog_2794131.txt zmc_memlog_2794089.txt
Thanks!!
I upgraded my system to 1.37.51 (current master) and there is an improvement with this issue. If a camera goes offline or is offline at startup, the system no longer consumes all memory. The one thing I did find is that if you repeatedly save a camera that is having a problem connecting to an RTSP stream, CPU usage runs up along with memory. A service restart cleared the CPU and memory usage. This happened with a camera that had a wrong password that I had forgotten.
Upgraded to 1.37.55~20240304175439-jammy a few days ago, and I lost a WiFi bridge link to a remote building. This caused zmc to crash on the server. It did recover with no intervention, but the system is still slower than normal. I'm going to restart only ZoneMinder.
[375079.292793] systemd-journal invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-250 [375079.292809] CPU: 4 PID: 690 Comm: systemd-journal Not tainted 5.15.0-100-generic #110-Ubuntu [375079.292817] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 [375079.292822] Call Trace: [375079.292827] <TASK> [375079.292853] show_stack+0x52/0x5c [375079.292866] dump_stack_lvl+0x4a/0x63 [375079.292878] dump_stack+0x10/0x16 [375079.292886] dump_header+0x53/0x228 [375079.292895] oom_kill_process.cold+0xb/0x10 [375079.292903] out_of_memory+0x106/0x2e0 [375079.292915] __alloc_pages_slowpath.constprop.0+0x9b7/0xa80 [375079.292933] __alloc_pages+0x311/0x330 [375079.292944] alloc_pages+0x9e/0x1e0 [375079.292955] __page_cache_alloc+0x7e/0x90 [375079.292967] pagecache_get_page+0x152/0x590 [375079.292979] ? page_cache_ra_unbounded+0x166/0x210 [375079.292995] filemap_fault+0x488/0xab0 [375079.293006] ? filemap_map_pages+0x309/0x400 [375079.293021] __do_fault+0x39/0x120 [375079.293029] do_read_fault+0xeb/0x160 [375079.293036] do_fault+0xa0/0x2e0 [375079.293044] handle_pte_fault+0x1cd/0x240 [375079.293051] __handle_mm_fault+0x405/0x6f0 [375079.293063] handle_mm_fault+0xd8/0x2c0 [375079.293071] do_user_addr_fault+0x1c9/0x670 [375079.293083] exc_page_fault+0x77/0x170 [375079.293096] asm_exc_page_fault+0x27/0x30 [375079.293105] RIP: 0033:0x7f962dcee073 [375079.293117] Code: Unable to access opcode bytes at RIP 0x7f962dcee049. 
[375079.293121] RSP: 002b:00007ffcff03ede0 EFLAGS: 00010246 [375079.293127] RAX: 00007f962cbd5f80 RBX: 00007ffcff03efd0 RCX: 00007f962cbd5f80 [375079.293132] RDX: 00007f962db22000 RSI: 00007ffcff03f050 RDI: 00007f962cbd5fc0 [375079.293137] RBP: 00007ffcff03f030 R08: 00000000003ccf80 R09: 00007ffcff03ed78 [375079.293142] R10: d234f0d0f98e64b5 R11: 00007ffcff0a3090 R12: 36e7b9d410de54de [375079.293148] R13: 00007ffcff03f4a8 R14: 0000560024083a20 R15: 00007ffcff03ef08 [375079.293159] </TASK> [375079.293162] Mem-Info: [375079.293168] active_anon:948025 inactive_anon:6867956 isolated_anon:39 active_file:468 inactive_file:724 isolated_file:0 unevictable:175162 dirty:0 writeback:2 slab_reclaimable:30356 slab_unreclaimable:39177 mapped:177356 shmem:177880 pagetables:27339 bounce:0 kernel_misc_reclaimable:0 free:48983 free_pcp:1143 free_cma:0 [375079.293187] Node 0 active_anon:3792100kB inactive_anon:27471824kB active_file:1872kB inactive_file:2896kB unevictable:700648kB isolated(anon):156kB isolated(file):0kB mapped:709424kB dirty:0kB writeback:8kB shmem:711520kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:14064kB pagetables:109356kB all_unreclaimable? 
no [375079.293205] Node 0 DMA free:11264kB min:28kB low:40kB high:52kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [375079.293223] lowmem_reserve[]: 0 2887 31930 31930 31930 [375079.293236] Node 0 DMA32 free:121828kB min:6108kB low:9064kB high:12020kB reserved_highatomic:0KB active_anon:264376kB inactive_anon:2590648kB active_file:40kB inactive_file:32kB unevictable:168kB writepending:0kB present:3129152kB managed:3063616kB mlocked:168kB bounce:0kB free_pcp:768kB local_pcp:0kB free_cma:0kB [375079.293255] lowmem_reserve[]: 0 0 29042 29042 29042 [375079.293268] Node 0 Normal free:62840kB min:65536kB low:95272kB high:125008kB reserved_highatomic:2048KB active_anon:3527836kB inactive_anon:24881176kB active_file:1760kB inactive_file:2360kB unevictable:700480kB writepending:8kB present:30408704kB managed:29748740kB mlocked:700480kB bounce:0kB free_pcp:3804kB local_pcp:424kB free_cma:0kB [375079.293287] lowmem_reserve[]: 0 0 0 0 0 [375079.293298] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB [375079.293340] Node 0 DMA32: 318*4kB (UE) 388*8kB (UE) 1242*16kB (UME) 894*32kB (UME) 479*64kB (UME) 201*128kB (UME) 32*256kB (UME) 3*512kB (UM) 1*1024kB (M) 1*2048kB (M) 0*4096kB = 122040kB [375079.293394] Node 0 Normal: 383*4kB (UMEH) 887*8kB (UMEH) 3121*16kB (UMEH) 118*32kB (UMH) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 62340kB [375079.293438] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [375079.293444] 557157 total pagecache pages [375079.293446] 377876 pages in swap cache [375079.293449] Swap cache stats: add 6669942, delete 6292375, find 20777361/21026840 [375079.293454] Free swap = 0kB [375079.293456] Total swap = 8388604kB [375079.293459] 8388461 pages RAM [375079.293461] 0 
pages HighMem/MovableOnly [375079.293463] 181532 pages reserved [375079.293465] 0 pages hwpoisoned [375079.293512] Tasks state (memory values in pages): [375079.293515] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name [375079.293532] [ 690] 0 690 26522 288 229376 143 -250 systemd-journal [375079.293543] [ 733] 0 733 6673 292 73728 413 -1000 systemd-udevd [375079.293552] [ 954] 116 954 2026 93 53248 15 0 rpcbind [375079.293562] [ 965] 104 965 22341 145 77824 70 0 systemd-timesyn [375079.293570] [ 967] 0 967 12787 279 81920 224 0 VGAuthService [375079.293579] [ 968] 0 968 78779 361 114688 93 0 vmtoolsd [375079.293587] [ 1050] 101 1050 4032 174 69632 84 0 systemd-network [375079.293596] [ 1052] 102 1052 6385 616 86016 475 0 systemd-resolve [375079.293604] [ 1065] 103 1065 2217 218 57344 19 -900 dbus-daemon [375079.293613] [ 1076] 0 1076 20713 79 61440 10 0 irqbalance [375079.293621] [ 1079] 0 1079 5265 1408 81920 1053 0 munin-node [375079.293631] [ 1082] 115 1082 2610 231 53248 141 -500 nrpe [375079.293639] [ 1083] 0 1083 8193 1139 102400 1046 0 networkd-dispat [375079.293648] [ 1086] 0 1086 58624 118 90112 56 0 polkitd [375079.293656] [ 1088] 107 1088 55601 398 73728 225 0 rsyslogd [375079.293665] [ 1089] 0 1089 308927 874 118784 183 0 canonical-livep [375079.293673] [ 1096] 0 1096 3758 163 69632 77 0 systemd-logind [375079.293682] [ 1101] 0 1101 98169 403 118784 150 0 udisksd [375079.293690] [ 1117] 0 1117 1544 22 49152 0 0 agetty [375079.293698] [ 1126] 0 1126 1724 34 49152 30 0 cron [375079.293707] [ 1140] 0 1140 3860 220 73728 160 -1000 sshd [375079.293715] [ 1145] 0 1145 27439 1059 110592 973 0 unattended-upgr [375079.293724] [ 1146] 0 1146 61057 333 110592 153 0 ModemManager [375079.293732] [ 1203] 114 1203 2372854 195404 3076096 103351 0 mariadbd [375079.293741] [ 1236] 0 1236 64938 1133 196608 1405 0 apache2 [375079.293749] [ 1641] 33 1641 9785 1030 118784 2262 0 zmdc.pl [375079.293757] [ 1708] 33 1708 406634 266903 2732032 2517 0 zmc 
[375079.293766] [ 1717] 33 1717 545797 406302 3891200 2176 0 zmc [375079.293774] [ 1721] 33 1721 343352 231769 2396160 7872 0 zmc [375079.293783] [ 1726] 33 1726 387263 207147 2768896 55862 0 zmc [375079.293791] [ 1736] 33 1736 260981 120383 1667072 2245 0 zmc [375079.293799] [ 1819] 0 1819 588714 2799 376832 307 -900 snapd [375079.293808] [ 1853] 33 1853 628054 429024 4624384 39827 0 zmc [375079.293816] [ 1932] 33 1932 495542 275863 3526656 78207 0 zmc [375079.293824] [ 1972] 33 1972 378954 188910 2715648 52916 0 zmc [375079.293832] [ 1978] 33 1978 249340 118349 1642496 6087 0 zmc [375079.293841] [ 2045] 33 2045 15039 1735 159744 6665 0 zmwatch.pl [375079.293849] [ 2058] 33 2058 13401 1403 151552 5294 0 zmupdate.pl [375079.293858] [ 2082] 33 2082 9597 339 110592 2754 0 zmstats.pl [375079.293867] [ 140888] 33 140888 16458 2249 172032 7509 0 /usr/bin/zmcont [375079.293876] [ 141601] 33 141601 16444 494 172032 9249 0 /usr/bin/zmcont [375079.293885] [1150043] 0 1150043 59905 206 102400 102 0 upowerd [375079.293893] [2640129] 33 2640129 175970 36110 1110016 9847 0 zmc [375079.293901] [2928828] 33 2928828 603461 477071 4464640 2283 0 zmc [375079.293911] [4112988] 0 4112988 73899 476 163840 270 0 packagekitd [375079.293923] [1782740] 33 1782740 938549 739990 6787072 7541 0 zmc [375079.293935] [2661550] 33 2661550 13683 2 143360 6914 0 zmtelemetry.pl [375079.293947] [2661994] 33 2661994 16855 1289 176128 9028 0 zmfilter.pl [375079.293957] [2662114] 33 2662114 17032 1942 176128 8525 0 zmfilter.pl [375079.293966] [2824620] 33 2824620 5222625 3404036 41349120 1659491 0 zmc [375079.293976] [4162822] 33 4162822 65292 3301 229376 1308 0 apache2 [375079.293985] [4162866] 33 4162866 65290 3319 229376 1275 0 apache2 [375079.293994] [4162995] 33 4162995 65292 3329 229376 1304 0 apache2 [375079.294002] [4163013] 33 4163013 65292 3356 229376 1276 0 apache2 [375079.294011] [4163022] 33 4163022 65292 3303 229376 1311 0 apache2 [375079.294019] [4163026] 33 4163026 65290 3191 229376 1387 
0 apache2 [375079.294028] [4163034] 33 4163034 65292 2758 229376 1831 0 apache2 [375079.294037] [4163685] 33 4163685 65101 1407 196608 1336 0 apache2 [375079.294045] [4164137] 0 4164137 2588 171 57344 37 0 cron [375079.294053] [4164220] 117 4164220 723 23 45056 0 0 sh [375079.294061] [4164227] 117 4164227 723 23 40960 0 0 munin-cron [375079.294070] [4164438] 33 4164438 65290 3339 229376 1255 0 apache2 [375079.294078] [4164479] 33 4164479 65290 3332 229376 1256 0 apache2 [375079.294087] [4164919] 33 4164919 65099 563 167936 1365 0 apache2 [375079.294096] [4165038] 33 4165038 208643 84056 1114112 0 0 zmc [375079.294104] [4165120] 33 4165120 65290 3332 229376 1256 0 apache2 [375079.294113] [4165166] 33 4165166 65288 3327 229376 1255 0 apache2 [375079.294122] [4165171] 33 4165171 65099 565 167936 1364 0 apache2 [375079.294130] [4165182] 33 4165182 65292 3329 229376 1254 0 apache2 [375079.294139] [4165186] 33 4165186 65288 3145 229376 1256 0 apache2 [375079.294147] [4165195] 33 4165195 65290 3337 229376 1255 0 apache2 [375079.294156] [4165216] 33 4165216 65288 3336 229376 1255 0 apache2 [375079.294164] [4165217] 33 4165217 65099 565 167936 1364 0 apache2 [375079.294173] [4165229] 33 4165229 65101 567 167936 1364 0 apache2 [375079.294182] [4165232] 33 4165232 65288 3335 229376 1255 0 apache2 [375079.294190] [4165272] 33 4165272 65371 3381 229376 1251 0 apache2 [375079.294199] [4165284] 33 4165284 65288 3358 229376 1255 0 apache2 [375079.294207] [4165287] 33 4165287 65101 567 167936 1364 0 apache2 [375079.294215] [4165289] 33 4165289 65101 567 167936 1364 0 apache2 [375079.294224] [4165305] 33 4165305 372251 255230 2445312 0 0 zmc [375079.294232] [4165508] 33 4165508 65290 3339 229376 1255 0 apache2 [375079.294240] [4165512] 33 4165512 65290 3334 229376 1255 0 apache2 [375079.294249] [4165513] 33 4165513 65101 567 167936 1364 0 apache2 [375079.294257] [4165517] 33 4165517 65290 3337 229376 1255 0 apache2 [375079.294265] [4165541] 33 4165541 65101 567 167936 1364 0 apache2 
[375079.294274] [4165569] 33 4165569 65288 3329 229376 1256 0 apache2 [375079.294282] [4165570] 33 4165570 65288 3336 229376 1255 0 apache2 [375079.294290] [4165573] 33 4165573 65099 561 167936 1365 0 apache2 [375079.294299] [4165595] 33 4165595 65290 3338 229376 1254 0 apache2 [375079.294307] [4165596] 33 4165596 65272 3097 229376 1261 0 apache2 [375079.294316] [4165602] 33 4165602 198816 92898 1118208 0 0 zmc [375079.294324] [4165613] 33 4165613 65288 3283 229376 1259 0 apache2 [375079.294332] [4165614] 33 4165614 65099 563 167936 1365 0 apache2 [375079.294340] [4165622] 33 4165622 65290 3338 229376 1255 0 apache2 [375079.294349] [4165627] 33 4165627 65290 3368 229376 1255 0 apache2 [375079.294357] [4165628] 33 4165628 65290 3335 229376 1255 0 apache2 [375079.294365] [4165727] 33 4165727 72433 2243 364544 0 0 nph-zms [375079.294374] [4165740] 33 4165740 72433 2246 352256 0 0 nph-zms [375079.294382] [4165780] 33 4165780 65099 563 167936 1364 0 apache2 [375079.294390] [4165796] 33 4165796 65288 3339 229376 1254 0 apache2 [375079.294398] [4165797] 33 4165797 65099 562 167936 1364 0 apache2 [375079.294407] [4165802] 33 4165802 65099 562 167936 1364 0 apache2 [375079.294415] [4165803] 33 4165803 65290 3338 229376 1254 0 apache2 [375079.294423] [4165812] 33 4165812 65292 3340 229376 1254 0 apache2 [375079.294432] [4165884] 33 4165884 3299 696 61440 0 0 zmdc.pl [375079.294440] [4165913] 33 4165913 78201 2630 376832 0 0 zms [375079.294448] [4165932] 33 4165932 88850 2244 368640 0 0 nph-zms [375079.294457] [4165939] 117 4165939 4083 1457 69632 0 0 munin-limits [375079.294465] [4165966] 33 4165966 70187 1627 307200 0 0 nph-zms [375079.294473] [4165979] 33 4165979 65099 562 167936 1364 0 apache2 [375079.294482] [4165980] 33 4165980 65284 3346 229376 1254 0 apache2 [375079.294490] [4166009] 33 4166009 65099 562 167936 1364 0 apache2 [375079.294499] [4166012] 33 4166012 65087 552 167936 1364 0 apache2 [375079.294507] [4166014] 33 4166014 65081 536 167936 1370 0 apache2 
[375079.294515] [4166048] 0 4166048 2588 200 57344 8 0 cron [375079.294524] [4166049] 0 4166049 2551 195 57344 8 0 cron [375079.294532] [4166058] 33 4166058 70187 720 299008 0 0 nph-zms [375079.294540] [4166059] 33 4166059 58276 431 253952 0 0 nph-zms [375079.294548] [4166064] 33 4166064 70187 1852 311296 0 0 nph-zms [375079.294556] [4166072] 33 4166072 70187 702 286720 0 0 nph-zms [375079.294565] [4166073] 33 4166073 65081 536 167936 1370 0 apache2 [375079.294573] [4166074] 33 4166074 70187 720 286720 0 0 nph-zms [375079.294581] [4166079] 115 4166079 2643 250 53248 130 -500 nrpe [375079.294590] [4166080] 33 4166080 58442 434 258048 0 0 zms [375079.294601] [4166081] 33 4166081 70187 701 299008 0 0 nph-zms [375079.294611] [4166083] 33 4166083 723 24 40960 0 0 sh [375079.294621] [4166084] 33 4166084 49224 434 249856 0 0 zmu [375079.294632] [4166113] 33 4166113 723 23 40960 0 0 sh [375079.294642] [4166115] 33 4166115 49224 434 258048 0 0 zmu [375079.294654] [4166116] 33 4166116 723 22 40960 0 0 sh [375079.294664] [4166117] 33 4166117 45238 345 241664 0 0 zmu [375079.294675] [4166118] 33 4166118 58442 433 262144 0 0 nph-zms [375079.294685] [4166119] 33 4166119 49594 268 204800 0 0 nph-zms [375079.294693] [4166120] 33 4166120 58442 433 258048 0 0 nph-zms [375079.294701] [4166122] 33 4166122 58442 434 253952 0 0 nph-zms [375079.294709] [4166123] 33 4166123 58442 434 270336 0 0 nph-zms [375079.294718] [4166129] 33 4166129 58276 431 262144 0 0 nph-zms [375079.294726] [4166132] 33 4166132 49594 266 200704 0 0 nph-zms [375079.294734] [4166133] 33 4166133 46085 216 172032 0 0 nph-zms [375079.294743] [4166134] 33 4166134 58442 434 266240 0 0 nph-zms [375079.294751] [4166135] 33 4166135 46085 216 180224 0 0 nph-zms [375079.294765] [4166136] 33 4166136 47163 255 188416 0 0 nph-zms [375079.294775] [4166137] 33 4166137 41620 191 176128 0 0 nph-zms [375079.294785] [4166138] 33 4166138 39837 168 163840 0 0 nph-zms [375079.294795] [4166139] 33 4166139 45836 196 184320 0 0 nph-zms 
[375079.294803] [4166141] 33 4166141 46041 212 188416 0 0 nph-zms [375079.294811] [4166142] 33 4166142 49594 267 196608 0 0 nph-zms [375079.294820] [4166145] 33 4166145 46085 216 188416 0 0 nph-zms [375079.294828] [4166146] 33 4166146 48609 264 196608 0 0 nph-zms [375079.294836] [4166147] 33 4166147 49594 267 196608 0 0 nph-zms [375079.294844] [4166148] 33 4166148 46085 215 180224 0 0 nph-zms [375079.294853] [4166149] 33 4166149 25949 78 102400 0 0 nph-zms [375079.294861] [4166150] 33 4166150 49594 265 204800 0 0 nph-zms [375079.294869] [4166151] 33 4166151 34407 153 139264 0 0 nph-zms [375079.294877] [4166152] 33 4166152 25544 60 94208 0 0 nph-zms [375079.294886] [4166155] 115 4166155 2643 290 57344 104 -500 nrpe [375079.294894] [4166156] 115 4166156 2643 299 57344 98 -500 nrpe [375079.294903] [4166159] 33 4166159 25932 74 102400 0 0 nph-zms [375079.294911] [4166160] 33 4166160 23734 45 90112 0 0 nph-zms [375079.294919] [4166161] 33 4166161 25937 77 94208 0 0 nph-zms [375079.294928] [4166162] 33 4166162 25932 74 90112 0 0 nph-zms [375079.294936] [4166163] 33 4166163 25517 56 90112 0 0 nph-zms [375079.294945] [4166164] 33 4166164 25947 79 102400 0 0 nph-zms [375079.294953] [4166165] 33 4166165 19632 24 61440 0 0 nph-zms [375079.294961] [4166168] 33 4166168 23766 47 86016 0 0 nph-zms [375079.294969] [4166169] 0 4166169 64942 488 155648 1387 0 apache2 [375079.294978] [4166170] 33 4166170 65099 549 159744 1361 0 apache2 [375079.294986] [4166171] 0 4166171 64942 478 155648 1396 0 apache2 [375079.294994] [4166172] 0 4166172 64942 478 155648 1396 0 apache2 [375079.295003] [4166173] 0 4166173 64938 468 155648 1405 0 apache2 [375079.295011] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=systemd-journald.service,mems_allowed=0,global_oom,task_memcg=/system.slice/zoneminder.service,task=zmc,pid=2824620,uid=33 [375079.295071] Out of memory: Killed process 2824620 (zmc) total-vm:20890500kB, anon-rss:13594536kB, file-rss:0kB, shmem-rss:21608kB, UID:33 
pgtables:40380kB oom_score_adj:0 [375084.273829] oom_reaper: reaped process 2824620 (zmc), now anon-rss:16kB, file-rss:0kB, shmem-rss:21608kB
I've been having the same issue on 1.36.33 running on CentOS 9 and Fedora CoreOS 40.
The system is 4x 1080p15 cameras feeding 8 monitors: 4 monitors are modect and 4 are record. The record monitors use camera passthrough with no decoding. The modect monitors use vaapi for both decode and encode.
The system is a Core i7-8700T with 8 GB of RAM. Adding a swap file seems to have little effect (maybe it gives a few more hours before the crash). Adding RAM seems to have no effect either.
The modect monitors have the following settings:
I get debug level 5 from the zmc processes. I also have the logs from the other processes running at debug level.
The full zmc log is 6 GB; I can send it out-of-band if needed.
Your pre-frame count of 150 means you will be holding 153 images in memory. At 1080p that's a lot of RAM. 8 GB is really not enough.
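A back-of-envelope check of that claim (my own numbers, assuming uncompressed 32-bit RGBA frames; actual usage depends on the pixel format ZoneMinder buffers in):

```shell
# One raw 1080p RGBA frame is 1920*1080*4 bytes (~8.3 MB).
# 153 buffered images per monitor:
echo $(( 1920 * 1080 * 4 * 153 / 1024 / 1024 ))             # MiB per monitor -> 1210
# Across the 8 monitors described above, roughly:
echo $(( 1920 * 1080 * 4 * 153 * 8 / 1024 / 1024 / 1024 ))  # GiB total -> 9
```

So image buffers alone can approach 10 GiB before decoder and queue overhead, which lines up with 8 GB not being enough for this configuration.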
I increased the RAM to 16 GB and there is no discernible difference in performance. ZoneMinder is still killed by the oom-killer after some period of time.
zoneminder_crash-20240510-1146.log
The system during normal operation seems not overly taxed.
The system ran steadily for 15 minutes, then jumped to 8 GB at about 7 minutes in, then to 16 GB, pegged all the processors, and was killed. This behavior happens on all the systems we have.
@connortechnology This problem seems to be getting worse again. I have 2 systems running the current master branch, and zmc seems to be crashing again. One of the systems required a reboot.
[4239078.371683] HangDetector invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[4239078.371703] CPU: 11 PID: 903 Comm: HangDetector Not tainted 5.15.0-101-generic #111-Ubuntu
[4239078.371719] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[4239078.371728] Call Trace:
[4239078.371734] <TASK>
[4239078.371743] show_stack+0x52/0x5c
[4239078.371767] dump_stack_lvl+0x4a/0x63
[4239078.371787] dump_stack+0x10/0x16
[4239078.371804] dump_header+0x53/0x228
[4239078.371823] oom_kill_process.cold+0xb/0x10
[4239078.371840] out_of_memory+0x106/0x2e0
[4239078.371861] __alloc_pages_slowpath.constprop.0+0x9b7/0xa80
[4239078.371894] __alloc_pages+0x311/0x330
[4239078.371917] alloc_pages+0x9e/0x1e0
[4239078.371937] __page_cache_alloc+0x7e/0x90
[4239078.371957] pagecache_get_page+0x152/0x590
[4239078.371976] ? page_cache_ra_unbounded+0x166/0x210
[4239078.372001] filemap_fault+0x488/0xab0
[4239078.372020] ? filemap_map_pages+0x309/0x400
[4239078.372045] __do_fault+0x39/0x120
[4239078.372060] do_read_fault+0xeb/0x160
[4239078.372075] do_fault+0xa0/0x2e0
[4239078.372090] handle_pte_fault+0x1cd/0x240
[4239078.372105] __handle_mm_fault+0x405/0x6f0
[4239078.372130] handle_mm_fault+0xd8/0x2c0
[4239078.372147] do_user_addr_fault+0x1c9/0x670
[4239078.372169] exc_page_fault+0x77/0x170
[4239078.372192] asm_exc_page_fault+0x27/0x30
[4239078.372207] RIP: 0033:0x7fd6e5061ce5
[4239078.372226] Code: Unable to access opcode bytes at RIP 0x7fd6e5061cbb.
[4239078.372233] RSP: 002b:00007fd6e3f88d60 EFLAGS: 00010202
[4239078.372246] RAX: 0000000000000000 RBX: 000003dafca41b80 RCX: 0000000000000018
[4239078.372256] RDX: 000000007cd83bc4 RSI: 000000000040aee5 RDI: 0000000000000001
[4239078.372267] RBP: 0000562c03bc22a0 R08: 000000000040aee2 R09: 000000000040aee3
[4239078.372277] R10: 00007ffec47db080 R11: 00007ffec47db090 R12: 0000000000000005
[4239078.372287] R13: 0000000000000005 R14: 000000000000000e R15: 000003dafc94d940
[4239078.372309] </TASK>
[4239078.372314] Mem-Info:
[4239078.372322] active_anon:1736424 inactive_anon:6204074 isolated_anon:64 active_file:179 inactive_file:122 isolated_file:0 unevictable:124906 dirty:0 writeback:35 slab_reclaimable:17181 slab_unreclaimable:24266 mapped:120192 shmem:118204 pagetables:20217 bounce:0 kernel_misc_reclaimable:0 free:60214 free_pcp:748 free_cma:0
[4239078.372359] Node 0 active_anon:6945696kB inactive_anon:24816296kB active_file:716kB inactive_file:488kB unevictable:499624kB isolated(anon):256kB isolated(file):0kB mapped:480768kB dirty:0kB writeback:140kB shmem:472816kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 102400kB writeback_tmp:0kB kernel_stack:6304kB pagetables:80868kB all_unreclaimable? no
[4239078.372397] Node 0 DMA free:11264kB min:28kB low:40kB high:52kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[4239078.372434] lowmem_reserve[]: 0 2887 31963 31963 31963
[4239078.372461] Node 0 DMA32 free:122404kB min:6100kB low:9056kB high:12012kB reserved_highatomic:0KB active_anon:3024kB inactive_anon:2918860kB active_file:4kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:3063680kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[4239078.372498] lowmem_reserve[]: 0 0 29076 29076 29076
[4239078.372524] Node 0 Normal free:107188kB min:61448kB low:91220kB high:120992kB reserved_highatomic:14336KB active_anon:6942848kB inactive_anon:21897308kB active_file:1264kB inactive_file:1476kB unevictable:499624kB writepending:140kB present:30408704kB managed:29782704kB mlocked:499624kB bounce:0kB free_pcp:2992kB local_pcp:248kB free_cma:0kB
[4239078.372564] lowmem_reserve[]: 0 0 0 0 0
[4239078.372588] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
[4239078.372734] Node 0 DMA32: 242*4kB (UME) 244*8kB (UME) 220*16kB (UME) 314*32kB (UME) 167*64kB (UME) 122*128kB (UME) 83*256kB (UME) 40*512kB (UME) 19*1024kB (UME) 3*2048kB (UME) 3*4096kB (M) = 122408kB
[4239078.372850] Node 0 Normal: 350*4kB (UE) 264*8kB (UME) 418*16kB (UME) 332*32kB (ME) 247*64kB (ME) 132*128kB (UME) 158*256kB (UME) 20*512kB (ME) 1*1024kB (U) 0*2048kB 0*4096kB = 105240kB
[4239078.372958] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[4239078.372970] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[4239078.372981] 138044 total pagecache pages
[4239078.372986] 17029 pages in swap cache
[4239078.372991] Swap cache stats: add 326865545, delete 326848750, find 50058546/62805988
[4239078.373002] Free swap = 0kB
[4239078.373007] Total swap = 4194300kB
[4239078.373013] 8388478 pages RAM
[4239078.373017] 0 pages HighMem/MovableOnly
[4239078.373022] 173042 pages reserved
[4239078.373027] 0 pages hwpoisoned
[4239078.373032] Tasks state (memory values in pages):
[4239078.373036] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[4239078.373051] [ 573] 0 573 88722 6775 114688 0 -1000 multipathd
[4239078.373071] [ 577] 0 577 6736 755 77824 601 -1000 systemd-udevd
[4239078.373090] [ 859] 104 859 22341 419 73728 197 0 systemd-timesyn
[4239078.373108] [ 863] 0 863 12787 545 81920 499 0 VGAuthService
[4239078.373125] [ 864] 0 864 78791 664 106496 255 0 vmtoolsd
[4239078.373142] [ 966] 101 966 4032 252 65536 181 0 systemd-network
[4239078.373174] [ 968] 102 968 6418 560 98304 926 0 systemd-resolve
[4239078.373192] [ 980] 0 980 1724 542 53248 28 0 cron
[4239078.373209] [ 981] 103 981 2265 939 57344 77 -900 dbus-daemon
[4239078.373227] [ 988] 0 988 20713 503 61440 46 0 irqbalance
[4239078.373244] [ 990] 0 990 8168 1097 94208 1808 0 networkd-dispat
[4239078.373262] [ 992] 0 992 58624 290 90112 104 0 polkitd
[4239078.373278] [ 994] 107 994 55601 668 77824 211 0 rsyslogd
[4239078.373296] [ 1001] 0 1001 3758 225 65536 154 0 systemd-logind
[4239078.373314] [ 1008] 0 1008 98172 344 126976 289 0 udisksd
[4239078.373331] [ 1033] 0 1033 1544 211 45056 16 0 agetty
[4239078.373347] [ 1063] 0 1063 3860 714 77824 309 -1000 sshd
[4239078.373364] [ 1123] 0 1123 27438 1016 114688 1810 0 unattended-upgr
[4239078.373381] [ 1124] 114 1124 1211868 34585 2072576 172841 0 mariadbd
[4239078.373399] [ 1129] 0 1129 61058 215 114688 282 0 ModemManager
[4239078.373416] [ 21475] 0 21475 59874 774 94208 153 0 upowerd
[4239078.373433] [ 39532] 0 39532 73899 925 159744 506 0 packagekitd
[4239078.373451] [ 848257] 0 848257 64976 971 180224 1365 0 apache2
[4239078.373468] [1616766] 0 1616766 40594 304 344064 158 -250 systemd-journal
[4239078.373486] [1795614] 33 1795614 65061 788 155648 1395 0 apache2
[4239078.373503] [1795615] 33 1795615 65061 817 155648 1392 0 apache2
[4239078.373520] [1795616] 33 1795616 65061 786 155648 1399 0 apache2
[4239078.373538] [1795617] 33 1795617 65061 799 155648 1395 0 apache2
[4239078.373555] [1795618] 33 1795618 65061 808 155648 1395 0 apache2
[4239078.373579] [1829857] 0 1829857 514747 2021 327680 265 -900 snapd
[4239078.373600] [1829951] 33 1829951 9789 920 118784 2371 0 zmdc.pl
[4239078.373617] [1829981] 33 1829981 317523 161216 1982464 13275 0 zmc
[4239078.373634] [1829985] 33 1829985 299857 161155 2002944 11445 0 zmc
[4239078.373651] [1829989] 33 1829989 430101 290378 2936832 7096 0 zmc
[4239078.373668] [1829993] 33 1829993 315740 167245 2048000 11088 0 zmc
[4239078.373685] [1829997] 33 1829997 240220 99512 1265664 5596 0 zmc
[4239078.373702] [1830011] 33 1830011 299321 159196 1966080 10358 0 zmc
[4239078.373718] [1830016] 33 1830016 665361 440493 4423680 49405 0 zmc
[4239078.373736] [1830021] 33 1830021 7338805 6412779 58318848 714144 0 zmc
[4239078.373752] [1830026] 33 1830026 16106 116 163840 9359 0 /usr/bin/zmcont
[4239078.373770] [1830036] 33 1830036 245337 78329 1318912 13361 0 zmc
[4239078.373787] [1830045] 33 1830045 13452 1364 143360 5599 0 zmfilter.pl
[4239078.373805] [1830068] 33 1830068 13584 754 151552 6321 0 zmfilter.pl
[4239078.373823] [1830079] 33 1830079 9789 966 118784 2363 0 zmwatch.pl
[4239078.373840] [1830088] 33 1830088 9595 360 118784 2779 0 zmstats.pl
[4239078.373859] [1831271] 33 1831271 99147 25118 544768 0 0 zmc
[4239078.373876] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=open-vm-tools.service,mems_allowed=0,global_oom,task_memcg=/system.slice/zoneminder.service,task=zmc,pid=1830021,uid=33
[4239078.373972] Out of memory: Killed process 1830021 (zmc) total-vm:29355220kB, anon-rss:25628280kB, file-rss:1228kB, shmem-rss:21608kB, UID:33 pgtables:56952kB oom_score_adj:0
[4239082.675202] oom_reaper: reaped process 1830021 (zmc), now anon-rss:16kB, file-rss:0kB, shmem-rss:21608kB
[4270280.151517] clearcache (1843059): drop_caches: 1
[4270531.731483] Adding 4194300k swap on /swap.img. Priority:-2 extents:4 across:4587516k FS
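Until a fixed build is deployed, a common mitigation (one commenter above used cgroups for this) is to cap the service's memory through systemd, so a runaway zmc is killed inside its own cgroup before it exhausts the host. A sketch of such a drop-in; the unit name `zoneminder.service` and the 4G limit are assumptions to adjust for your install:

```ini
# /etc/systemd/system/zoneminder.service.d/memory.conf
# Hypothetical example: tune MemoryMax to your monitor count and buffer sizes.
[Service]
MemoryMax=4G
MemorySwapMax=0
```

After creating the file, run `systemctl daemon-reload && systemctl restart zoneminder`. Note this only contains the damage; the affected zmc process will still be OOM-killed and restarted by zmdc.pl.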
@jrtaylor71 The OP's issue seems to have been resolved, so I'm going to close this issue out, but please create your own if you are still seeing it. Specifically, please test with current master, as we are moving toward a new release and would like to ensure any memory issues are resolved.
For future readers, we found that when recording we clear out the packet queue less often, causing it to grow. The work to fix this went into 1.36.34 and master. It took a pretty significant rework to get right, but we think the issue is resolved.
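The shape of that bug, and of the fix, can be sketched with a toy packet queue. This is illustrative only, not ZoneMinder's actual implementation; all names here are made up. The point is that if the queue is only trimmed while the monitor is idle, it grows without bound during recording (or during reconnect loops), whereas trimming on every push keeps it capped:

```cpp
#include <cstddef>
#include <deque>

// Toy packet type: drops must stop at a keyframe so the remaining
// stream stays decodable.
struct Packet {
  bool keyframe;
};

class PacketQueue {
 public:
  explicit PacketQueue(std::size_t max_packets) : max_packets_(max_packets) {}

  // Sketch of the fixed behaviour: enforce the cap on every push,
  // regardless of whether the monitor is currently recording.
  void push(const Packet &p) {
    q_.push_back(p);
    trim();
  }

  std::size_t size() const { return q_.size(); }

 private:
  void trim() {
    while (q_.size() > max_packets_) {
      q_.pop_front();
      // Advance to the next keyframe so decoding can restart cleanly.
      while (!q_.empty() && !q_.front().keyframe) q_.pop_front();
    }
  }

  std::deque<Packet> q_;
  std::size_t max_packets_;
};
```

With the trim gated on "not recording" (the pre-1.36.34 behaviour described above), the same push loop would accumulate every packet for as long as a recording event stays open, which matches the multi-gigabyte zmc resident sizes in the OOM log.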
Describe Your Environment
ZM version: 1.36.33
Installed by: isaac ppa
OS: Ubuntu 22.04.3 LTS (GNU/Linux 5.15.0-84-generic x86_64)
The zmc process has a possible memory leak: after the zoneminder service starts, zmc's memory use grows until the system runs out of memory and the process is killed. It then restarts and the cycle repeats.
Expected behavior
To run without exploding.
Debug Logs