ZoneMinder / zoneminder

ZoneMinder is a free, open source Closed-circuit television software application developed for Linux which supports IP, USB and Analog cameras.
http://www.zoneminder.com/
GNU General Public License v2.0
4.84k stars 1.19k forks source link

Capture daemon being restarted by zmwatch #3271

Closed ASoTNetworks closed 2 years ago

ASoTNetworks commented 3 years ago

Describe Your Environment

Describe the bug Ever since updating to v1.36.0 and above I see that zmwatch is restarting capture daemon randomly.

To Reproduce Steps to reproduce the behavior:

  1. Have monitor mode in any modes such as mocord, monitor, modect and so on.
  2. You will message in the log like below.

Expected behavior Capture daemon should not stop.

Debug Logs

2021-06-01 01:04:57 zmc_m18     8299    INF Starting Capture version 1.36.3 zmc.cpp 220
2021-06-01 01:04:57 zmdc        8299    INF 'zmc -m 18' started at 21/06/01 01:04:57    zmdc.pl -
2021-06-01 01:04:57 zmdc        26197   INF 'zmc -m 18' starting at 21/06/01 01:04:57, pid = 8299   zmdc.pl -
2021-06-01 01:04:57 zmdc        26197   INF Starting pending process, zmc -m 18 zmdc.pl -
2021-06-01 01:04:56 zmdc        26197   INF 'zmc -m 18' exited normally zmdc.pl -
2021-06-01 01:04:56 zmc_m18     6868    INF Camera01: image_count:51499 - Closing event 622849, shutting down   zm_monitor.cpp  3107
2021-06-01 01:04:55 zmdc        26197   INF 'zmc -m 18' sending stop to pid 6868 at 21/06/01 01:04:55   zmdc.pl -
2021-06-01 01:04:55 zmwatch     26275   WAR Restarting capture daemon for Camera01, time since last capture 7 seconds (1622523895-1622523888)-
connortechnology commented 2 years ago

The problem is that capture has halted for 7 seconds. Why? You can increase this limit by changing Options->System->WATCH_MAX_DELAY I find that a value of 45 is more practical.

ASoTNetworks commented 2 years ago

It was set to 7 seconds default, I made it higher to 45 seconds now but I notice the streams stops on its own with black screen saying Unable to stream then the watchdog kicked in. I see it on my other monitors and they are different camera make and models too.

Using version 1.34.x before the upgrade didn't have this problem.

2021-06-08 19:28:53 zms_m13     22036   ERR Unable to send stream text frame: Broken pipe   zm_stream.cpp   285
2021-06-08 19:28:53 zms_m13     22036   ERR Unable to connect to monitor id 13 for streaming    zm_stream.cpp   58
2021-06-08 19:28:53 zms_m13     22036   ERR Can't open memory map file /dev/shm/zm.mmap.13: No such file or directory   zm_monitor.cpp  916
2021-06-08 19:28:53 zms_m13     22036   ERR Monitor shm is not connected    zm_stream.cpp   76
2021-06-08 19:28:52 zms_m13     22036   ERR Unable to connect to monitor id 13 for streaming    zm_stream.cpp   58
2021-06-08 19:28:52 zms_m13     22036   ERR Can't open memory map file /dev/shm/zm.mmap.13: No such file or directory   zm_monitor.cpp  916
2021-06-08 19:28:52 zms_m13     22036   ERR Monitor shm is not connected    zm_stream.cpp   76
2021-06-08 19:28:52 zmdc        14799   INF 'zmc -m 13' exited normally zmdc.pl -
2021-06-08 19:28:51 zmdc        14799   INF 'zmc -m 13' sending stop to pid 18534 at 21/06/08 19:28:51  zmdc.pl -
2021-06-08 19:28:51 zms_m13     22036   ERR Monitor shm is not connected    zm_stream.cpp   76
2021-06-08 19:28:51 zmwatch     14883   WAR Restarting capture daemon for Camera13, time since last capture 55 seconds (1623194931-1623194876)  zmwatch.pl  -
2021-06-08 19:28:50 zms_m13     22036   ERR Monitor shm is not connected    zm_stream.cpp   76
2021-06-08 19:28:49 zms_m13     22036   ERR Monitor shm is not connected    zm_stream.cpp   76
2021-06-08 19:28:48 zms_m13     22027   ERR Monitor shm is not connected    zm_stream.cpp   76
2021-06-08 19:28:47 zms_m13     22012   ERR Monitor shm is not connected    zm_stream.cpp   76
2021-06-08 19:28:44 zms_m13     21992   ERR Monitor shm is not connected    zm_stream.cpp   76
connortechnology commented 2 years ago

There are a lot of potential reasons. We are going to need debug level 3 logs to figure it out. Our design has been to prioritize capturing, removing possibility of blocking or waiting but there are still places where it can hang. For example: writing to database.

ASoTNetworks commented 2 years ago

Here is the log with level 3 set

Jun 10 02:17:25 cctv zmc_m18[18862]: INF [zmc_m18] [Camera01: 51000 - Capturing at 15.00 fps, capturing bandwidth 360276bytes/sec]
Jun 10 02:17:25 cctv zmc_m18[18862]: INF [zmc_m18] [Camera01: 51000 - Analysing at 15.00 fps from 50999 - 49999=1000 / 1623305845.526966 - 1623305778.867344 = 66.659622]
Jun 10 02:18:54 cctv zmwatch[16016]: WAR [Restarting capture daemon for Camera01, time since last capture 51 seconds (1623305934-1623305883)]
Jun 10 02:18:54 cctv zmdc[15935]: INF ['zmc -m 18' sending stop to pid 18862 at 21/06/10 02:18:54]
Jun 10 02:18:54 cctv zmc_m18[18862]: INF [zmc_m18] [Camera01: image_count:51568 - Closing event 629879, shutting down]
Jun 10 02:18:54 cctv zmc_m18[18862]: INF [zmc_m18] [Estimating the duration of the last packet in a fragment, consider setting the duration field in AVPacket instead.]
Jun 10 02:18:54 cctv zmdc[15935]: INF ['zmc -m 18' exited normally]
Jun 10 02:18:55 cctv zmdc[15935]: INF [Starting pending process, zmc -m 18]
Jun 10 02:18:56 cctv zmdc[15935]: INF ['zmc -m 18' starting at 21/06/10 02:18:56, pid = 21416]
Jun 10 02:18:56 cctv zmdc[21416]: INF ['zmc -m 18' started at 21/06/10 02:18:56]
Jun 10 02:18:56 cctv zmc_m18[21416]: INF [zmc_m18] [Enabling ffmpeg logs, as LOG_DEBUG+LOG_FFMPEG are enabled in options]
Jun 10 02:18:56 cctv zmc_m18[21416]: INF [zmc_m18] [Starting Capture version 1.36.4]
Jun 10 02:18:59 cctv zmc_m18[21416]: INF [zmc_m18] [deprecated pixel format used, make sure you did set range correctly]
Jun 10 02:18:59 cctv zmc_m18[21416]: INF [zmc_m18] [Camera01: 10 - Capturing at 3.69 fps, capturing bandwidth 97080bytes/sec]
Jun 10 02:18:59 cctv zmc_m18[21416]: INF [zmc_m18] [Camera01: 000 - Opened new event 629880, section start]
Jun 10 02:18:59 cctv zmc_m18[21416]: INF [zmc_m18] [Camera01: 10 - Analysing at 2.95 fps from 9 - 0=9 / 1623305939.551569 - 1623305936.497256 = 3.054313]
Jun 10 02:19:00 cctv zmc_m18[21416]: INF [zmc_m18] [Camera01: 50 - Capturing at 33.33 fps, capturing bandwidth 741127bytes/sec]
Jun 10 02:19:00 cctv zmc_m18[21416]: INF [zmc_m18] [Camera01: 50 - Analysing at 38.40 fps from 49 - 9=40 / 1623305940.593123 - 1623305939.551569 = 1.041554]
Jun 10 02:19:01 cctv zmc_m18[21416]: INF [zmc_m18] [Camera01: 70 - Capturing at 14.93 fps, capturing bandwidth 373556bytes/sec]
Jun 10 02:19:01 cctv zmc_m18[21416]: INF [zmc_m18] [Camera01: 70 - Analysing at 14.71 fps from 69 - 49=20 / 1623305941.952865 - 1623305940.593123 = 1.359742]
Jun 10 02:19:03 cctv zmc_m18[21416]: INF [zmc_m18] [Camera01: 90 - Capturing at 15.15 fps, capturing bandwidth 300671bytes/sec]
Jun 10 02:19:03 cctv zmc_m18[21416]: INF [zmc_m18] [Camera01: 90 - Analysing at 15.35 fps from 89 - 69=20 / 1623305943.255503 - 1623305941.952865 = 1.302638]
Jun 10 02:19:04 cctv zmc_m18[21416]: INF [zmc_m18] [Camera01: 110 - Capturing at 14.92 fps, capturing bandwidth 371428bytes/sec]
Jun 10 02:19:04 cctv zmc_m18[21416]: INF [zmc_m18] [Camera01: 110 - Analysing at 14.94 fps from 109 - 89=20 / 1623305944.594352 - 1623305943.255503 = 1.338849]
connortechnology commented 2 years ago

You have NOT set debug level to 3. If you had, there would be thousands of lines of debug logs in there.

ASoTNetworks commented 2 years ago

Weird I have turned on the options in logging. image

SteveGilvarry commented 2 years ago

Also need a target check the help ? On that option but I think you need _zmc_m18 for you monitor 18 that seems to be having the issue. Will generate a separate file in logs/zm.

ASoTNetworks commented 2 years ago

Thanks, I got the debug log going now and this is what I see before and after where zmwatch restart the hanged stream. Before it happen I see an event lock message at 14:54:59.159466

Debug log here

connortechnology commented 2 years ago

please update to 1.36.4

ASoTNetworks commented 2 years ago

Sorry I didn't update the version above but I am already on 1.36.4

connortechnology commented 2 years ago

So there is a slight problem with the locking, but what is happening to the capture thread? It appears to never capture another frame, which means it can't signal the decode thread to wake up and notice that it needs to decode a packet. The capture thread specifically does not lock for this very reason. Do you maybe have a value in your maxFps or AlarmMaxFps fields?

connortechnology commented 2 years ago

The only other thing I can think of is contention on the Monitor_Status table. mysql queries timeout after like 200 seconds.. so that could be it.

ASoTNetworks commented 2 years ago

I have maxFps set to 40.000 and no AlarmMaxFps set. I think I have it set a long time ago on versions that was saving jpegs. Should I remove it?

connortechnology commented 2 years ago

Definitely remove it. I'm not convinced it's the culprit, but it will starve ffmpeg for cpu.

ASoTNetworks commented 2 years ago

I see an improvement and doesn't happen too many times now (was a few times every few hours) after removing the maxFps it only happens a couple times a day.

mgeisselreiter commented 2 years ago

I can comfirm this problem. After upgrading from 1.36.3 to 1.36.4 zmwatch log shows restarting capture daemon every 40 to 60 seconds. Now I cannot use Live watch - I see video just 2-3 seconds and than picture stops, after a while there is black screen and it repeats every minute. The same problem is with recording - there are no records, only 2-3 seconds videos every minute. Now I upgraded to 1.36.5 but the problem remains. There must be some bad change between versions 1.36.3 and later. It is frustrating because I need to watch daily records every day which are now missing. I use CentOS 7 and there is no simple way to roll back to version 1.36.3. Any help will be appreciate.

mgeisselreiter commented 2 years ago

After a long playing with changing configuration parameters I manage to find some solution to solve this issue. I changed storage parameters for my camera for Video Writer from Camera Passthrough to Encode, OutputCodec to Auto, Encoder Auto, Optional Encoder Parameters crf=23 and Montage - Live video is fluent without breaks. Tomorrow I will see how recording works. Still wonder why Camera Passthrough stopped working since version 1.36.4. Any ideas?

joe7rocks commented 2 years ago

I'm having the very same issue with various camera models after the update, changing to Encode from Passthrough has also 'fixed' it temporarily

rmk92 commented 2 years ago

Last night, I upgraded from 1.34.9 to 1.36.5 and experienced this exact issue. With two Hikvision cameras with two streams - one stream 1080p @ 12.5fps configured for Nodect and the other at a lower resolution and fps for Modect, I see both 1080p capture daemons repeatedly blocking on a futex, beiung sent a SIGTERM by zmwatch, and then a SIGKILL every minute.

I have passthrough enabled, which I can't reasonably disable for the platform zoneminder is running on (aarch64 with no GPU acceleration).

I ended up downgrading to 1.36.3 which solved that issue, but even this version is less than perfect. During an alarm lasting more than a few frames, capture completely stops for a number of seconds. Not only is the saved MP4 obviously corrupted due to missing frames, the written jpeg frames omit about 4 seconds worth. This was never a problem with 1.34.9 which worked flawlessly in regard of these issues.

mgeisselreiter commented 2 years ago

After upgrade from 1.36.5 to 1.36.8 the problem with Camera Passthrough was not solved. With this setting zm crashes and restarts every minute as I mentioned on July 1. I have to use Encode and the x264 codec which is more CPU consumpting.

SteveGilvarry commented 2 years ago

Are you able to get debug logs for zmc? That might help find what is causing it.

On Sun, 17 Oct 2021 at 7:12 pm, mgeisselreiter @.***> wrote:

After upgrade from 1.36.5 to 1.36.8 the problem with Camera Passthrough was not solved. With this setting zm crashes and restarts every minute as I mentioned on July 1. I have to use Encode and the x264 codec which is more CPU consumpting.

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/ZoneMinder/zoneminder/issues/3271#issuecomment-945070632, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB2CWXFMZRILF6ILVKETVIDUHKAPDANCNFSM4534WTBA .

mgeisselreiter commented 2 years ago

Yes. Options - Logging - Debug - Level 3

If you need more logs or higher level please let me know or insctruct me.

SteveGilvarry commented 2 years ago

I removed the file, you may want to replace the monitor password in the debug log file. But that example never connected to the camera, which doesn't match what you described since encode is working you must be OK with that normally, so not sure if that was a different test. The log just returned No Route to Host for all the attempts to connect.

mgeisselreiter commented 2 years ago

Hi Steve, thanks for your help. I have configured 2 cameras but using and testing only one with IP 10.1.0.36. So you are right that the camera with IP 10.1.0.17 is not connected. That is probably confusing. So debug logs should be OK. Do you need higher level debug logs?

SteveGilvarry commented 2 years ago

Logs only had Monitor 2, which was 10.1.0.17, so nothing useful, use LOG_DEBUG_TARGET in Options-Logging to set for the other monitor. _zmc_m1 probably, or check 10.1.0.36 Monitor ID and change 1 to correct number. Will check tomorrow for updated logs.

connortechnology commented 2 years ago

Another note: choosing defaults for Encode is going to use a lot of cpu. It is likely encoding to h265 unless you specify h264. You likely want to specify a preset= option. For example preset=ultrafast is basically mjpeg in an mp4 container. Each encoder has different settings though so you need to google it.

mgeisselreiter commented 2 years ago

My fault. I zipped only one log file by mistake. Please find all log files which includes first camera too. I do not know how to disable second camera. I do not want do delete it because I want to use it occasionally. @connortechnology I have already changed from default (which was h265 and using cpu for almost 100%) to h264 zm.zip .

connortechnology commented 2 years ago

I would suggest trying master. I am trying something there that fixes this problem.

Or: you need to set MaxImageBUffer count > 2*keyframe interval + 1. What is happening is that the queue cleanup process hit a locked packet before it could reach the next keyframe so it couldn't clear them out. Your keyframe interval appears to be 50. So try increasing MaxImage buffer to 100 or 101.

mgeisselreiter commented 2 years ago

Hi, I changed MaxImageBuffer count from 60 to 101, changed Encode to Passthrough and the problem is solved. Thank you guys.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.