Closed transplier closed 3 years ago
Config file Let me know if you need the config. It's fairly gigantic.
Maybe the issue-template can contain a dropdown for gigiantic configs? Heres the markup-code sample:
<details><summary>Click here for Config-Details</summary>
\```
CONFIG information comes here
in full Detail
and gigantic rows
\```
</details>
Try it:
i have similar shutdowns of the frigate-container when my second m2.coral.tpu has some unknown error.
with not restart of frigate. it does not overcome the faulty second coral.tpu.
i see similar messages ValueError: Failed to load delegate from libedgetpu.so.1.0
but have not yet sorted them out.
Restarting the whole ubunut-maschine helps. maybe i find a better way.
would like to see frigate handling coral.tpu errors with stable restart or the core-framework or fallback to CPU-detection
It should exit the container and restart. For most users falling back to CPU would saturate all resources and break everything else running on the server.
It should exit the container and restart.
on my occasion it shuts down and does not restart. as i have more debug-information i will come back how to overcame that exit without restart
For most users falling back to CPU would saturate all resources and break everything else running on the server.
in case of error and exiting my second coral.tpu stays not available and frigate restarts, and shuts down immeditatly again. thats a race-condition that leads to misfunktion of whole frigate but could be overcome with a configurable failure-counter on that hardware-issue with a CPU-fallback. that would keep the system not going in shutdown-circle. at least in my case
For the moment I just have a script run docker exec frigate ps aux
every few minutes looking for zombie processes. If it finds any it does a docker restart frigate
. It hasn't happened since I filed the ticket (of course, sigh) but I'll try to get a stack trace of the stuck process next time.
Slightly different repro today. Something went wrong after a couple weeks uptime:
watchdog.doorbell INFO : No frames received from doorbell in 20 seconds. Exiting ffmpeg...
watchdog.doorbell INFO : Waiting for ffmpeg to exit gracefully...
watchdog.doorbell INFO : FFmpeg didnt exit. Force killing...
frigate.video INFO : doorbell: ffmpeg sent a broken frame. memoryview assignment: lvalue and rvalue have different structures
frigate.video INFO : doorbell: ffmpeg process is not running. exiting capture thread...
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55931e0d9140] moov atom not found
/tmp/cache/doorbell-20210927115124.mp4: Invalid data found when processing input
frigate.events INFO : bad file: doorbell-20210927115124.mp4
ffmpeg.doorbell.detect ERROR : [h264 @ 0x55ae1dff1980] Failed setup for format vaapi_vld: hwaccel initialisation returned error.
ffmpeg.doorbell.detect ERROR : More than 1000 frames duplicated
watchdog.doorbell INFO : No frames received from doorbell in 20 seconds. Exiting ffmpeg...
watchdog.doorbell INFO : Waiting for ffmpeg to exit gracefully...
frigate.video INFO : doorbell: ffmpeg sent a broken frame. memoryview assignment: lvalue and rvalue have different structures
frigate.video INFO : doorbell: ffmpeg process is not running. exiting capture thread...
ffmpeg.doorbell.detect ERROR : rtsp://admin:7GNKRkVaN4Ajql10@192.168.77.50/cam/realmonitor?channel=1&subtype=1: Immediate exit requested
Frigate didn't auto restart, had to do it manually. No zombies this time though.
The latest RC builds use a much more robust way to clean up processes and exit the container. If you aren't on a recent build, try upgrading.
Will do!
i have two corals, running fine until update to rc-5. now one is not detected, but its there. any hints do debug before a hard-reset? made a soft-restart of the whole ubuntu, but that does not helped.
my docker-compose has this in it:
devices:
- /dev/bus/usb:/dev/bus/usb
- /dev/dri/renderD128 # for intel hwaccel, needs to be updated for your hardware
- /dev/dri:/dev/dri # help hw decoding?
- /dev/apex_0:/dev/apex_0 # corel-board pci 1
- /dev/apex_1:/dev/apex_1 # corel-board pci 2
root@ub2004-fri5:/usr/src# lspci -nn | grep 089a
01:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
03:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
root@ub2004-fri5:/usr/src# ls -la /dev/ap*
crw-rw---- 1 root apex 120, 0 Sep 28 07:41 /dev/apex_0
crw-rw---- 1 root apex 120, 1 Sep 28 07:41 /dev/apex_1
outside the container on the host:
root@ub2004-fri5:/usr/src# apt show libedgetpu1-max
Package: libedgetpu1-max
Version: 16.0
Priority: optional
Section: misc
Source: libedgetpu
Maintainer: Coral <coral-support@google.com>
Installed-Size: 1210 kB
Provides: libedgetpu1 (= 16.0)
Depends: libc6, libgcc1, libstdc++6, libusb-1.0-0, debconf (>= 0.5) | debconf-2.0
Conflicts: libedgetpu1, libedgetpu1-legacy
Homepage: https://coral.ai/
Download-Size: 388 kB
APT-Sources: https://packages.cloud.google.com/apt coral-edgetpu-stable/main amd64 Packages
Description: Support library for Edge TPU
Support library (max speed) for the Edge TPU
the log shows, that only one Coral is detected by frigate:
frigate | [2021-09-28 08:39:59] frigate.app INFO : Capture process started for 6-einfahrt-134: 296
frigate | [2021-09-28 08:39:59] frigate.app INFO : Capture process started for 7-hof-108: 303
frigate | [2021-09-28 08:39:59] ws4py INFO : Using epoll
frigate | [2021-09-28 08:39:59] detector.coral-2 INFO : Starting detection process: 219
frigate | [2021-09-28 08:39:59] frigate.edgetpu INFO : Attempting to load TPU as pci:1
frigate | [2021-09-28 08:40:12] frigate.edgetpu ERROR : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors.
frigate | Process detector:coral-2:
frigate | Traceback (most recent call last):
frigate | File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
frigate | delegate = Delegate(library, options)
frigate | File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
frigate | raise ValueError(capture.message)
frigate | ValueError
frigate |
frigate | During handling of the above exception, another exception occurred:
frigate |
frigate | Traceback (most recent call last):
frigate | File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
frigate | self.run()
frigate | File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
frigate | self._target(*self._args, **self._kwargs)
frigate | File "/opt/frigate/frigate/edgetpu.py", line 156, in run_detector
frigate | object_detector = LocalObjectDetector(
frigate | File "/opt/frigate/frigate/edgetpu.py", line 64, in __init__
frigate | edge_tpu_delegate = load_delegate("libedgetpu.so.1.0", device_config)
frigate | File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
frigate | raise ValueError('Failed to load delegate from {}\n{}'.format(
frigate | ValueError: Failed to load delegate from libedgetpu.so.1.0
frigate |
frigate | [2021-09-28 08:40:14] peewee.sqliteq DEBUG : received query INSERT INTO "recordings" ("id", "camera", "path", "start_time", "end_time", "duration") VALUES (?, ?, ?, ?, ?, ?)
frigate | [2021-09-28 08:40:14] peewee DEBUG : ('INSERT INTO "recordings" ("id", "camera", "path", "start_time", "end_time", "duration") VALUES (?, ?, ?, ?, ?, ?)', ['1632811201.0-iva82h', 'haustuer-228', '/media/frigate/recordings/2021-09/28/08/haustuer-228/40.01.mp4', 1632811201.0, 1632811209.24, 8.24])
frigate | [2021-09-28 08:40:14] peewee.sqliteq DEBUG : received query INSERT INTO "recordings" ("id", "camera", "path", "start_time", "end_time", "duration") VALUES (?, ?, ?, ?, ?, ?)
frigate | [2021-09-28 08:40:14] peewee DEBUG : ('INSERT INTO "recordings" ("id", "camera", "path", "start_time", "end_time", "duration") VALUES (?, ?, ?, ?, ?, ?)', ['1632811201.0-mwc1mz', 'cat-kueche-130', '/media/frigate/recordings/2021-09/28/08/cat-kueche-130/40.01.mp4', 1632811201.0, 1632811211.015, 10.015])
frigate | [2021-09-28 08:40:15] peewee.sqliteq DEBUG : received query INSERT INTO "recordings" ("id", "camera", "path", "start_time", "end_time", "duration") VALUES (?, ?, ?, ?, ?, ?)
frigate | [2021-09-28 08:40:15] peewee DEBUG : ('INSERT INTO "recordings" ("id", "camera", "path", "start_time", "end_time", "duration") VALUES (?, ?, ?, ?, ?, ?)', ['1632811201.0-bgjioz', '6-einfahrt-134', '/media/frigate/recordings/2021-09/28/08/6-einfahrt-134/40.01.mp4', 1632811201.0, 1632811210.967, 9.967])
frigate | [2021-09-28 08:40:15] peewee.sqliteq DEBUG : received query INSERT INTO "recordings" ("id", "camera", "path", "start_time", "end_time", "duration") VALUES (?, ?, ?, ?, ?, ?)
frigate | [2021-09-28 08:40:15] peewee DEBUG : ('INSERT INTO "recordings" ("id", "camera", "path", "start_time", "end_time", "duration") VALUES (?, ?, ?, ?, ?, ?)', ['1632811201.0-k8rc1c', 'haustuer-81', '/media/frigate/recordings/2021-09/28/08/haustuer-81/40.01.mp4', 1632811201.0, 1632811209.013, 8.013])
frigate | [2021-09-28 08:40:15] peewee.sqliteq DEBUG : received query INSERT INTO "recordings" ("id", "camera", "path", "start_time", "end_time", "duration") VALUES (?, ?, ?, ?, ?, ?)
frigate | [2021-09-28 08:40:15] peewee DEBUG : ('INSERT INTO "recordings" ("id", "camera", "path", "start_time", "end_time", "duration") VALUES (?, ?, ?, ?, ?, ?)', ['1632811200.0-wul75h', 'cat-hilook-132', '/media/frigate/recordings/2021-09/28/08/cat-hilook-132/40.00.mp4', 1632811200.0, 1632811209.954, 9.954])
frigate | [2021-09-28 08:40:15] peewee.sqliteq DEBUG : received query INSERT INTO "recordings" ("id", "camera", "path", "start_time", "end_time", "duration") VALUES (?, ?, ?, ?, ?, ?)
frigate | [2021-09-28 08:40:15] peewee DEBUG : ('INSERT INTO "recordings" ("id", "camera", "path", "start_time", "end_time", "duration") VALUES (?, ?, ?, ?, ?, ?)', ['1632811200.0-hnzz9z', '5-strasse-117', '/media/frigate/recordings/2021-09/28/08/5-strasse-117/40.00.mp4', 1632811200.0, 1632811210.004, 10.004])
frigate | [2021-09-28 08:40:15] peewee.sqliteq DEBUG : received query INSERT INTO "recordings" ("id", "camera", "path", "start_time", "end_time", "duration") VALUES (?, ?, ?, ?, ?, ?)
frigate | [2021-09-28 08:40:15] peewee DEBUG : ('INSERT INTO "recordings" ("id", "camera", "path", "start_time", "end_time", "duration") VALUES (?, ?, ?, ?, ?, ?)', ['1632811200.0-sdmy28', '4-strasse-119', '/media/frigate/recordings/2021-09/28/08/4-strasse-119/40.00.mp4', 1632811200.0, 1632811207.96, 7.96])
frigate | [2021-09-28 08:40:19] frigate.watchdog INFO : Detection appears to have stopped. Exiting frigate...
frigate | [cont-finish.d] executing container finish scripts...
frigate | [cont-finish.d] done.
frigate | [s6-finish] waiting for services.
frigate | [s6-finish] sending all processes the TERM signal.
frigate | [s6-finish] sending all processes the KILL signal and exiting.
frigate exited with code 0
i got the second pci-coral back, without hard-reset, this hint helped. did this only with software on command-line on the ongoing powered system.
doing this:
lspci -nn | grep 089a
echo "1" > /sys/bus/pci/devices/0000\:03\:00.0/remove
echo "1" > /sys/bus/pci/rescan
dmesg
[ 9090.099545] pci 0000:03:00.0: [1ac1:089a] type 00 class 0x0000ff
[ 9090.099623] pci 0000:03:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit pref]
[ 9090.099654] pci 0000:03:00.0: reg 0x18: [mem 0x00000000-0x000fffff 64bit pref]
[ 9090.099989] pci 0000:03:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x1 link at 0000:00:1c.1 (capable of 4.000 Gb/s with 5 GT/s x1 link)
[ 9090.126098] pci 0000:03:00.0: BAR 2: assigned [mem 0xf0000000-0xf00fffff 64bit pref]
[ 9090.126136] pci 0000:03:00.0: BAR 0: assigned [mem 0xf0100000-0xf0103fff 64bit pref]
maybe that could be a fallback for frigate itself after loosing hardware? maybe with a configurable success/failure counter, witch switches after 10 times fallback to CPU?
and not shutting all down forever after this error-circle?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Describe the bug I'm affected by https://github.com/blakeblackshear/frigate/discussions/1194 - the first time frigate starts on my machine after a cold boot, the USB TPU isn't found. In this situation frigate tries to shut itself down, but it ends up hanging forever. I need to manually
docker restart frigate
to fix it. There appear to be several zombie processes:It appears process 38 (
frigate.process:cave
) is the culprit. It's in interruptible sleep but doesn't react to SIGTERM. I have to SIGKILL it, whereupon it terminates and the docker container restarts properly.Version of frigate 0.8.4-5043040
Config file Let me know if you need the config. It's fairly gigantic.
Frigate container logs
Frigate stats
FFprobe from your camera
Run the following command and paste output below
Computer Hardware
Camera Info:
Additional context Add any other context about the problem here.