Is the Status: SHUTDOWN normal, i.e., was it shown even while it was operating as expected? If you run lspci in the Unraid terminal, what shows up?
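For example, the PCIe Coral enumerates under Global Unichip Corp., so something like this should filter for it:

lspci | grep -i coral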
I tried the following. First, I commented out these lines in config.yml:
# detectors:
#   coral_pci:
#     type: edgetpu
#     device: pci
Restarted the server. The Coral driver showed the below.
Then I uncommented these lines and restarted Frigate:
detectors:
  coral_pci:
    type: edgetpu
    device: pci
Now the Coral driver shows as below, and the Frigate logs as below.
[2022-05-31 23:27:56] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:56626]
[2022-05-31 23:27:56] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:56632]
[2022-05-31 23:27:57] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:56632]
[2022-05-31 23:27:57] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:56634]
[2022-05-31 23:29:23] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:56634]
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[2022-05-31 23:29:23] frigate.video ERROR : patio: Unable to read frames from ffmpeg process.
[2022-05-31 23:29:23] frigate.video ERROR : patio: ffmpeg process is not running. exiting capture thread...
[s6-finish] sending all processes the TERM signal.
[s6-finish] sending all processes the KILL signal and exiting.
[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] done.
[services.d] starting services
[services.d] done.
[2022-05-31 23:29:27] frigate.app INFO : Starting Frigate (0.10.1-83481af)
Starting migrations
[2022-05-31 23:29:27] peewee_migrate INFO : Starting migrations
There is nothing to migrate
[2022-05-31 23:29:27] peewee_migrate INFO : There is nothing to migrate
[2022-05-31 23:29:27] detector.coral_pci INFO : Starting detection process: 241
[2022-05-31 23:29:27] frigate.edgetpu INFO : Attempting to load TPU as pci
[2022-05-31 23:29:27] frigate.app INFO : Output process started: 242
[2022-05-31 23:29:27] ws4py INFO : Using epoll
[2022-05-31 23:29:27] frigate.edgetpu INFO : TPU found
[2022-05-31 23:29:27] frigate.app INFO : Camera processor started for front: 247
[2022-05-31 23:29:27] frigate.app INFO : Camera processor started for parking: 248
[2022-05-31 23:29:27] frigate.app INFO : Camera processor started for patio: 250
[2022-05-31 23:29:27] frigate.app INFO : Camera processor started for garden: 254
[2022-05-31 23:29:27] frigate.app INFO : Capture process started for front: 255
[2022-05-31 23:29:27] frigate.app INFO : Capture process started for parking: 259
[2022-05-31 23:29:27] frigate.app INFO : Capture process started for patio: 260
[2022-05-31 23:29:27] frigate.app INFO : Capture process started for garden: 265
[2022-05-31 23:29:27] ws4py INFO : Using epoll
[2022-05-31 23:29:32] frigate.record WARNING : Discarding a corrupt recording segment: patio-20220531232929.mp4
[2022-05-31 23:29:32] frigate.record WARNING : Discarding a corrupt recording segment: patio-20220531232929.mp4
[2022-05-31 23:29:33] frigate.record WARNING : Discarding a corrupt recording segment: patio-20220531232929.mp4
[2022-05-31 23:29:33] frigate.record WARNING : Discarding a corrupt recording segment: patio-20220531232929.mp4
[2022-05-31 23:29:47] frigate.watchdog INFO : Detection appears to be stuck. Restarting detection process...
[2022-05-31 23:29:47] root INFO : Waiting for detection process to exit gracefully...
[2022-05-31 23:30:17] root INFO : Detection process didnt exit. Force killing...
[2022-05-31 23:30:30] detector.coral_pci INFO : Starting detection process: 561
[2022-05-31 23:30:30] frigate.edgetpu INFO : Attempting to load TPU as pci
Process detector:coral_pci:
[2022-05-31 23:30:43] frigate.edgetpu ERROR : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors.
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
delegate = Delegate(library, options)
File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
raise ValueError(capture.message)
ValueError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/frigate/frigate/edgetpu.py", line 136, in run_detector
object_detector = LocalObjectDetector(
File "/opt/frigate/frigate/edgetpu.py", line 44, in __init__
edge_tpu_delegate = load_delegate("libedgetpu.so.1.0", device_config)
File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
raise ValueError('Failed to load delegate from {}\n{}'.format(
ValueError: Failed to load delegate from libedgetpu.so.1.0
[2022-05-31 23:30:50] frigate.watchdog INFO : Detection appears to have stopped. Exiting frigate...
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[2022-05-31 23:30:50] frigate.video ERROR : patio: Unable to read frames from ffmpeg process.
[2022-05-31 23:30:50] frigate.video ERROR : patio: ffmpeg process is not running. exiting capture thread...
[s6-finish] sending all processes the TERM signal.
[s6-finish] sending all processes the KILL signal and exiting.
** Press ANY KEY to close this window **
and this is what lspci shows:
00:00.0 Host bridge: Intel Corporation Device 4c43 (rev 01)
00:02.0 VGA compatible controller: Intel Corporation RocketLake-S GT1 [UHD Graphics 750] (rev 04)
00:06.0 PCI bridge: Intel Corporation Device 4c09 (rev 01)
00:08.0 System peripheral: Intel Corporation Device 4c11 (rev 01)
00:14.0 USB controller: Intel Corporation Tiger Lake-H USB 3.2 Gen 2x1 xHCI Host Controller (rev 11)
00:14.2 RAM memory: Intel Corporation Tiger Lake-H Shared SRAM (rev 11)
00:16.0 Communication controller: Intel Corporation Tiger Lake-H Management Engine Interface (rev 11)
00:17.0 SATA controller: Intel Corporation Device 43d2 (rev 11)
00:1c.0 PCI bridge: Intel Corporation Device 43b8 (rev 11)
00:1c.4 PCI bridge: Intel Corporation Tiger Lake-H PCI Express Root Port #5 (rev 11)
00:1c.5 PCI bridge: Intel Corporation Device 43bd (rev 11)
00:1c.7 PCI bridge: Intel Corporation Device 43bf (rev 11)
00:1f.0 ISA bridge: Intel Corporation Device 4385 (rev 11)
00:1f.3 Audio device: Intel Corporation Tiger Lake-H HD Audio Controller (rev 11)
00:1f.4 SMBus: Intel Corporation Tiger Lake-H SMBus Controller (rev 11)
00:1f.5 Serial bus controller: Intel Corporation Tiger Lake-H SPI Controller (rev 11)
01:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN570 NVMe SSD
03:00.0 Ethernet controller: Intel Corporation Ethernet Controller I225-V (rev 03)
04:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
Interesting, so it is almost like Frigate trying to load it is what turns it to SHUTDOWN, which seems odd.
Have you tried restarting UnRaid?
Yes, I restarted a few times and the result is the same. The only time it stays up is when I comment out the detectors part.
There are others seeing similar in https://github.com/blakeblackshear/frigate/issues/3116 and some have solved it so would be worth looking in there.
Potentially something in here may help (disable power saving in driver?), but overall I am out of my element at this point and @blakeblackshear can help further with this.
I will check those. Meanwhile I tried this:
It stays up for a while after a Frigate restart, but after 2-3 minutes it crashes again.
[2022-05-31 23:56:10] peewee_migrate INFO : Starting migrations
There is nothing to migrate
[2022-05-31 23:56:10] peewee_migrate INFO : There is nothing to migrate
[2022-05-31 23:56:10] detector.coral_pci INFO : Starting detection process: 241
[2022-05-31 23:56:10] frigate.app INFO : Output process started: 242
[2022-05-31 23:56:10] frigate.app INFO : Camera processor started for front: 248
[2022-05-31 23:56:10] frigate.edgetpu INFO : Attempting to load TPU as pci
[2022-05-31 23:56:10] ws4py INFO : Using epoll
[2022-05-31 23:56:10] frigate.app INFO : Camera processor started for parking: 250
[2022-05-31 23:56:10] frigate.edgetpu INFO : TPU found
[2022-05-31 23:56:10] frigate.app INFO : Camera processor started for patio: 252
[2022-05-31 23:56:10] frigate.app INFO : Camera processor started for garden: 253
[2022-05-31 23:56:10] frigate.app INFO : Capture process started for front: 255
[2022-05-31 23:56:10] frigate.app INFO : Capture process started for parking: 258
[2022-05-31 23:56:10] frigate.app INFO : Capture process started for patio: 263
[2022-05-31 23:56:10] frigate.app INFO : Capture process started for garden: 278
[2022-05-31 23:56:10] ws4py INFO : Using epoll
[2022-05-31 23:56:15] frigate.record WARNING : Discarding a corrupt recording segment: patio-20220531235611.mp4
[2022-05-31 23:56:15] frigate.record WARNING : Discarding a corrupt recording segment: patio-20220531235611.mp4
[2022-05-31 23:56:15] frigate.record WARNING : Discarding a corrupt recording segment: patio-20220531235611.mp4
[2022-05-31 23:56:15] frigate.record WARNING : Discarding a corrupt recording segment: patio-20220531235611.mp4
[2022-05-31 23:56:30] frigate.watchdog INFO : Detection appears to be stuck. Restarting detection process...
[2022-05-31 23:56:30] root INFO : Waiting for detection process to exit gracefully...
[2022-05-31 23:56:30] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:50422]
[2022-05-31 23:56:48] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:50422]
[2022-05-31 23:56:48] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:55772]
[2022-05-31 23:56:49] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:55772]
[2022-05-31 23:56:50] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:55786]
[2022-05-31 23:56:50] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:55786]
[2022-05-31 23:56:51] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:55794]
[2022-05-31 23:57:00] root INFO : Detection process didnt exit. Force killing...
[2022-05-31 23:57:05] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:55794]
[2022-05-31 23:57:05] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:40156]
[2022-05-31 23:57:06] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:40156]
[2022-05-31 23:57:06] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:40170]
[2022-05-31 23:57:07] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:40170]
[2022-05-31 23:57:07] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:40172]
[2022-05-31 23:57:09] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:40172]
[2022-05-31 23:57:09] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:40174]
[2022-05-31 23:57:10] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:40174]
[2022-05-31 23:57:11] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:40180]
[2022-05-31 23:57:11] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:40180]
[2022-05-31 23:57:11] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:55898]
[2022-05-31 23:57:12] detector.coral_pci INFO : Starting detection process: 638
[2022-05-31 23:57:13] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:55898]
[2022-05-31 23:57:13] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:55912]
[2022-05-31 23:57:12] frigate.edgetpu INFO : Attempting to load TPU as pci
Process detector:coral_pci:
[2022-05-31 23:57:25] frigate.edgetpu ERROR : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors.
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
delegate = Delegate(library, options)
File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
raise ValueError(capture.message)
ValueError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/frigate/frigate/edgetpu.py", line 136, in run_detector
object_detector = LocalObjectDetector(
File "/opt/frigate/frigate/edgetpu.py", line 44, in __init__
edge_tpu_delegate = load_delegate("libedgetpu.so.1.0", device_config)
File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
raise ValueError('Failed to load delegate from {}\n{}'.format(
ValueError: Failed to load delegate from libedgetpu.so.1.0
[2022-05-31 23:57:32] frigate.watchdog INFO : Detection appears to have stopped. Exiting frigate...
[2022-05-31 23:57:32] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:55912]
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[2022-05-31 23:57:33] frigate.video ERROR : patio: Unable to read frames from ffmpeg process.
[2022-05-31 23:57:33] frigate.video ERROR : patio: ffmpeg process is not running. exiting capture thread...
[s6-finish] sending all processes the TERM signal.
[s6-finish] sending all processes the KILL signal and exiting.
** Press ANY KEY to close this window **
It's getting late here; I will try further troubleshooting tomorrow and get back if I find anything interesting. Thanks for your time.
Huh, very interesting. I am curious how there are no logs between the working config and it trying to find the TPU. Since it was originally loaded, I would think there would be some detect-process crash.
[2022-05-31 23:57:13] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:55912]
[2022-05-31 23:57:12] frigate.edgetpu INFO : Attempting to load TPU as pci
Process detector:coral_pci:
[2022-05-31 23:57:25] frigate.edgetpu ERROR : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors.
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
delegate = Delegate(library, options)
File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
raise ValueError(capture.message)
ValueError
Anyway, interesting that it helped for a few minutes. I don't know enough to say for sure whether that makes it more or less likely to be a non-Frigate issue.
This sounds like something specific to your hardware. I'm fairly certain others are using PCI Corals on unraid successfully.
I just did the following, and it has been running for the last couple of hours without any issues; I will continue to monitor. Also, I want to see what happens after a system reboot (which I am planning to do after the ongoing parity check is completed).
# create an apex group and give it access to the Coral device node
sudo groupadd apex
chown root:apex /dev/apex_0 && chmod 660 /dev/apex_0
# remove the TPU from the PCI bus and rescan so the apex driver re-initializes it
echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/remove
echo 1 > /sys/bus/pci/rescan
Then I restarted the container.
I'm running Frigate in an Unraid container with the Coral sitting on a PCIe converter. I can paste some screenshots of my config tomorrow if you don't get it working.
So far it looks good. I just need to test a reboot as well, probably tomorrow.
Also, if it stops working after a restart, you can use the User Scripts plugin to run those command-line commands after reboot; see the sketch below.
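A sketch of such a script (set to run at array start); the apex group/permission and PCI remove/rescan steps are the ones from above, and the bus address 0000:04:00.0 is specific to this system:

#!/bin/bash
# recreate the apex group and fix permissions on the Coral device node
groupadd -f apex
chown root:apex /dev/apex_0 && chmod 660 /dev/apex_0
# remove the TPU from the PCI bus and rescan so the apex driver re-initializes it
echo 1 > /sys/bus/pci/devices/0000:04:00.0/remove
echo 1 > /sys/bus/pci/rescan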
It is broken again after the system reboot. The fix used earlier (adding the user group, etc.) is not working either.
[2022-06-02 15:02:21] frigate.app INFO : Capture process started for garden: 284
[2022-06-02 15:02:21] ws4py INFO : Using epoll
[2022-06-02 15:03:03] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:36978]
[2022-06-02 15:03:12] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:36978]
[2022-06-02 15:03:14] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:54042]
[2022-06-02 15:03:31] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:54042]
[2022-06-02 15:03:32] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:45938]
[2022-06-02 15:03:35] ws4py INFO : Managing websocket [Local => 127.0.0.1:8082 | Remote => 127.0.0.1:51168]
[2022-06-02 15:03:41] ws4py INFO : Terminating websocket [Local => 127.0.0.1:8082 | Remote => 127.0.0.1:51168]
[2022-06-02 15:06:01] frigate.watchdog INFO : Detection appears to be stuck. Restarting detection process...
[2022-06-02 15:06:01] root INFO : Waiting for detection process to exit gracefully...
[2022-06-02 15:06:30] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:45938]
[2022-06-02 15:06:31] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:40224]
[2022-06-02 15:06:31] root INFO : Detection process didnt exit. Force killing...
[2022-06-02 15:06:37] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:40224]
[2022-06-02 15:06:38] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:40234]
[2022-06-02 15:06:44] detector.coral_pci INFO : Starting detection process: 760
Process detector:coral_pci:
[2022-06-02 15:06:57] frigate.edgetpu ERROR : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors.
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
delegate = Delegate(library, options)
File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
raise ValueError(capture.message)
ValueError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/frigate/frigate/edgetpu.py", line 136, in run_detector
object_detector = LocalObjectDetector(
File "/opt/frigate/frigate/edgetpu.py", line 44, in __init__
edge_tpu_delegate = load_delegate("libedgetpu.so.1.0", device_config)
File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
raise ValueError('Failed to load delegate from {}\n{}'.format(
ValueError: Failed to load delegate from libedgetpu.so.1.0
[2022-06-02 15:07:04] frigate.watchdog INFO : Detection appears to have stopped. Exiting frigate...
[2022-06-02 15:07:04] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:40234]
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[2022-06-02 15:07:04] frigate.video ERROR : patio: Unable to read frames from ffmpeg process.
[2022-06-02 15:07:04] frigate.video ERROR : patio: ffmpeg process is not running. exiting capture thread...
[s6-finish] sending all processes the TERM signal.
[s6-finish] sending all processes the KILL signal and exiting.
I recently upgraded to Unraid 6.10.2, which apparently included a Linux kernel update(?), and maybe the Coral driver was not updated for it? I am not sure what's wrong.
I am also seeing this message in dmesg:
[ 436.624111] apex 0000:04:00.0: RAM did not enable within timeout (12000 ms)
[ 449.431913] apex 0000:04:00.0: RAM did not enable within timeout (12000 ms)
[ 449.431916] apex 0000:04:00.0: Error in device open cb: -110
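To watch for these as they happen, something like this should work:

dmesg --follow | grep -i apex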
Have you tried this, from https://coral.ai/docs/m2/get-started/#troubleshooting-on-linux? It was advised for a user having the same issue in 491 of the coral repo linked above. Either way, it seems like some sort of initialization error.
Tried that, and the file was gone after reboot.
root@Fusion:~# lsmod | grep apex
apex 16384 0
gasket 94208 1 apex
root@Fusion:~# less /etc/modprobe.d/blacklist-apex.conf
/etc/modprobe.d/blacklist-apex.conf: No such file or directory
Yeah, Unraid runs in RAM, so /etc is not kept. You would need to use one of the boot file scripts to put the files there when Unraid boots up.
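A minimal sketch of that approach, assuming you keep a copy of the file on the flash drive at /boot/config/blacklist-apex.conf: add this to /boot/config/go so it is restored at every boot:

# restore the modprobe config that Unraid's RAM-backed /etc loses on reboot
cp /boot/config/blacklist-apex.conf /etc/modprobe.d/blacklist-apex.conf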
I will try changing the PCIe port and see if that helps.
So I went about my Frigate install a little differently on Unraid, with no terminal commands or crazy changes. For context, I am running Unraid on an i5 with an Nvidia GPU and a Coral TPU connected via PCIe. Here is what I did:
Install the Coral drivers from the app store:
Then edited like this:
Make sure you change the repository to the current beta: blakeblackshear/frigate:0.11.0-beta2
If you have an Nvidia card, also do this: switch to advanced view and add --gpus all to the Extra Parameters section.
Also install the Nvidia drivers from the app store:
@DrSpaldo you're on Unraid 6.10.2?
@DrSpaldo you're on Unraid 6.10.2?
My bad, I didn't even see that. My post may be worthless then; I am still running version 6.9.2.
Hopefully it may at least be useful for someone on the current release.
I did a full reinstall of my Unraid server and set everything up again. So far it looks promising; Frigate and Coral have seemed happy for 5 days now.
Thanks for the support.
I have another question, more of a Docker issue; I will open another thread for that.
Interesting, maybe there was some conflict or something somewhere
Closing this issue.
Sorry to open this up again, but I have exactly the same issue on Unraid. When I boot the server, the TPU is found and alive; as soon as I start Frigate, the TPU shuts down and isn't found any more.
I tried switching to the latest beta4, but the error continues.
I don't really feel like rebuilding my Unraid system from scratch, so is there any other solution? I am currently still on 6.9.2.
@narayanvs did you switch PCIe slots? Did that help?
@narayanvs hey, I think you are running into the temperature limits. Please head over to the Unraid forums, to my thread or the Frigate support thread, where I already posted a tutorial on how to write your Coral TPU temperatures to a file: Click.
Are you sure that you have enough airflow over your TPU? What kind of TPU are you using?
@ministryofsillywalks I think you've already asked about that on the Unraid forums; please don't double-post across different platforms...
From my perspective this is an issue with the TPU itself and has nothing to do with Frigate...
I ran into similar issues again where the TPU shuts down and Frigate crashes. TBH I did not get much time to dive deep, and I have given up on this TPU/Frigate combo for now.
For now I have masked the whole scene and gone back to running Frigate on CPU, essentially using neither the TPU nor object detection. I will do further research on the forums to find a permanent fix.
You don't have to mask everything. You can just disable detect in the config.
I will do further research on the forums to find a permanent fix.
May I ask if you have something from the container on an Unassigned Device? If yes, make sure that you've selected Read/Write Slave on that share in the Docker template. A user reported on the forums that this caused the shutdown, but TBH I really can't imagine that it is causing the issue.
Also, please see this post on how to write a file on your Unraid machine to monitor your temps: Click. It seems like this is not really related to Frigate itself.
I do not have any Unassigned Devices on my server; my recording files are stored on a mounted share called cctv.
Here is my docker config,
and here are the share settings.
Should I change this to Read/Write Slave?
At the start the Coral temp looks normal, but does anyone know what triggers the high temperature (other than the Unassigned Device theory)?
2022-07-06 07:18:57 Coral Temp: 47.05C
2022-07-06 07:19:12 Coral Temp: 47.55C
2022-07-06 07:19:27 Coral Temp: 48.05C
2022-07-06 07:19:42 Coral Temp: 48.05C
2022-07-06 07:19:57 Coral Temp: 48.55C
2022-07-06 07:20:12 Coral Temp: 47.55C
2022-07-06 07:20:27 Coral Temp: 47.80C
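For anyone who wants this kind of log without the forum tutorial, here is a minimal sketch, assuming the gasket/apex driver exposes the sensor at /sys/class/apex/apex_0/temp in millidegrees Celsius:

#!/bin/bash
# poll the Coral's on-die temperature every 15 seconds
while true; do
  t=$(cat /sys/class/apex/apex_0/temp)   # e.g. 48050 for 48.05C
  printf '%s Coral Temp: %d.%02dC\n' "$(date '+%Y-%m-%d %H:%M:%S')" $((t / 1000)) $(((t % 1000) / 10))
  sleep 15
done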
Could this be a bad PCIe adapter card (the one used for mounting the Mini PCIe module)?
Just in case someone visits this thread later: my ultimate fix that solved this issue was to move frigate.db from the Unraid share (the default location, along with recordings, snapshots, etc.) to a dedicated folder under appdata on my cache drive, which will always stay on cache (with the cache-only option set). After making this change, it has not crashed again in more than a month.
And obviously reflect that change in the config as well, with:
database:
  path: /whatever/path/set/on/the/container/frigate.db
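For illustration, a sketch of how the pieces line up (the paths here are examples, not my exact ones):

# Unraid Docker template: map the cache-only appdata folder into the container
-v /mnt/cache/appdata/frigate:/db

# config.yml: point the database at the mapped folder
database:
  path: /db/frigate.db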
I am having a similar issue on Unraid 6.10.3, using the dual Edge TPU over PCIe with this adapter: https://github.com/magic-blue-smoke/Dual-Edge-TPU-Adapter
Frigate works, then stops as one of the two TPUs shuts down. I checked the temperature limits and that does not seem to be the case. I am using an unassigned device with R/W Slave set.
2022-08-02 09:56:23 Coral Temp: 48.80C
2022-08-02 09:56:38 Coral Temp: 49.05C
2022-08-02 09:56:53 Coral Temp: 48.80C
2022-08-02 09:57:08 Coral Temp: 49.30C
2022-08-02 09:57:23 Coral Temp: 49.55C
2022-08-02 09:57:38 Coral Temp: 49.30C
2022-08-02 09:57:53 Coral Temp: 49.55C
2022-08-02 09:58:08 Coral Temp: 49.30C
2022-08-02 09:58:23 Coral Temp: 49.55C
2022-08-02 09:58:38 Coral Temp: 49.80C
2022-08-02 09:58:53 Coral Temp: 49.30C
2022-08-02 09:59:08 Coral Temp: -89.70C
2022-08-02 09:59:23 Coral Temp: -89.70C
2022-08-02 09:59:38 Coral Temp: -89.70C
2022-08-02 09:59:53 Coral Temp: -89.70C
2022-08-02 10:00:08 Coral Temp: -89.70C
2022-08-02 10:00:23 Coral Temp: -89.70C
2022-08-02 10:00:38 Coral Temp: -89.70C
2022-08-02 10:00:53 Coral Temp: -89.70C
2022-08-02 10:01:08 Coral Temp: -89.70C
2022-08-02 10:01:23 Coral Temp: -89.70C
2022-08-02 10:01:38 Coral Temp: -89.70C
2022-08-02 10:01:53 Coral Temp: -89.70C
After it stops working, it reads a negative temperature.
Also, in my Unraid syslog:
Aug 3 09:01:26 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:01:32 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:01:37 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:01:42 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:01:47 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:01:52 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:01:57 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:02:02 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:02:07 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:02:12 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:02:18 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:02:23 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:02:28 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:02:33 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:02:38 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:02:43 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Aug 3 09:02:48 RAID kernel: apex 0000:83:00.0: Apex performance not throttled due to temperature
Any suggestions?
Exact same setup in my case, but it has been working totally fine. What do they show as in the Coral driver page? Also, make sure you have the most up-to-date driver plugin version.
Thank you for your quick reply
Interesting; not sure. I have adjusted my throttling and shutdown temps lower, and I also have a fan directly on the TPU card so they stay cooled. I would think there would be a way to get more logs on why it was shut down.
I saw something similar to @narayanvs in my logs about the RAM timeout.
This issue looks related; how are you passing the device in to the Frigate docker container? https://github.com/google-coral/edgetpu/issues/345
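For reference, the usual approach is a --device mapping on the container. A sketch (assuming the apex driver is loaded so the dual TPU enumerates as /dev/apex_0 and /dev/apex_1; the image tag and config path are examples):

docker run -d --name frigate \
  --device /dev/apex_0:/dev/apex_0 \
  --device /dev/apex_1:/dev/apex_1 \
  -v /mnt/user/appdata/frigate/config.yml:/config/config.yml:ro \
  blakeblackshear/frigate:0.11.0-beta2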
Reading through https://github.com/google-coral/edgetpu/issues/345 I got a little out of my depth, but I understand a bit of it.
I did have this working fine with a different PCIe adapter, but that one only saw a single TPU, before I got magic-blue-smoke's adapter so I can use both TPUs on the M.2 Coral.
For clarity, I have the M.2 dual Coral TPU on the magic-blue-smoke adapter to PCIe.
I did have this working fine with a different PCIe adapter, but that one only saw a single TPU, before I got magic-blue-smoke's adapter so I can use both TPUs on the M.2 Coral.
For clarity, I have the M.2 dual Coral TPU on the magic-blue-smoke adapter to PCIe.
I'm using the same adapter with dual TPU
Any other information I can get you to troubleshoot? Did you modify any of your settings?
Any other information I can get you to troubleshoot? Did you modify any of your settings?
This is all I changed
Hmm, weird. From what I can tell it is not a temperature issue. I am on Frigate 0.10 still; not sure if updating to the latest beta might help? It is mostly annoying that I have to reboot my whole Unraid server to have it work again.
Did you try moving your frigate.db to your cache drive and pointing Frigate at that? That helped in my case. Also, you don't have to reboot the unRAID server to get the TPU back online; I can't remember the exact command, but it is mentioned somewhere in this thread.
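For reference, I believe it was the PCI remove/rescan pair from earlier in the thread (the bus address 0000:04:00.0 was from my system; yours will differ):

echo 1 > /sys/bus/pci/devices/0000:04:00.0/remove
echo 1 > /sys/bus/pci/rescan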
I am on Frigate 0.10 still; not sure if updating to the latest beta might help?
I don't think so; we didn't change anything having to do with the Coral in 0.11.
It is mostly annoying that I have to reboot my whole Unraid server to have it work again.
I would think there would be some command to re-init the Coral instead of needing a restart. Either way, I did make quite a few changes in the host BIOS; maybe that has something to do with it. I set the PCIe slot to Gen 2 for the Coral (which is what it runs at), and I also disabled all C-states. I made other changes too, but none that I think would affect the Coral.
Here is my logic on the problem.
Your frigate.db lives on a mechanical drive behind the share (which is probably cache-enabled), and the most recent footage (live/recorded; not sure how Frigate or Coral look for objects) lives either on the cache or in RAM (which is 10 or even 100 times faster than a mechanical drive). Now, when Coral tries to analyse footage for objects, it probably needs to read/write info in the database, and the response does not come back as quickly as it expects. This probably causes retries or failures, and the TPU goes mad. At this point the temperature rises and it shuts down.
That being said, it is my wild guess, and I don't have any data to prove it other than the fact that it worked for me after the DB was moved to faster storage. This is what I did.
Good luck solving this 😂
Here is my logic on the problem.
Not doubting that this had an effect, but there are a few things that aren't correct about the assumptions, and I wanted to clear them up.
the most recent footage (live/recorded; not sure how Frigate or Coral look for objects) lives either on the cache or in RAM (which is 10 or even 100 times faster than a mechanical drive).
The frames live in memory when they are sent to the detector for object detection.
Now, when Coral tries to analyse footage for objects, it probably needs to read/write info in the database, and the response does not come back as quickly as it expects.
The Coral process doesn't touch the database at all; it returns the detection results to a different process, which writes to the DB, but the Coral itself doesn't depend on the DB.
I also have a dual-CPU setup, so certain PCIe slots go to either CPU1 or CPU2.
Unsure if that affects things or not.
I wouldn't think so, since they're all on the same PCIe lanes, just using two separate buses on those lanes. The adapter works at Gen 2 x1, so there shouldn't be any compatibility issues at all. I'd definitely check CPU C-states though; it could be sleeping something it shouldn't be.
I had my own issues with the LAN port randomly sleeping after some time, and disabling C-states fixed it.
Describe the problem you are having
I am new to Frigate, trying to switch from BlueIris. I have Frigate running as a Docker container on Unraid 6.10.2 (stable) with the below config.
Earlier I was using CPU detection, as I didn't have a Coral device. Then I managed to grab a Coral Mini PCIe device, which is attached to my MSI Z590 A-PRO motherboard using this PCIe adapter.
It was running fine for a couple of days, but today I noticed the container was not able to start, and I see the below error in the logs.
My Unraid syslog has something related.
It looks like the server is able to see the PCIe device.
Also, I have the Coral driver installed, and it says the PCIe device is shut down.
And lastly, here is my docker config.
Not sure if this is an issue related to Coral or Frigate (or maybe my motherboard/PC hardware in general).
Any help will be appreciated.
Version
0.10.1-83481af
Frigate config file
Relevant log output
FFprobe output from your camera
Frigate stats
No response
Operating system
UNRAID
Install method
Docker Compose
Coral version
PCIe
Network connection
Wired
Camera make and model
Dahua IPC-T5442TM-AS
Any other information that may be helpful
No response