what GPU do you have again?
Edited post to remove the Frigate Plus API key and RTSP password that were visible in the Docker CLI command.
No one has ANY suggestions (besides buying a CORAL device) on how to get my Frigate instance up and running??
Seg faults are difficult because they are usually related to the host or the hardware, and there is no info about what is going wrong.
From your previous post's logs we can see that as soon as the model is initialized there is a seg fault, indicating some failure to communicate correctly. Many users run this type of setup on Unraid, so there seems to be nothing particular about that. You could try a memtest and see if perhaps system memory is failing.
memtest complete. 0 errors.
Next suggestion please?
I have the same error and need help. I have a Eufy Cam 2 Pro, which only sends a stream when motion is detected. I suspect this could be a potential cause. Any thoughts?
I'm experiencing the exact same error on TrueNAS Scale w/ GTX 1060
@hvardhan20 and @jdgiddings - I hope you both get a response but, if my past experience holds true, it doesn't look good. CPU detection worked fine. GPU detection worked fine... until they bundled it all into one container.
Hard to fix issues when you don't have any support from anyone here.
There are many TensorRT users, so this seems to be a very isolated problem. Like I said before, seg faults are difficult to debug, and without being able to reproduce it there really isn't a good way to move toward solving the problem, because it is not clear what is causing this other than something on the host.
The logic to compile the models is the same as before, just done automatically, so that is unlikely to be causing this. It could be due to using newer libraries / TensorRT version, but that change was made to support the latest Nvidia GPUs and is also unrelated to Frigate building the models automatically.
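For reference, in the TensorRT image the models to build are selected via environment variables on the container, and the resulting .trt files end up under /config/model_cache/tensorrt/. A minimal sketch of that part of a compose file (the model name and image tag here are just examples, use whatever you actually run):

services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:0.14.0-beta2-tensorrt
    environment:
      # models listed here are built automatically on startup
      YOLO_MODELS: yolov7-320
      USE_FP16: false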
here's the output from nvidia-smi on the host. I believe these are all supported versions
NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2
I'm experimenting with different models right now to see if any do not cause the error. I will report back
yolov7-320 does not throw the segfault
which model did you use that did?
yolov7x-640 and yolov7x-320 were both throwing the error on my machine
I did some more testing. Any model larger than yolov7-320 throws the same segfault error
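If it helps anyone else verify, here is roughly what the fallback config for yolov7-320 looks like on my end (the .trt path assumes the model was built into the default model cache, and width/height are set to match the 320 input size):

detectors:
  tensorrt:
    type: tensorrt

model:
  # width/height must match the model's input resolution
  path: /config/model_cache/tensorrt/yolov7-320.trt
  input_tensor: nchw
  input_pixel_format: rgb
  width: 320
  height: 320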
I just wanted to add another voice here -- I am able to run yolov7x-320, but if I attempt to run yolov7x-640, I get a segfault (the same as the OP). I'm on a GTX 1650 Super. My setup is a bit odd:
Let me know if I can do anything to help debug this.
[edit] I previously said my 1650 is an LHR. This is incorrect. My 3060 is LHR, and I confused the two.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I have the same issue with an nVidia Quadro P5000 (16 GB).
Driver Version: 555.42.02
CUDA Version: 12.5
OS: Ubuntu 22
My first time trying to use frigate and I'm running into this. nvidia 1070ti on unraid. The yolov4-416.trt model was in the container by default and it segfaults if I try to use it.
I tried the 320 model and that seems to work. There appears to be a common thread here where nvidia+unraid+model larger than 320 fails.
It's more likely to be something related to driver and the GPU that is used. Unraid and 3050 works on any model for me
Although I've given up on this, I am still actively watching this thread. It's still the same reply / response from the only collaborator who decides to look at this thread. I am sure we all appreciate the time you are taking to reply, NickM-27. However, your answer doesn't really hold water at this point: multiple different GPUs, multiple different driver versions, and even one individual here on a different OS than Unraid.
For me, the issue was also segmentation faults.
What I found out (and had forgotten about) is that with yolov7x-640.trt I had to set the width and height variables under model at the same time, i.e.:

model:
  path: /yourpathhere
  input_tensor: nchw
  input_pixel_format: rgb
  width: 640
  height: 640

For me the width and height had to match the model size, and once I did that I didn't have segmentation problems anymore.
Also, have you created the models under /config/model_cache/tensorrt/? I couldn't see anything in your Docker launch about building the models. I might be wrong and blind; if so, nothing to see here.
Thank you! Got yolov7x-640 working without segmentation fault now on my nVidia P5000.
docker-compose:

services:
  frigate:
    container_name: frigate
    privileged: true
    restart: unless-stopped
    image: ghcr.io/blakeblackshear/frigate:0.14.0-beta2-tensorrt
    shm_size: "256mb"
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - ./config:/config
      - /mnt/hdd/frigate:/media/frigate
      - type: tmpfs
        target: /tmp/cache
        tmpfs:
          size: 1000000000
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
    ports:
      - "8080:8080"
      - "5000:5000"
      - "8554:8554" # RTSP feeds
      - "8555:8555/tcp" # WebRTC over tcp
      - "8555:8555/udp" # WebRTC over udp
    environment:
      FRIGATE_RTSP_PASSWORD: "password"
      YOLO_MODELS: yolov7x-640
      USE_FP16: false
config.yaml:

ffmpeg:
  hwaccel_args: preset-nvidia-h264

detectors:
  tensorrt:
    type: tensorrt

model:
  path: /config/model_cache/tensorrt/yolov7x-640.trt
  input_tensor: nchw
  input_pixel_format: rgb
  width: 640
  height: 640
...
I see 1.6GB of VRAM used.
slavik@ub22gpu:~$ nvidia-smi
Thu Jun 6 04:02:27 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02 Driver Version: 555.42.02 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Quadro P5000 Off | 00000000:0B:00.0 Off | Off |
| 38% 61C P0 157W / 180W | 1525MiB / 16384MiB | 19% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 8767 C frigate.detector.tensorrt 678MiB |
| 0 N/A N/A 8849 C ffmpeg 278MiB |
| 0 N/A N/A 8850 C ffmpeg 127MiB |
| 0 N/A N/A 8857 C ffmpeg 133MiB |
| 0 N/A N/A 8875 C ffmpeg 147MiB |
| 0 N/A N/A 8876 C ffmpeg 158MiB |
+-----------------------------------------------------------------------------------------+
P.S. After running it for about a day, I see it crashing every couple of hours:
[2024-06-07 19:53:03] detector.tensorrt INFO : Exited detection process...
[2024-06-07 19:53:03] detector.tensorrt INFO : Starting detection process: 507144
[2024-06-07 19:53:07] frigate.detectors.plugins.tensorrt INFO : Loaded engine size: 392 MiB
[2024-06-07 19:53:08] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +8, now: CPU 758, GPU 1354 (MiB)
[2024-06-07 19:53:08] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 760, GPU 1364 (MiB)
[2024-06-07 19:53:08] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +394, now: CPU 0, GPU 394 (MiB)
[2024-06-07 19:53:08] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 368, GPU 1360 (MiB)
[2024-06-07 19:53:08] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 368, GPU 1368 (MiB)
[2024-06-07 19:53:08] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +145, now: CPU 0, GPU 539 (MiB)
[2024-06-07 19:57:38] detector.tensorrt INFO : Signal to exit detection process...
[2024-06-07 19:57:39] detector.tensorrt INFO : Exited detection process...
[2024-06-07 19:57:39] detector.tensorrt INFO : Starting detection process: 509270
[2024-06-07 19:57:43] frigate.detectors.plugins.tensorrt INFO : Loaded engine size: 392 MiB
[2024-06-07 19:57:44] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +8, now: CPU 753, GPU 1354 (MiB)
[2024-06-07 19:57:44] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 755, GPU 1364 (MiB)
[2024-06-07 19:57:44] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +394, now: CPU 0, GPU 394 (MiB)
[2024-06-07 19:57:44] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 363, GPU 1360 (MiB)
[2024-06-07 19:57:44] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 363, GPU 1368 (MiB)
[2024-06-07 19:57:44] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +145, now: CPU 0, GPU 539 (MiB)
[2024-06-07 19:58:57] detector.tensorrt INFO : Signal to exit detection process...
[2024-06-07 19:58:57] detector.tensorrt INFO : Exited detection process...
But maybe that was because I was simultaneously running other processes on the same card.
Describe the problem you are having
Launching v13 with the NVIDIA branch causes a boot loop with the above error and no other explanation. I was told in a different support ticket that my NVIDIA driver version was too new.
I have since downgraded to driver v535.129.03, which is supposedly stable according to the last ticket I opened (https://github.com/blakeblackshear/frigate/issues/9575).
The error is still present.
Version
v13
Frigate config file
docker-compose file or Docker CLI command
Relevant log output
Operating system
UNRAID
Install method
Docker Compose
Coral version
CPU (no coral)
Any other information that may be helpful
No response