blakeblackshear / frigate

NVR with realtime local object detection for IP cameras
https://frigate.video
MIT License
18.46k stars, 1.68k forks

[HW Accel Support]: Fresh pull of v13 in Unraid - CUDA initialization failure #9575

Closed: usafle closed this issue 7 months ago

usafle commented 8 months ago

Describe the problem you are having

I had removed the original Frigate container and template and pulled down a "fresh" copy of v13, installing the NVIDIA branch when Community Apps asked which branch I wanted. So I did not upgrade from 12 to 13; I started with a fresh pull of the container. I have a CUDA-capable GPU installed and visible.

Version

v13

Frigate config file

mqtt:
  enabled: true
  host: 192.168.1.102
  user: frigate
  password: PASSWORD
# detectors:
#  cpu1:
#    type: cpu
#    num_threads: 2

# birdseye:
#   enabled: True
#   restream: false
#   mode: continuous
#   width: 1280
#   height: 720
#   quality: 8

go2rtc:
  streams:
    Rear_Deck:
      - rtsp://admin:PASSWORD@192.168.1.114:554/h264Preview_01_main
    Rear_Deck_sub:
      - rtsp://admin:PASSWORD@192.168.1.114:554/h264Preview_01_sub
    Garage_Camera:
      - rtsp://admin:PASSWORD@192.168.1.215:554/cam/realmonitor?channel=1&subtype=0
    Garage_Camera_sub:
      - rtsp://admin:PASSWORD@192.168.1.215:554/cam/realmonitor?channel=1&subtype=1

ffmpeg:
  hwaccel_args: preset-nvidia-h265

rtmp:
  enabled: False 

cameras:
############## REAR DECK ##################
  Rear_Deck:
    ffmpeg:
      inputs:
        - path: rtsp://127.0.0.1:8554/Rear_Deck_sub
          input_args: preset-rtsp-restream
          roles:
            - detect
        - path: rtsp://127.0.0.1:8554/Rear_Deck
          input_args: preset-rtsp-restream
          roles:
            - record
      output_args:
        record: -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 -strftime 1 -c:v copy -c:a aac
    objects:
      track:
        - person
        - dog
        - bird
        - cat
    detect:
      width: 1280
      height: 720
      fps: 4
    record:
      enabled: True
      events:
        retain:
          default: 2
    snapshots:
      enabled: True

  Garage_Camera:
    ffmpeg:
      inputs:
        - path: rtsp://127.0.0.1:8554/Garage_Camera_sub
          input_args: preset-rtsp-restream
          roles:
            - detect
        - path: rtsp://127.0.0.1:8554/Garage_Camera
          input_args: preset-rtsp-restream
          roles:
            - record
      output_args:
        record: -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 -strftime 1 -c:v copy -c:a aac
        # record: preset-record-generic-audio-aac
    objects:
      track:
        - person
        - dog
        - cat
        - car
        - package
    detect:
      width: 1280
      height: 720
      fps: 4
    record:
      enabled: True
      events:
        retain:
          default: 2
    snapshots:
      enabled: True

docker-compose file or Docker CLI command

Installed Via Community Apps

Relevant log output

s6-rc: info: service s6rc-fdholder: starting
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service s6rc-fdholder successfully started
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service trt-model-prepare: starting
s6-rc: info: service log-prepare: starting
s6-rc: info: service log-prepare successfully started
s6-rc: info: service nginx-log: starting
s6-rc: info: service go2rtc-log: starting
s6-rc: info: service frigate-log: starting
s6-rc: info: service nginx-log successfully started
s6-rc: info: service go2rtc-log successfully started
s6-rc: info: service go2rtc: starting
s6-rc: info: service frigate-log successfully started
s6-rc: info: service go2rtc successfully started
s6-rc: info: service go2rtc-healthcheck: starting
s6-rc: info: service go2rtc-healthcheck successfully started
Generating the following TRT Models: yolov4-416,yolov4-tiny-416
Downloading yolo weights
2024-02-01 10:30:12.079536551  [INFO] Preparing new go2rtc config...
2024-02-01 10:30:13.159361526  [INFO] Starting go2rtc...
2024-02-01 10:30:13.279687971  10:30:13.279 INF go2rtc version 1.8.4 linux/amd64
2024-02-01 10:30:13.280390371  10:30:13.280 INF [api] listen addr=:1984
2024-02-01 10:30:13.280428249  10:30:13.280 INF [rtsp] listen addr=:8554
2024-02-01 10:30:13.280808608  10:30:13.280 INF [webrtc] listen addr=:8555

Creating yolov4-tiny-416.cfg and yolov4-tiny-416.weights
Creating yolov4-416.cfg and yolov4-416.weights

Done.
2024-02-01 10:30:21.744747617  [INFO] Starting go2rtc healthcheck service...

Generating yolov4-416.trt. This may take a few minutes.

Traceback (most recent call last):
  File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 214, in <module>
    main()
  File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 202, in main
    engine = build_engine(
  File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 112, in build_engine
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(*EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
TypeError: pybind11::init(): factory function returned nullptr
[02/01/2024-10:30:38] [TRT] [W] Unable to determine GPU memory usage
[02/01/2024-10:30:38] [TRT] [W] Unable to determine GPU memory usage
[02/01/2024-10:30:38] [TRT] [W] CUDA initialization failure with error: 35. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Loading the ONNX file...

Generating yolov4-tiny-416.trt. This may take a few minutes.

Traceback (most recent call last):
  File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 214, in <module>
    main()
  File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 202, in main
    engine = build_engine(
  File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 112, in build_engine
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(*EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
TypeError: pybind11::init(): factory function returned nullptr
[02/01/2024-10:30:41] [TRT] [W] Unable to determine GPU memory usage
[02/01/2024-10:30:41] [TRT] [W] Unable to determine GPU memory usage
[02/01/2024-10:30:41] [TRT] [W] CUDA initialization failure with error: 35. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Loading the ONNX file...
Available tensorrt models:
ls: cannot access '*.trt': No such file or directory
s6-rc: warning: unable to start service trt-model-prepare: command exited 2

FFprobe output from your camera

Can't access Frigate due to above error

Operating system

UNRAID

Install method

Docker Compose

Network connection

Wired

Camera make and model

Reolink + Amcrest

Any other information that may be helpful

No response

NickM-27 commented 8 months ago

what version of the nvidia driver is installed, and what is your docker CLI command?

usafle commented 8 months ago

Nvidia Driver Version: 545.29.06 / NVIDIA GeForce GTX 1050

I don't have a CLI command; I installed it via Community Apps. Hopefully I'm answering that question correctly?
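
(As an aside, a quick way to confirm both the driver version and whether the GPU is actually visible where it matters is nvidia-smi; the container name below is an assumption based on typical template defaults, not taken from this thread.)

# On the Unraid host: shows the installed driver version and the detected GPU
nvidia-smi

# Inside the Frigate container: confirms the GPU has been passed through
docker exec -it frigate nvidia-smi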

NickM-27 commented 8 months ago

you do have a CLI command; Unraid Community Apps just sets up a docker command for you, and that docker command is shown when you apply changes to the Unraid container

usafle commented 8 months ago

Perhaps you are looking for the template that's created when it pulls down the container from C.A.?

(Screenshots: CozsNAS UpdateContainer template settings, 2024-02-01)

NickM-27 commented 8 months ago

no, after you press Apply at the bottom it shows you the CLI command

usafle commented 8 months ago

(Screenshot: the docker command shown after pressing Apply, 2024-02-01)

Thanks for the clarification

NickM-27 commented 8 months ago

you need to add --gpus=all to the extra arguments list
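
(For reference, Unraid inserts whatever is in that field into the docker run command it generates, so the result is roughly the sketch below; the image tag, volume path, and ports are illustrative placeholders rather than the exact values from this template.)

docker run -d \
  --name frigate \
  --gpus=all \
  --shm-size=256mb \
  -v /mnt/user/appdata/frigate:/config \
  -p 5000:5000 \
  -p 8554:8554 \
  ghcr.io/blakeblackshear/frigate:stable-tensorrt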

usafle commented 8 months ago

S.O.B. That fixed it. She starts now. Question while I have your attention: what do I now do with the

# detectors:
#   cpu1:
#     type: cpu
#     num_threads: 2

Will it automatically utilize the GPU now for detectors, or do I have to specifically put a different line of code in the YAML?

NickM-27 commented 8 months ago

if you want it to use the GPU for object detection then you should follow https://docs.frigate.video/configuration/object_detectors#nvidia-tensorrt-detector

usafle commented 8 months ago

So I should be paying attention to this specific paragraph?

detectors:
  tensorrt:
    type: tensorrt
    device: 0 #This is the default, select the first GPU

model:
  path: /config/model_cache/tensorrt/yolov7-320.trt
  input_tensor: nchw
  input_pixel_format: rgb
  width: 320
  height: 320
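
(One caveat, going by the linked docs rather than anything verified here: the .trt file referenced by model.path is only generated if it is listed in the YOLO_MODELS environment variable on the container, and the log above shows yolov4-416 and yolov4-tiny-416 being generated instead. So either that variable or the model path would need to be adjusted until they match, along these lines:)

# Extra environment variable on the docker container (illustrative value)
YOLO_MODELS=yolov7-320

# which the startup script turns into:
# /config/model_cache/tensorrt/yolov7-320.trt
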
NickM-27 commented 8 months ago

Yes

usafle commented 8 months ago

So now there is some sort of Python error: a segmentation fault.

More than likely it's my fault.

2024-02-01 17:45:24.201421738  [2024-02-01 17:45:24] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +67, now: CPU 0, GPU 451 (MiB)
2024-02-01 17:45:26.505945177  [INFO] Starting go2rtc healthcheck service...
2024-02-01 17:45:28.031121208  Fatal Python error: Segmentation fault
2024-02-01 17:45:28.031132500  
2024-02-01 17:45:28.031143201  Thread 0x000014c0c25ee6c0 (most recent call first):
2024-02-01 17:45:28.031181405    File "/usr/lib/python3.9/threading.py", line 312 in wait
2024-02-01 17:45:28.031752450    File "/usr/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
2024-02-01 17:45:28.031834378    File "/usr/lib/python3.9/threading.py", line 892 in run
2024-02-01 17:45:28.031839453    File "/usr/lib/python3.9/threading.py", line 954 in _bootstrap_inner
2024-02-01 17:45:28.032128629    File "/usr/lib/python3.9/threading.py", line 912 in _bootstrap
2024-02-01 17:45:28.032139296  
2024-02-01 17:45:28.032142998  Current thread 0x000014c0e90f8740 (most recent call first):
2024-02-01 17:45:28.032147248    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 168 in <listcomp>
2024-02-01 17:45:28.032223130    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 167 in _do_inference
2024-02-01 17:45:28.032229472    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 286 in detect_raw
2024-02-01 17:45:28.032309661    File "/opt/frigate/frigate/object_detection.py", line 75 in detect_raw
2024-02-01 17:45:28.032314928    File "/opt/frigate/frigate/object_detection.py", line 125 in run_detector
2024-02-01 17:45:28.032318383    File "/usr/lib/python3.9/multiprocessing/process.py", line 108 in run
2024-02-01 17:45:28.032321965    File "/usr/lib/python3.9/multiprocessing/process.py", line 315 in _bootstrap
2024-02-01 17:45:28.032325542    File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 71 in _launch
2024-02-01 17:45:28.032329065    File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 19 in __init__
2024-02-01 17:45:28.032368478    File "/usr/lib/python3.9/multiprocessing/context.py", line 277 in _Popen
2024-02-01 17:45:28.032405583    File "/usr/lib/python3.9/multiprocessing/context.py", line 224 in _Popen
2024-02-01 17:45:28.032452351    File "/usr/lib/python3.9/multiprocessing/process.py", line 121 in start
2024-02-01 17:45:28.032514176    File "/opt/frigate/frigate/object_detection.py", line 183 in start_or_restart
2024-02-01 17:45:28.032610256    File "/opt/frigate/frigate/object_detection.py", line 151 in __init__
2024-02-01 17:45:28.032677035    File "/opt/frigate/frigate/app.py", line 453 in start_detectors
2024-02-01 17:45:28.032756244    File "/opt/frigate/frigate/app.py", line 683 in start
2024-02-01 17:45:28.032838373    File "/opt/frigate/frigate/__main__.py", line 17 in <module>
2024-02-01 17:45:28.032920488    File "/usr/lib/python3.9/runpy.py", line 87 in _run_code
2024-02-01 17:45:28.033003527    File "/usr/lib/python3.9/runpy.py", line 197 in _run_module_as_main
2024-02-01 17:45:42.751911555  [2024-02-01 17:45:42] frigate.watchdog               INFO    : Detection appears to be stuck. Restarting detection process...
2024-02-01 17:45:42.774446711  [2024-02-01 17:45:42] detector.tensorrt              INFO    : Starting detection process: 1257
2024-02-01 17:45:43.696120537  [2024-02-01 17:45:43] frigate.detectors.plugins.tensorrt INFO    : Loaded engine size: 382 MiB
2024-02-01 17:45:44.224290941  [2024-02-01 17:45:44] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +8, now: CPU 506, GPU 572 (MiB)
2024-02-01 17:45:44.237778500  [2024-02-01 17:45:44] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 508, GPU 582 (MiB)
2024-02-01 17:45:44.244529126  [2024-02-01 17:45:44] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +384, now: CPU 0, GPU 384 (MiB)
2024-02-01 17:45:44.321038782  [2024-02-01 17:45:44] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 126, GPU 576 (MiB)
2024-02-01 17:45:44.325099439  [2024-02-01 17:45:44] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 126, GPU 584 (MiB)
2024-02-01 17:45:44.325186438  [2024-02-01 17:45:44] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +67, now: CPU 0, GPU 451 (MiB)
2024-02-01 17:45:44.330534549  Fatal Python error: Segmentation fault
2024-02-01 17:45:44.330541049  
2024-02-01 17:45:44.330566522  Thread 0x000014c0d9bf96c0 (most recent call first):
2024-02-01 17:45:44.330629019    File "/usr/lib/python3.9/threading.py", line 312 in wait
2024-02-01 17:45:44.330699041    File "/usr/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
2024-02-01 17:45:44.330752018    File "/usr/lib/python3.9/threading.py", line 892 in run
2024-02-01 17:45:44.330826565    File "/usr/lib/python3.9/threading.py", line 954 in _bootstrap_inner
2024-02-01 17:45:44.330882447    File "/usr/lib/python3.9/threading.py", line 912 in _bootstrap
2024-02-01 17:45:44.330884860  
2024-02-01 17:45:44.330905318  Current thread 0x000014c0d97f76c0 (most recent call first):
2024-02-01 17:45:44.330974805    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 168 in <listcomp>
2024-02-01 17:45:44.331058941    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 167 in _do_inference
2024-02-01 17:45:44.331145620    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 286 in detect_raw
2024-02-01 17:45:44.331218329    File "/opt/frigate/frigate/object_detection.py", line 75 in detect_raw
2024-02-01 17:45:44.331295305    File "/opt/frigate/frigate/object_detection.py", line 125 in run_detector
2024-02-01 17:45:44.331365046    File "/usr/lib/python3.9/multiprocessing/process.py", line 108 in run
2024-02-01 17:45:44.331445566    File "/usr/lib/python3.9/multiprocessing/process.py", line 315 in _bootstrap
2024-02-01 17:45:44.331529044    File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 71 in _launch
2024-02-01 17:45:44.331622515    File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 19 in __init__
2024-02-01 17:45:44.331702885    File "/usr/lib/python3.9/multiprocessing/context.py", line 277 in _Popen
2024-02-01 17:45:44.331781844    File "/usr/lib/python3.9/multiprocessing/context.py", line 224 in _Popen
2024-02-01 17:45:44.331849831    File "/usr/lib/python3.9/multiprocessing/process.py", line 121 in start
2024-02-01 17:45:44.331918830    File "/opt/frigate/frigate/object_detection.py", line 183 in start_or_restart
2024-02-01 17:45:44.331989877    File "/opt/frigate/frigate/watchdog.py", line 34 in run
2024-02-01 17:45:44.332064820    File "/usr/lib/python3.9/threading.py", line 954 in _bootstrap_inner
2024-02-01 17:45:44.332133000    File "/usr/lib/python3.9/threading.py", line 912 in _bootstrap
2024-02-01 17:45:52.766515491  [2024-02-01 17:45:52] frigate.watchdog               INFO    : Detection appears to have stopped. Exiting Frigate...
2024-02-01 17:45:52.790404169  [INFO] The go2rtc-healthcheck service exited with code 256 (by signal 15)
2024-02-01 17:45:52.849888651  [INFO] Service NGINX exited with code 0 (by signal 0)
2024-02-01 17:45:52.853521917  [2024-02-01 17:45:52] frigate.app                    INFO    : Stopping...
2024-02-01 17:45:52.854274087  [2024-02-01 17:45:52] frigate.ptz.autotrack          INFO    : Exiting autotracker...
2024-02-01 17:45:52.854802363  [2024-02-01 17:45:52] frigate.storage                INFO    : Exiting storage maintainer...
2024-02-01 17:45:52.860287936  [2024-02-01 17:45:52] frigate.stats                  INFO    : Exiting stats emitter...
2024-02-01 17:45:52.860293391  [2024-02-01 17:45:52] frigate.watchdog               INFO    : Exiting watchdog...
2024-02-01 17:45:52.867881182  [2024-02-01 17:45:52] frigate.record.cleanup         INFO    : Exiting recording cleanup...
2024-02-01 17:45:52.868901156  [2024-02-01 17:45:52] frigate.events.cleanup         INFO    : Exiting event cleanup...
2024-02-01 17:45:52.868906792  [2024-02-01 17:45:52] frigate.object_processing      INFO    : Exiting object processor...
2024-02-01 17:45:53.002306542  [2024-02-01 17:45:53] frigate.comms.ws               INFO    : Exiting websocket client...
2024-02-01 17:45:53.779081109  [2024-02-01 17:45:53] frigate.events.maintainer      INFO    : Exiting event processor...
2024-02-01 17:45:53.779503732  [2024-02-01 17:45:53] peewee.sqliteq                 INFO    : writer received shutdown request, exiting.
2024-02-01 17:45:53.783142912  [2024-02-01 17:45:53] frigate.record.maintainer      INFO    : Exiting recording maintenance..
NickM-27 commented 8 months ago

might be the driver; 535 is generally the recommended version

usafle commented 8 months ago

...and here I thought keeping everything up to date was the best idea.....

NickM-27 commented 8 months ago

I believe 545 is not marked stable yet

usafle commented 8 months ago

I'm about to downgrade my driver to get this working, or to see if it IS actually the driver causing these issues. There are multiple v535 drivers:

  1. v535.129.03
  2. v535.113.01
  3. v535.104.05
  4. v530.41.03

Do you have a preference as to which one should be tried?

NickM-27 commented 7 months ago

Closing in favor of https://github.com/blakeblackshear/frigate/issues/9801