blakeblackshear / frigate

NVR with realtime local object detection for IP cameras
https://frigate.video
MIT License

[HW Accel Support]: cu->cuInit(0) failed -> CUDA_ERROR_UNKNOWN #4136

Status: Closed (weitheng closed 1 year ago)

weitheng commented 2 years ago

Describe the problem you are having

I get this error when trying to use my Nvidia Quadro P4000 GPU for ffmpeg hardware acceleration: [AVHWDeviceContext @ 0x55bf99883d00] cu->cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error. I am running Frigate in a TurnKey Core Debian 11 LXC container on Proxmox. I have installed the Nvidia kernel driver on the Proxmox host and matched it with the same driver version in the LXC container.

The host and the container both seem to have access to libnvcuvid1.

Container: docker exec -it frigate ldconfig -p | grep cuvid
libnvcuvid.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1

Host: ldconfig -p | grep cuvid
libnvcuvid.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvcuvid.so.1
libnvcuvid.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvcuvid.so
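
The ldconfig output only shows that the library resolves; it does not show whether the NVIDIA device nodes are reachable at each layer. A minimal sketch, assuming Python 3 is available where it is run (host, LXC container, or Docker container), that lists the character devices matching /dev/nvidia* with their major and minor numbers. The function name list_char_devices is hypothetical. The majors it prints can be compared against the device-allow rules in the LXC config; the main NVIDIA devices normally use major 195, while the nvidia-uvm major is, as far as I know, assigned dynamically and can differ between hosts.

```python
import glob
import os
import stat


def list_char_devices(pattern="/dev/nvidia*"):
    """Return (path, major, minor) for every character device matching pattern."""
    found = []
    for path in sorted(glob.glob(pattern)):
        try:
            st = os.stat(path)
        except OSError:
            # Broken symlink or a node we are not allowed to stat: skip it.
            continue
        if stat.S_ISCHR(st.st_mode):
            found.append((path, os.major(st.st_rdev), os.minor(st.st_rdev)))
    return found


if __name__ == "__main__":
    devices = list_char_devices()
    if not devices:
        print("no /dev/nvidia* character devices visible here")
    for path, major, minor in devices:
        print(f"{path}: major={major} minor={minor}")
```

If /dev/nvidia-uvm is missing from the list inside the container, that is one commonly reported reason for cuInit failing even though nvidia-smi still works.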

I should have all the right versions here:

dpkg -l | grep 'nvidia-\(docker\|driver\|container\)'
ii  libnvidia-container-tools       1.11.0-1                                amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64      1.11.0-1                                amd64        NVIDIA container runtime library
ii  nvidia-container-runtime        3.11.0-1                                all          NVIDIA container runtime
ii  nvidia-container-toolkit        1.11.0-1                                amd64        NVIDIA Container toolkit
ii  nvidia-container-toolkit-base   1.11.0-1                                amd64        NVIDIA Container Toolkit Base
ii  nvidia-docker2                  2.11.0-1                                all          nvidia-docker CLI wrapper
ii  nvidia-driver                   515.65.01-1                             amd64        NVIDIA metapackage
ii  nvidia-driver-bin               515.65.01-1                             amd64        NVIDIA driver support binaries
ii  nvidia-driver-libs:amd64        515.65.01-1                             amd64        NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)

Version

0.11.1-2EADA21

Frigate config file

detectors:
  coral:
    type: edgetpu
    device: usb

# Optional: model modifications
model:
  # Optional: path to the model (default: automatic based on detector) /edgetpu_model.tflite
  path: spaghettinet_edgetpu_l_compiled.tflite 
  # Optional: path to the labelmap (default: shown below)
  labelmap_path: /labelmap.txt
  # Required: Object detection model input width (default: shown below)
  width: 320
  # Required: Object detection model input height (default: shown below)
  height: 320
  # Optional: Label name modifications. These are merged into the standard labelmap.
#  labelmap:
#    2: vehicle

# Optional: ffmpeg configuration
ffmpeg:
  # Optional: global ffmpeg args (default: shown below)
  global_args: -hide_banner -loglevel warning
  # Optional: global hwaccel args (default: shown below)
  # NOTE: See hardware acceleration docs for your specific device
  hwaccel_args: -c:v h264_cuvid
  # Optional: global input args (default: shown below)
  input_args: -avoid_negative_ts make_zero -fflags +genpts+discardcorrupt -rtsp_transport tcp -timeout 5000000 -use_wallclock_as_timestamps 1
  # Optional: global output args
  output_args:
    # Optional: output args for detect streams (default: shown below)
    detect: -f rawvideo -pix_fmt yuv420p
    # Optional: output args for record streams (default: shown below)
    record: -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 -strftime 1 -c copy -an
    # Optional: output args for rtmp streams (default: shown below)
    rtmp: -c copy -f flv

# Optional: Detect configuration
# NOTE: Can be overridden at the camera level
detect:
  # Optional: width of the frame for the input with the detect role (default: shown below)
  width: 704
  # Optional: height of the frame for the input with the detect role (default: shown below)
  height: 576
  # Optional: desired fps for your camera for the input with the detect role (default: shown below)
  # NOTE: Recommended value of 5. Ideally, try and reduce your FPS on the camera.
  fps: 5
  # Optional: enables detection for the camera (default: True)
  # This value can be set via MQTT and will be updated in startup based on retained value
  enabled: True
  # Optional: Number of frames without a detection before frigate considers an object to be gone. (default: 5x the frame rate)
  max_disappeared: 25
  # Optional: Configuration for stationary object tracking
  stationary:
    # Optional: Frequency for confirming stationary objects (default: shown below)
    # When set to 0, object detection will not confirm stationary objects until movement is detected.
    # If set to 10, object detection will run to confirm the object still exists on every 10th frame.
    interval: 0
    # Optional: Number of frames without a position change for an object to be considered stationary (default: 10x the frame rate or 10s)
    threshold: 50
    # Optional: Define a maximum number of frames for tracking a stationary object (default: not set, track forever)
    # This can help with false positives for objects that should only be stationary for a limited amount of time.
    # It can also be used to disable stationary object tracking. For example, you may want to set a value for person, but leave
    # car at the default.
    # WARNING: Setting these values overrides default behavior and disables stationary object tracking.
    #          There are very few situations where you would want it disabled. It is NOT recommended to
    #          copy these values from the example config into your config unless you know they are needed.
    max_frames:
      # Optional: Default for all object types (default: not set, track forever)
      default: 3000
      # Optional: Object specific values
      objects:
        person: 1000

# Optional: Record configuration
# NOTE: Can be overridden at the camera level
record:
  enabled: True
  # Optional: Number of minutes to wait between cleanup runs (default: shown below)
  # This can be used to reduce the frequency of deleting recording segments from disk if you want to minimize i/o
  expire_interval: 60
  # Optional: Retention settings for recording
  retain:
    # Optional: Number of days to retain recordings regardless of events (default: shown below)
    # NOTE: This should be set to 0 and retention should be defined in events section below
    #       if you only want to retain recordings of events.
    days: 0
    # Optional: Mode for retention. Available options are: all, motion, and active_objects
    #   all - save all recording segments regardless of activity
    #   motion - save all recordings segments with any detected motion
    #   active_objects - save all recording segments with active/moving objects
    # NOTE: this mode only applies when the days setting above is greater than 0
    mode: all
  # Optional: Event recording settings
  events:
    # Optional: Number of seconds before the event to include (default: shown below)
    pre_capture: 6
    # Optional: Number of seconds after the event to include (default: shown below)
    post_capture: 6
    # Optional: Objects to save recordings for. (default: all tracked objects)
    objects:
      - person
      - car
      - bus
      - dog
      - cat
      - motorcycle
      - bicycle
    # Optional: Restrict recordings to objects that entered any of the listed zones (default: no required zones)
    required_zones: []
    # Optional: Retention settings for recordings of events
    retain:
      # Required: Default retention days (default: shown below)
      default: 1
      # Optional: Mode for retention. (default: shown below)
      #   all - save all recording segments for events regardless of activity
      #   motion - save all recordings segments for events with any detected motion
      #   active_objects - save all recording segments for event with active/moving objects
      #
      # NOTE: If the retain mode for the camera is more restrictive than the mode configured
      #       here, the segments will already be gone by the time this mode is applied.
      #       For example, if the camera retain mode is "motion", the segments without motion are
      #       never stored, so setting the mode to "all" here won't bring them back.
      mode: active_objects

# Optional: Configuration for the jpg snapshots written to the clips directory for each event
# NOTE: Can be overridden at the camera level
snapshots:
  # Optional: Enable writing jpg snapshot to /media/frigate/clips (default: shown below)
  # This value can be set via MQTT and will be updated in startup based on retained value
  enabled: True
  # Optional: save a clean PNG copy of the snapshot image (default: shown below)
  clean_copy: True
  # Optional: print a timestamp on the snapshots (default: shown below)
  timestamp: True
  # Optional: draw bounding box on the snapshots (default: shown below)
  bounding_box: True
  # Optional: crop the snapshot (default: shown below)
  crop: False
  # Optional: height to resize the snapshot to (default: original size)
#  height: 175
  # Optional: Camera override for retention settings (default: global values)
  retain:
    # Required: Default retention days (default: shown below)
    default: 1

# Required
cameras:
  Foyer:
    ffmpeg:
      inputs:
        - path: rtsp://XXXX@XXXX:554/cam/realmonitor?channel=1&subtype=1
          roles:
            - detect
        - path: rtsp://XXXX:XXXX@XXXX:554/cam/realmonitor?channel=1&subtype=0
          roles:
            - record
    # Optional: timeout for highest scoring image before allowing it
    # to be replaced by a newer image. (default: shown below)
    best_image_timeout: 60
    objects:
      track:
        - person
        - car
        - dog
        - cat
        - motorcycle
        - bus
        - bicycle
      filters:
        car:
          threshold: 0.65
    detect:
      fps: 8
      # Optional: Number of frames without a detection before frigate considers an object to be gone. (default: 5x the frame rate)
      max_disappeared: 40
      # Optional: Configuration for stationary object tracking
      stationary:
        threshold: 80

docker-compose file or Docker CLI command

version: '3.9'

services:
  frigate:
    container_name: frigate
    privileged: true # this may not be necessary for all setups
    image: blakeblackshear/frigate:stable
    restart: unless-stopped
    runtime: nvidia
    devices:
      - /dev/bus/usb:/dev/bus/usb
    volumes:
      - "/root/frigate/config/config.yml:/config/config.yml:ro"
      - "/etc/localtime:/etc/localtime:ro"
      - "/root/frigate/config/spaghettinet_edgetpu_l_compiled.tflite:/spaghettinet_edgetpu_l_compiled.tflite"
      - type: tmpfs # Optional: 1GB of memory, reduces SSD/SD Card wear
        target: /tmp/cache
        tmpfs:
          size: 2000000000
    deploy:    # <------------- Add this section
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ports:
      - '5000:5000'
      - "1935:1935" # RTMP feeds
    environment:
      - FRIGATE_RTSP_PASSWORD='XXXX
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility,video
      - NVIDIA_VISIBLE_DEVICES=all

Relevant log output

[2022-10-20 16:23:57] frigate.video                  ERROR   : GateExtra: Unable to read frames from ffmpeg process.
[2022-10-20 16:23:57] frigate.video                  ERROR   : GateExtra: ffmpeg process is not running. exiting capture thread...
[2022-10-20 16:24:05] watchdog.Foyer                 ERROR   : Ffmpeg process crashed unexpectedly for Foyer.
[2022-10-20 16:24:05] watchdog.Foyer                 ERROR   : The following ffmpeg logs include the last 100 lines prior to exit.
[2022-10-20 16:24:05] ffmpeg.Foyer.detect            ERROR   : [AVHWDeviceContext @ 0x55bf99883d00] cu->cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error
[2022-10-20 16:24:05] ffmpeg.Foyer.detect            ERROR   : Error while opening decoder for input stream #0:0 : Generic error in an external library
[2022-10-20 16:24:05] watchdog.GateExtra             ERROR   : Ffmpeg process crashed unexpectedly for GateExtra.
[2022-10-20 16:24:05] watchdog.GateExtra             ERROR   : The following ffmpeg logs include the last 100 lines prior to exit.
[2022-10-20 16:24:05] ffmpeg.GateExtra.detect        ERROR   : [AVHWDeviceContext @ 0x556614feafc0] cu->cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error
[2022-10-20 16:24:05] ffmpeg.GateExtra.detect        ERROR   : Error while opening decoder for input stream #0:0 : Generic error in an external library
[2022-10-20 16:24:05] watchdog.Gate                  ERROR   : Ffmpeg process crashed unexpectedly for Gate.
[2022-10-20 16:24:05] watchdog.Gate                  ERROR   : The following ffmpeg logs include the last 100 lines prior to exit.
[2022-10-20 16:24:05] ffmpeg.Gate.detect             ERROR   : [AVHWDeviceContext @ 0x56102df20380] cu->cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error
[2022-10-20 16:24:05] ffmpeg.Gate.detect             ERROR   : Error while opening decoder for input stream #0:0 : Generic error in an external library
[2022-10-20 16:24:05] watchdog.MainDoor              ERROR   : Ffmpeg process crashed unexpectedly for MainDoor.
[2022-10-20 16:24:05] watchdog.MainDoor              ERROR   : The following ffmpeg logs include the last 100 lines prior to exit.
[2022-10-20 16:24:05] ffmpeg.MainDoor.detect         ERROR   : [AVHWDeviceContext @ 0x5627335c84c0] cu->cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error
[2022-10-20 16:24:05] ffmpeg.MainDoor.detect         ERROR   : Error while opening decoder for input stream #0:0 : Generic error in an external library
[2022-10-20 16:24:05] watchdog.FishPond              ERROR   : Ffmpeg process crashed unexpectedly for FishPond.
[2022-10-20 16:24:05] watchdog.FishPond              ERROR   : The following ffmpeg logs include the last 100 lines prior to exit.
[2022-10-20 16:24:05] ffmpeg.FishPond.detect         ERROR   : [AVHWDeviceContext @ 0x55bfa165da00] cu->cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error
[2022-10-20 16:24:05] watchdog.Porsche               ERROR   : Ffmpeg process crashed unexpectedly for Porsche.
[2022-10-20 16:24:05] watchdog.Porsche               ERROR   : The following ffmpeg logs include the last 100 lines prior to exit.
[2022-10-20 16:24:05] ffmpeg.Porsche.detect          ERROR   : [AVHWDeviceContext @ 0x559a985f0dc0] cu->cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error
[2022-10-20 16:24:05] ffmpeg.Porsche.detect          ERROR   : Error while opening decoder for input stream #0:0 : Generic error in an external library
[2022-10-20 16:24:05] ffmpeg.FishPond.detect         ERROR   : Error while opening decoder for input stream #0:0 : Generic error in an external library
[2022-10-20 16:24:05] watchdog.Kitchen               ERROR   : Ffmpeg process crashed unexpectedly for Kitchen.

FFprobe output from your camera

ffprobe version N-108766-geb9153b4a7 Copyright (c) 2007-2022 the FFmpeg developers
  built with gcc 10 (Debian 10.2.1-6)
  configuration: --enable-nonfree --enable-cuda-nvcc --enable-libnpp --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 --disable-static --enable-shared
  libavutil      57. 39.101 / 57. 39.101
  libavcodec     59. 51.100 / 59. 51.100
  libavformat    59. 34.101 / 59. 34.101
  libavdevice    59.  8.101 / 59.  8.101
  libavfilter     8. 49.101 /  8. 49.101
  libswscale      6.  8.112 /  6.  8.112
  libswresample   4.  9.100 /  4.  9.100
Input #0, rtsp, from 'rtsp://XXXX:XXXX@XXXX:554':
  Metadata:
    title           : RTSP Session/2.0
  Duration: N/A, start: 0.060000, bitrate: N/A
  Stream #0:0: Video: h264 (High), yuv420p(progressive), 1280x960, 100 tbr, 90k tbn

Operating system

Debian

Install method

Docker Compose

Network connection

Wired

Camera make and model

Dahua

Any other information that may be helpful

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P4000        Off  | 00000000:04:00.0 Off |                  N/A |
| 63%   48C    P0    27W / 105W |      0MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I have tried almost all of the solutions proposed for similar errors, but still can't resolve it.
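
To separate a driver or container problem from an ffmpeg problem, the failing call can be reproduced without ffmpeg. A minimal sketch, assuming Python 3 and that libcuda.so.1 is on the loader path inside the Frigate container: it calls cuInit(0) from the CUDA driver API through ctypes and reports the result. The helper name probe_cuinit is hypothetical; cuInit and cuGetErrorName are real driver API entry points, and 999 is the numeric value of CUDA_ERROR_UNKNOWN.

```python
import ctypes

CUDA_SUCCESS = 0
CUDA_ERROR_UNKNOWN = 999  # numeric value of CUDA_ERROR_UNKNOWN in the driver API


def probe_cuinit(lib_name="libcuda.so.1"):
    """Call cuInit(0) directly and return (ok, message)."""
    try:
        cuda = ctypes.CDLL(lib_name)
    except OSError as exc:
        return False, f"could not load {lib_name}: {exc}"

    cuda.cuInit.argtypes = [ctypes.c_uint]
    cuda.cuInit.restype = ctypes.c_int
    result = cuda.cuInit(0)
    if result == CUDA_SUCCESS:
        return True, "cuInit(0) succeeded"

    # Ask the driver for the symbolic name of the error code.
    name = ctypes.c_char_p()
    cuda.cuGetErrorName.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
    cuda.cuGetErrorName.restype = ctypes.c_int
    if cuda.cuGetErrorName(result, ctypes.byref(name)) == CUDA_SUCCESS and name.value:
        return False, f"cuInit(0) failed -> {name.value.decode()} ({result})"
    return False, f"cuInit(0) failed with code {result}"


if __name__ == "__main__":
    ok, message = probe_cuinit()
    print(message)
```

If this also reports code 999 inside the container, the problem is likely below ffmpeg (driver, device nodes, or container permissions) rather than in the Frigate hwaccel_args.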

yeahme49 commented 2 years ago

Is your proxmox container privileged or unprivileged? Did you edit the conf file for the container to mount the dev entries and set the cgroup2 settings for the gpu?

weitheng commented 2 years ago

It's privileged. Sorry, I forgot to include the LXC configuration file:

arch: amd64
cores: 8
features: keyctl=1,nesting=1
hostname: portainer1
memory: 16384
net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=76:0C:0A:B3:5B:97,ip=dhcp,type=veth
onboot: 1
ostype: debian
rootfs: local-lvm:vm-101-disk-0,size=120G
swap: 4096
unprivileged: 0
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/bus/usb/001 dev/bus/usb/001 none bind,optional,create=dir 0,0
lxc.mount.entry: /dev/bus/usb/003 dev/bus/usb/003 none bind,optional,create=dir 0,0
lxc.mount.entry: /dev/bus/usb/004 dev/bus/usb/004 none bind,optional,create=dir 0,0
lxc.mount.entry: /dev/bus/usb/005 dev/bus/usb/005 none bind,optional,create=dir 0,0
lxc.cgroup2.devices.allow: c 189:* rwm
lxc.cgroup.devices.allow: c 195:* rwm
lxc.cgroup.devices.allow: c 243:* rwm
lxc.apparmor.profile: unconfined
lxc.cgroup2.devices.allow: a
lxc.cap.drop:
lxc.mount.auto: cgroup:rw
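
The config above mixes lxc.cgroup.devices.allow (the cgroup v1 key) and lxc.cgroup2.devices.allow (the cgroup v2 key). A minimal sketch, assuming Python 3, that extracts the device-allow rules from an LXC config so the two key families can be compared side by side. The function name device_allow_rules is hypothetical, and the sample text is copied from the config above. As far as I know, the v1 keys have no effect on a host that runs only the unified cgroup v2 hierarchy, but since this config also has lxc.cgroup2.devices.allow: a, the device cgroup may not be what blocks the GPU here.

```python
import re

# Matches e.g. "lxc.cgroup2.devices.allow: c 195:* rwm" and "lxc.cgroup2.devices.allow: a"
ALLOW_RE = re.compile(
    r"^lxc\.(cgroup2?)\.devices\.allow:\s*(\S+)(?:\s+(\d+|\*):(\d+|\*)\s+(\w+))?\s*$"
)


def device_allow_rules(conf_text):
    """Return (key, dev_type, major, minor, access) tuples found in an LXC config."""
    rules = []
    for line in conf_text.splitlines():
        match = ALLOW_RE.match(line.strip())
        if match:
            rules.append(match.groups())
    return rules


conf = """\
lxc.cgroup2.devices.allow: c 189:* rwm
lxc.cgroup.devices.allow: c 195:* rwm
lxc.cgroup.devices.allow: c 243:* rwm
lxc.cgroup2.devices.allow: a
"""

for key, dev_type, major, minor, access in device_allow_rules(conf):
    print(key, dev_type, major, minor, access)
```

Rules whose first field is cgroup rather than cgroup2 are the ones worth rewriting with the cgroup2 key if the host uses cgroup v2.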

weitheng commented 2 years ago

Anyone able to help? 🙏

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

wisemonkey commented 1 year ago

How was this fixed? I saw the same issue with an NVIDIA P40; in my case it just needed the proper drivers.

staridiot commented 3 months ago

I'm also experiencing this with GrumpyMeow's Proxmox LXC install, which I believe was recently approved as an official installation method. This should probably be re-opened to investigate more.

mohdsm81 commented 1 month ago

I am still having the same issue! (Screenshot attached: 2024-09-17 23-52-40)