blakeblackshear / frigate

NVR with realtime local object detection for IP cameras
https://frigate.video
MIT License

[Detector Support]: Doesn't work GPU with OpenVINO #9574

Closed Shlyakhoff closed 3 months ago

Shlyakhoff commented 6 months ago

Describe the problem you are having

When I try to enable the OpenVINO detector with device=GPU, I see an error in the log and Frigate becomes inactive for a while. If I set device=AUTO it starts working, but when detect is enabled I see high CPU utilization and no activity in intel_gpu_top, which suggests that GPU acceleration is not working. Hardware acceleration for the camera streams works fine.

I set LIBVA_DRIVER_NAME=iHD because, if I choose i965, intel_gpu_top shows the blitter engine being used. I'm not sure, but that doesn't seem right, and OpenVINO doesn't work with it either.

I have a Celeron J4105 and an Unraid system.

Version

0.13

Frigate config file

database:
  path: /config/frigate.db

mqtt:
  enabled: true
  host: homeassistant.local
  port: 1883
  user: admin
  password: password
  topic_prefix: frigate
  client_id: frigate

ui:
  live_mode: mse
  use_experimental: true

birdseye:
  enabled: false

detect:
  enabled: true

detectors:
  ov:
    type: openvino
    device: GPU
    model:
      path: /openvino-model/ssdlite_mobilenet_v2.xml

model:
  width: 300
  height: 300
  input_tensor: nhwc
  input_pixel_format: bgr
  labelmap_path: /openvino-model/coco_91cl_bkgr.txt

snapshots:
  enabled: true

record:
  enabled: true
  expire_interval: 60
  retain:
    days: 12
    mode: all
  events:
    retain:
      default: 12
      mode: motion

ffmpeg:
  hwaccel_args: preset-vaapi

logger:
  default: error
  logs:
    frigate.mqtt: error
    frigate.app: error
    frigate.ffmpeg: critical

go2rtc:
  log:
    format: text
    level: error
#    api: trace
#    exec: debug
#    ngrok: info
#    rtsp: warn
#    streams: error
#    webrtc: fatal
  rtsp:
    default_query: mp4
  streams:
    cam3:
    - rtsp://admin:password@192.168.1.62:554
    cam3_sub:
    - rtsp://admin:password@192.168.1.62:554/Streaming/channels/2

cameras:
  cam3:
    ffmpeg:
      inputs:
      - path: rtsp://127.0.0.1:8554/cam3
        input_args: preset-rtsp-restream-low-latency
        roles:
        - record
      - path: rtsp://127.0.0.1:8554/cam3_sub
        input_args: preset-rtsp-restream
        roles:
        - detect
      output_args:
        record: preset-record-generic-audio-copy
    motion:
      mask:
      - 0,176,110,176,110,155,0,155

docker-compose file or Docker CLI command

docker run \
  -d \
  --name='frigate' \
  --net='bridge' \
  --privileged=true \
  -e TZ="Europe/Moscow" \
  -e HOST_OS="Unraid" \
  -e HOST_HOSTNAME="SERVER" \
  -e HOST_CONTAINERNAME="frigate" \
  -e 'FRIGATE_RTSP_PASSWORD'='password' \
  -e 'LIBVA_DRIVER_NAME'='iHD' \
  -l net.unraid.docker.managed=dockerman \
  -l net.unraid.docker.webui='http://[IP]:[PORT:5000]' \
  -l net.unraid.docker.icon='https://raw.githubusercontent.com/yayitazale/unraid-templates/main/frigate.png' \
  -p '5000:5000/tcp' \
  -p '8554:8554/tcp' \
  -p '8555:8555/tcp' \
  -p '8555:8555/udp' \
  -p '1984:1984/tcp' \
  -v '/mnt/user/appdata/frigate':'/config':'rw' \
  -v '/mnt/disks/cameras/':'/media/frigate':'rw,slave' \
  -v '/etc/localtime':'/etc/localtime':'rw' \
  --device='/dev/dri' \
  --shm-size=256mb \
  --mount type=tmpfs,target=/tmp/cache,tmpfs-size=2726297600 \
  --restart unless-stopped 'ghcr.io/blakeblackshear/frigate:stable'

Relevant log output

2024-02-01 19:33:04.112890288  [INFO] Preparing Frigate...
2024-02-01 19:33:04.140557029  [INFO] Starting Frigate...
2024-02-01 19:33:06.708554578  [2024-02-01 19:33:06] frigate.app                    INFO    : Starting Frigate (0.13.0-01e2d20)
2024-02-01 19:33:10.350863638  [2024-02-01 19:33:10] frigate.config                 WARNING : Customizing more than a detector model path is unsupported.
2024-02-01 19:33:12.743140257  Process detector:ov:
2024-02-01 19:33:12.746019189  Traceback (most recent call last):
2024-02-01 19:33:12.746050491    File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2024-02-01 19:33:12.746053039      self.run()
2024-02-01 19:33:12.746055072    File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2024-02-01 19:33:12.746061394      self._target(*self._args, **self._kwargs)
2024-02-01 19:33:12.746064435    File "/opt/frigate/frigate/object_detection.py", line 102, in run_detector
2024-02-01 19:33:12.746091310      object_detector = LocalObjectDetector(detector_config=detector_config)
2024-02-01 19:33:12.746111650    File "/opt/frigate/frigate/object_detection.py", line 53, in __init__
2024-02-01 19:33:12.746114358      self.detect_api = create_detector(detector_config)
2024-02-01 19:33:12.746116276    File "/opt/frigate/frigate/detectors/__init__.py", line 18, in create_detector
2024-02-01 19:33:12.746117776      return api(detector_config)
2024-02-01 19:33:12.746119702    File "/opt/frigate/frigate/detectors/plugins/openvino.py", line 32, in __init__
2024-02-01 19:33:12.746121416      self.interpreter = self.ov_core.compile_model(
2024-02-01 19:33:12.746123545    File "/usr/local/lib/python3.9/dist-packages/openvino/runtime/ie_api.py", line 399, in compile_model
2024-02-01 19:33:12.746126909      super().compile_model(model, device_name, {} if config is None else config),
2024-02-01 19:33:12.746181038  RuntimeError: cldnn program build failed! [GPU] clWaitForEvents, error code: -14

Operating system

UNRAID

Install method

Docker Compose

Coral version

CPU (no coral)

Any other information that may be helpful

No response

NickM-27 commented 6 months ago

@NateMeyer have you seen this before?

Shlyakhoff commented 6 months ago

No, I'm only now trying to set it up. Actually, I think I tried to set up OpenVINO a month ago, when Frigate was at version 0.12, and the result was the same.

Shlyakhoff commented 6 months ago

This is what I see when I set device=AUTO, but as I mentioned earlier, it doesn't work when detect is enabled.

image

image

NickM-27 commented 6 months ago

To be clear, I tagged someone else with that question.

NateMeyer commented 6 months ago

No, that is a new one to me. I'll see what I can dig up later today.

Shlyakhoff commented 6 months ago

It looks like the container is missing something like this one

NickM-27 commented 6 months ago

That page is outdated. Also, many users are using this in GPU mode so it will likely be something host specific

Shlyakhoff commented 6 months ago

Understood. Please let me know if I need to share more information about this case.

Shlyakhoff commented 6 months ago

I just wanted to add that I found a message in syslog when I try to launch Frigate with OpenVINO and device=GPU:

image

Shlyakhoff commented 6 months ago

It seems the error is caused by an OpenCL bug in the kernel, because there are similar reports online about hardware acceleration misbehaving, for example in Plex. Hopefully a solution to this problem will be found.

dsolva commented 6 months ago

I have the exact same issue: same CPU in a NUC, but with Proxmox.

Indeed, Plex struggled for a long time with some of these processors after an update, but that was resolved a few weeks back. As far as I know, their issue was related to changes in the drivers used.

jjak0b commented 5 months ago

Same issue in a Docker container nested inside an unprivileged Proxmox container, using an Intel N3350. Do the container and host both need some packages? If so, which ones?

a-bali commented 5 months ago

I seem to have the very same issue, is there any solution already?

Shlyakhoff commented 4 months ago

> I seem to have the very same issue, is there any solution already?

No, I didn't find one, so I just bought a Google Coral TPU.

henryouly commented 4 months ago

I have an Intel J4105 with Proxmox 8.1 and a Debian 12 Docker LXC, and I ran into the same "GPU Hang" message in syslog. After reading a similar issue, #5799, which suggests a kernel problem, I eventually replaced Proxmox with 7.4 and a Debian 11 LXC, and the issue was resolved. My kernel in the LXC is now 5.15.102-1-pve. I think the kernel in Proxmox 8.1 is probably 6.5.11-8-pve.
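Since an LXC shares the host's kernel, it can help to confirm which kernel is actually running before comparing notes. The following is a minimal Python sketch, not part of Frigate; the helper name `kernel_tuple` is hypothetical, and the 5.15 comparison reflects only what is reported in this comment, not a verified compatibility list.

```python
import platform
import re


def kernel_tuple(release: str) -> tuple:
    """Turn a release string such as '5.15.102-1-pve' into (5, 15, 102)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. '6.8') is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


if __name__ == "__main__":
    running = kernel_tuple(platform.release())
    print("running kernel:", running)
    # 5.15 is the series reported as working in the comment above.
    print("newer than the 5.15 series:", running[:2] > (5, 15))
```

Running it inside the container and on the host should print the same version, because the container does not carry its own kernel.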

Zanadar commented 4 months ago

Same issue here with Proxmox 8.1.10 (kernel 6.5.13-5-pve) and Frigate running in LXC.

esand commented 4 months ago

> Same issue here with Proxmox 8.1.10 (kernel 6.5.13-5-pve) and Frigate running in LXC.

I am running Proxmox as well and it was working on kernel 6.5.13-3. I just updated to Proxmox 8.2.2 today and it uses kernel 6.8.4-2 and Frigate is unable to detect my GPU:

2024-04-24 16:31:55.145428974  Process detector:ov:
2024-04-24 16:31:55.148000145  Traceback (most recent call last):
2024-04-24 16:31:55.148038397    File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2024-04-24 16:31:55.148044573      self.run()
2024-04-24 16:31:55.148051060    File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2024-04-24 16:31:55.148063664      self._target(*self._args, **self._kwargs)
2024-04-24 16:31:55.148072342    File "/opt/frigate/frigate/object_detection.py", line 102, in run_detector
2024-04-24 16:31:55.148131608      object_detector = LocalObjectDetector(detector_config=detector_config)
2024-04-24 16:31:55.148163715    File "/opt/frigate/frigate/object_detection.py", line 53, in __init__
2024-04-24 16:31:55.148169848      self.detect_api = create_detector(detector_config)
2024-04-24 16:31:55.148175713    File "/opt/frigate/frigate/detectors/__init__.py", line 18, in create_detector
2024-04-24 16:31:55.148180348      return api(detector_config)
2024-04-24 16:31:55.148185865    File "/opt/frigate/frigate/detectors/plugins/openvino.py", line 32, in __init__
2024-04-24 16:31:55.148190875      self.interpreter = self.ov_core.compile_model(
2024-04-24 16:31:55.148196953    File "/usr/local/lib/python3.9/dist-packages/openvino/runtime/ie_api.py", line 399, in compile_model
2024-04-24 16:31:55.148205619      super().compile_model(model, device_name, {} if config is None else config),
2024-04-24 16:31:55.148213584  RuntimeError: Failed to create plugin /usr/local/lib/python3.9/dist-packages/openvino/libs/libopenvino_intel_gpu_plugin.so for device GPU
2024-04-24 16:31:55.148286612  Please, check your environment
2024-04-24 16:31:55.148294444  Check 'error_code == 0' failed at src/plugins/intel_gpu/src/runtime/ocl/ocl_device_detector.cpp:194:
2024-04-24 16:31:55.148303564  [GPU] No supported OCL devices found or unexpected error happened during devices query.
2024-04-24 16:31:55.148309311  [GPU] Please check OpenVINO documentation for GPU drivers setup guide.
2024-04-24 16:31:55.148365420  [GPU] clGetPlatformIDs error code: -1001

Nothing has changed in my config, and it's rather simple for detectors:

detectors:
  ov:
    type: openvino
    device: GPU

I'm guessing it's something to do with how the kernel may be exposing devices, and/or with an update required to some libs in the Frigate container?

FYI: if I change to device: AUTO, it works fine. I can't tell whether it's actually using the GPU, since vainfo doesn't work for me (no privileges), but I'm guessing not... inference time is roughly double what it used to be.
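One way to see whether OpenVINO itself can enumerate the GPU, independent of vainfo, is to ask the runtime for its device list. This is a hedged sketch rather than anything Frigate ships: the function name `list_openvino_devices` is hypothetical, and it assumes the `openvino.runtime` import path that appears in the traceback above. It would need to run inside the Frigate container, where that package is installed.

```python
def list_openvino_devices():
    """Return the device names OpenVINO reports, or None if OpenVINO is not installed."""
    try:
        # Import path taken from the traceback in this thread (openvino/runtime/ie_api.py).
        from openvino.runtime import Core
    except ImportError:
        return None
    return list(Core().available_devices)


if __name__ == "__main__":
    devices = list_openvino_devices()
    if devices is None:
        print("openvino is not importable in this environment")
    else:
        print("OpenVINO devices:", devices)
        # GPU entries may be named 'GPU' or 'GPU.0', 'GPU.1', ...
        print("GPU visible:", any(name.startswith("GPU") for name in devices))
```

If only `CPU` is listed, device: AUTO has nothing but the CPU to choose from, which would be consistent with the slower inference described here.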

Zanadar commented 4 months ago

@esand I think it is most likely that your issue is not related to frigate. You probably need to reconfigure the GPU passthrough to LXC/VM after Proxmox upgrade.

esand commented 4 months ago

> @esand I think it is most likely that your issue is not related to frigate. You probably need to reconfigure the GPU passthrough to LXC/VM after Proxmox upgrade.

The /dev/dri devices are still visible in both the linux container and the frigate container. Permissions are correct and I've still got hwaccel working just fine. As far as I'm aware, no changes in Proxmox 8.2.2 impacted hardware passthrough configurations and my LXC still boots up just fine (it would error out on a bad config). I also have other devices that I do passthrough with in other containers and those are still functioning fine.
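The visibility and permission check described above can be sketched in Python as follows. The helper name `dri_nodes` is hypothetical, and the sketch only tests whether the current process can read and write the card and render nodes.

```python
import os
from pathlib import Path


def dri_nodes(dri_dir: str = "/dev/dri") -> dict:
    """Map each card/render node under dri_dir to whether this process can read and write it."""
    root = Path(dri_dir)
    if not root.is_dir():
        return {}
    return {
        node.name: os.access(node, os.R_OK | os.W_OK)
        for node in sorted(root.iterdir())
        if node.name.startswith(("card", "renderD"))
    }


if __name__ == "__main__":
    nodes = dri_nodes()
    if not nodes:
        print("no DRI nodes visible")
    for name, accessible in nodes.items():
        print(f"{name}: {'read/write ok' if accessible else 'no access'}")
```

As this comment reports, accessible nodes are necessary but not sufficient: the OpenCL platform query can still fail even when every node passes this check.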

dsolva commented 4 months ago

Same observation as @esand. Hwaccel is working fine and other containers are working with the GPU (e.g. Plex transcoding).

Zanadar commented 4 months ago

I just upgraded Proxmox to 8.2.2 and have the same issue as described above with Intel GPU passthrough:

2024-04-26 10:58:11.974474134  [2024-04-26 10:58:11] detector.ov INFO : Starting detection process: 306
2024-04-26 10:58:12.048736394  Process detector:ov:
2024-04-26 10:58:12.049568318  Traceback (most recent call last):
2024-04-26 10:58:12.049570624    File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2024-04-26 10:58:12.049571729      self.run()
2024-04-26 10:58:12.049572957    File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2024-04-26 10:58:12.049574044      self._target(*self._args, **self._kwargs)
2024-04-26 10:58:12.049575166    File "/opt/frigate/frigate/object_detection.py", line 102, in run_detector
2024-04-26 10:58:12.049576263      object_detector = LocalObjectDetector(detector_config=detector_config)
2024-04-26 10:58:12.049594351    File "/opt/frigate/frigate/object_detection.py", line 53, in __init__
2024-04-26 10:58:12.049595625      self.detect_api = create_detector(detector_config)
2024-04-26 10:58:12.049596786    File "/opt/frigate/frigate/detectors/__init__.py", line 18, in create_detector
2024-04-26 10:58:12.049613578      return api(detector_config)
2024-04-26 10:58:12.049614822    File "/opt/frigate/frigate/detectors/plugins/openvino.py", line 32, in __init__
2024-04-26 10:58:12.049615982      self.interpreter = self.ov_core.compile_model(
2024-04-26 10:58:12.049617125    File "/usr/local/lib/python3.9/dist-packages/openvino/runtime/ie_api.py", line 399, in compile_model
2024-04-26 10:58:12.049628693      super().compile_model(model, device_name, {} if config is None else config),
2024-04-26 10:58:12.049829684  RuntimeError: Failed to create plugin /usr/local/lib/python3.9/dist-packages/openvino/libs/libopenvino_intel_gpu_plugin.so for device GPU
2024-04-26 10:58:12.049831211  Please, check your environment
2024-04-26 10:58:12.049832366  Check 'error_code == 0' failed at src/plugins/intel_gpu/src/runtime/ocl/ocl_device_detector.cpp:194:
2024-04-26 10:58:12.049833459  [GPU] No supported OCL devices found or unexpected error happened during devices query.
2024-04-26 10:58:12.049834541  [GPU] Please check OpenVINO documentation for GPU drivers setup guide.
2024-04-26 10:58:12.049835571  [GPU] clGetPlatformIDs error code: -1001

Zanadar commented 4 months ago

Looks like this is an issue with the kernel included in the new Proxmox 8.2.2: https://github.com/blakeblackshear/frigate/discussions/10785

Booting the previous kernel in Proxmox, following this guide, solved my issue until a new Proxmox release comes out: https://engineerworkshop.com/blog/how-to-revert-a-proxmox-kernel-update/

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

luke3butler commented 1 month ago

Not sure when this was fixed, but I just commented out the GRUB_DEFAULT setting I had in place to use the previous kernel, ran update-grub, rebooted, and everything is working as it should.

I'm on version 6.8.8-3-pve now.

dsolva commented 1 month ago

> Not sure when this was fixed, but I just commented out the GRUB_DEFAULT setting I had in place to use the previous kernel, ran update-grub, rebooted, and everything is working as it should.
>
> I'm on version 6.8.8-3-pve now.

I just updated to 6.8.8-3-pve to test, with no success. Do you have a similar setup to OP?

luke3butler commented 1 month ago

> > Not sure when this was fixed, but I just commented out the GRUB_DEFAULT setting I had in place to use the previous kernel, ran update-grub, rebooted, and everything is working as it should. I'm on version 6.8.8-3-pve now.
>
> I just updated to 6.8.8-3-pve to test, with no success. Do you have a similar setup to OP?

Yes, similar to OP. The host is Proxmox 8 with an i7-8700 CPU.

I was experiencing the same exact issue, prior to updating the kernel to 6.8.8-3. I've rebooted several times, verified that the newer kernel was actually being used, and haven't experienced any issues.

ffmpeg:
  hwaccel_args: preset-intel-qsv-h264

detectors:
  ov:
    type: openvino
    device: GPU
    model:
      path: /openvino-model/ssdlite_mobilenet_v2.xml

Fahmula commented 1 month ago

I recently updated from Proxmox 7.4 with kernel 5.15.158-1-pve to Proxmox 8.2.4 with kernel 6.8.8-4, and I'm experiencing the same issue. I tried kernel 6.8.8-3 and even 6.5, with no luck.

I decided to set up a Proxmox 7.4 VM just to test it, and it works perfectly. I tried what was suggested in #12266 on Proxmox 8, but it didn't solve my issue.

henryouly commented 3 weeks ago

I'm pretty sure this is an upstream issue in the compatibility between intel-compute-engine and the kernel GPU hang check functionality.

Here is a related discussion with the exact same CPU (J4105) https://github.com/intel/compute-runtime/issues/679

As far as I'm aware, downgrading to kernel 5.15 seems to be the only solution.

esand commented 3 weeks ago

> As far as I'm aware, downgrading to kernel 5.15 seems to be the only solution.

I don't believe that intel/compute-runtime#679 is the culprit, but rather intel/compute-runtime#710. If you put some ENV variables in to override some GPU settings it works, or if you update the openvino libraries (#10785).

There's supposedly a fix in the works to the kernel code to correct the issue, but until then either ENV variables or updating openvino appear to solve the problem.

It might be best to close this and other related issues and point them all to #10785 which documents both potential fixes.

Fahmula commented 3 weeks ago

> > As far as I'm aware, downgrading to kernel 5.15 seems to be the only solution.
>
> I don't believe that intel/compute-runtime#679 is the culprit, but rather intel/compute-runtime#710. If you put some ENV variables in to override some GPU settings it works, or if you update the openvino libraries (#10785).
>
> There's supposedly a fix in the works to the kernel code to correct the issue, but until then either ENV variables or updating openvino appear to solve the problem.
>
> It might be best to close this and other related issues and point them all to #10785 which documents both potential fixes.

None of these solutions works for me. It seems my J4125 just isn't supported.

henryouly commented 3 weeks ago

> I don't believe that intel/compute-runtime#679 is the culprit, but rather intel/compute-runtime#710. If you put some ENV variables in to override some GPU settings it works, or if you update the openvino libraries (#10785).
>
> There's supposedly a fix in the works to the kernel code to correct the issue, but until then either ENV variables or updating openvino appear to solve the problem.
>
> It might be best to close this and other related issues and point them all to #10785 which documents both potential fixes.

The issue you mentioned is about being unable to detect the GPU, and I think #10785 is the right thread to merge it with. OP, @Fahmula, and I experienced a different one. Ours is specific to the J4105/J4125, and the relevant logs are clearly different from the ones you posted in https://github.com/blakeblackshear/frigate/issues/9574#issuecomment-2075805849. As @Fahmula mentioned, none of the solutions work.
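The two failure modes in this thread can be told apart by the final error line of the traceback. Below is a minimal Python sketch that does so; the function name `classify_gpu_error` and the returned labels are hypothetical, and the matched strings are copied from the logs posted above.

```python
def classify_gpu_error(log_text: str) -> str:
    """Label a Frigate log excerpt by which OpenVINO GPU error it contains."""
    # Posted by the Proxmox 8.2.2 reporters: OpenCL finds no platform at all.
    if "clGetPlatformIDs error code: -1001" in log_text:
        return "gpu-not-detected"
    # Posted by OP (J4105): the GPU is found, but compiling the model fails.
    if "clWaitForEvents, error code: -14" in log_text:
        return "program-build-failed"
    return "unknown"


if __name__ == "__main__":
    sample = "RuntimeError: cldnn program build failed! [GPU] clWaitForEvents, error code: -14"
    print(classify_gpu_error(sample))
```

Under the reading given in this comment, a "gpu-not-detected" result points to #10785, while "program-build-failed" is the J4105/J4125 case discussed here.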

esand commented 3 weeks ago

> @Fahmula, and I experienced a different one. Ours is specific to the J4105/J4125, and the relevant logs are clearly different from the ones you posted

My apologies - it seems you are indeed correct. I think what suckered me in to posting on this thread was that my error was almost identical to OP and I thought they were the same thing initially.