blakeblackshear / frigate

NVR with realtime local object detection for IP cameras
https://frigate.video
MIT License

[Support]: 0.13.0 beta 4 no Intel GPU stats #8494

Closed: wimb0 closed this issue 10 months ago

wimb0 commented 11 months ago

Describe the problem you are having

Since updating from beta 3 to beta 4, the Intel GPU stats do not work anymore:

ERROR : Unable to poll intel GPU stats: Failed to initialize PMU! (Permission denied)

I am using the Frigate Beta (0.13.0) Addon on Home Assistant supervised on Debian.

Version

0.13.0-65E3E67

Frigate config file

not relevant for this issue

Relevant log output

2023-11-04 17:44:25.786771029  [INFO] Preparing Frigate...
2023-11-04 17:44:25.810482432  [INFO] Starting Frigate...
2023-11-04 17:44:27.147721739  [2023-11-04 18:44:27] frigate.app                    INFO    : Starting Frigate (0.13.0-65e3e67)
2023-11-04 17:44:27.147873691  [2023-11-04 18:44:27] frigate.app                    INFO    : Creating directory: /tmp/cache
2023-11-04 17:44:27.204947621  [2023-11-04 18:44:27] peewee_migrate.logs            INFO    : Starting migrations
2023-11-04 17:44:27.208710254  [2023-11-04 18:44:27] peewee_migrate.logs            INFO    : There is nothing to migrate
2023-11-04 17:44:27.212698918  [2023-11-04 18:44:27] frigate.app                    INFO    : Recording process started: 378
2023-11-04 17:44:27.215114151  [2023-11-04 18:44:27] frigate.app                    INFO    : go2rtc process pid: 89
2023-11-04 17:44:27.232991121  [2023-11-04 18:44:27] detector.coral                 INFO    : Starting detection process: 387
2023-11-04 17:44:30.005784472  [2023-11-04 18:44:27] frigate.app                    INFO    : Output process started: 390
2023-11-04 17:44:30.014416450  [2023-11-04 18:44:27] frigate.detectors.plugins.edgetpu_tfl INFO    : Attempting to load TPU as usb
2023-11-04 17:44:30.014684464  [2023-11-04 18:44:27] frigate.app                    INFO    : Camera processor started for reolink_deurbel: 404
2023-11-04 17:44:30.014925223  [2023-11-04 18:44:30] frigate.detectors.plugins.edgetpu_tfl INFO    : TPU found
2023-11-04 17:44:30.015400991  [2023-11-04 18:44:27] frigate.app                    INFO    : Camera processor started for ipcam_oprit: 405
2023-11-04 17:44:30.015411171  [2023-11-04 18:44:27] frigate.app                    INFO    : Camera processor started for ipcam_tuin: 407
2023-11-04 17:44:30.015731922  [2023-11-04 18:44:27] frigate.app                    INFO    : Capture process started for reolink_deurbel: 409
2023-11-04 17:44:30.015785670  [2023-11-04 18:44:27] frigate.app                    INFO    : Capture process started for ipcam_oprit: 413
2023-11-04 17:44:30.015979624  [2023-11-04 18:44:27] frigate.app                    INFO    : Capture process started for ipcam_tuin: 417
2023-11-04 17:44:31.571857260  [2023-11-04 18:44:31] frigate.util.services          ERROR   : Unable to poll intel GPU stats: Failed to initialize PMU! (Permission denied)
2023-11-04 17:44:31.571860202  
2023-11-04 17:45:37.534839625  [2023-11-04 18:45:37] frigate.util.services          ERROR   : Unable to poll intel GPU stats: Failed to initialize PMU! (Permission denied)
2023-11-04 17:45:37.534848516  
2023-11-04 23:37:18.031637270  [2023-11-05 00:37:18] frigate.watchdog               INFO    : Detection appears to be stuck. Restarting detection process...
2023-11-04 23:37:18.032316377  [2023-11-05 00:37:18] root                           INFO    : Waiting for detection process to exit gracefully...
2023-11-04 23:37:48.061990011  [2023-11-05 00:37:48] root                           INFO    : Detection process didnt exit. Force killing...
2023-11-04 23:37:48.068776160  [2023-11-05 00:37:48] root                           INFO    : Detection process has exited...
2023-11-04 23:37:48.094513235  [2023-11-05 00:37:48] detector.coral                 INFO    : Starting detection process: 53723
2023-11-04 23:37:50.755796179  [2023-11-05 00:37:48] frigate.detectors.plugins.edgetpu_tfl INFO    : Attempting to load TPU as usb
2023-11-04 23:37:50.766927246  [2023-11-05 00:37:50] frigate.detectors.plugins.edgetpu_tfl INFO    : TPU found
2023-11-05 01:00:16.029955045  [2023-11-05 02:00:16] frigate.video                  ERROR   : reolink_deurbel: Unable to read

VAINFO for GPU

{"return_code":0,"stderr":"","stdout":"vainfo:VA-APIversion:1.17(libva2.10.0)nvainfo:Driverversion:InteliHDdriverforIntel(R)GenGraphics-23.1.1()nvainfo:SupportedprofileandentrypointsnVAProfileNone:tVAEntrypointVideoProcnVAProfileNone:tVAEntrypointStatsnVAProfileMPEG2Simple:tVAEntrypointVLDnVAProfileMPEG2Simple:tVAEntrypointEncSlicenVAProfileMPEG2Main:tVAEntrypointVLDnVAProfileMPEG2Main:tVAEntrypointEncSlicenVAProfileH264Main:tVAEntrypointVLDnVAProfileH264Main:tVAEntrypointEncSlicenVAProfileH264Main:tVAEntrypointFEInVAProfileH264Main:tVAEntrypointEncSliceLPnVAProfileH264High:tVAEntrypointVLDnVAProfileH264High:tVAEntrypointEncSlicenVAProfileH264High:tVAEntrypointFEInVAProfileH264High:tVAEntrypointEncSliceLPnVAProfileVC1Simple:tVAEntrypointVLDnVAProfileVC1Main:tVAEntrypointVLDnVAProfileVC1Advanced:tVAEntrypointVLDnVAProfileJPEGBaseline:tVAEntrypointVLDnVAProfileJPEGBaseline:tVAEntrypointEncPicturenVAProfileH264ConstrainedBaseline:tVAEntrypointVLDnVAProfileH264ConstrainedBaseline:tVAEntrypointEncSlicenVAProfileH264ConstrainedBaseline:tVAEntrypointFEInVAProfileH264ConstrainedBaseline:tVAEntrypointEncSliceLPnVAProfileVP8Version0_3:tVAEntrypointVLDnVAProfileVP8Version0_3:tVAEntrypointEncSlicenVAProfileHEVCMain:tVAEntrypointVLDnVAProfileHEVCMain:tVAEntrypointEncSlicenVAProfileHEVCMain:tVAEntrypointFEInVAProfileHEVCMain10:tVAEntrypointVLDnVAProfileHEVCMain10:tVAEntrypointEncSlicenVAProfileVP9Profile0:tVAEntrypointVLDnVAProfileVP9Profile2:tVAEntrypointVLD"}

Frigate stats

No response

Operating system

Debian

Install method

HassOS Addon

Coral version

USB

Network connection

Wired

Camera make and model

Annke C500 and Reolink Doorbell

Any other information that may be helpful

No response

NickM-27 commented 11 months ago

This is most likely related to https://github.com/blakeblackshear/frigate-hass-addons/pull/122

Other users saw that it was still working as expected. Which addon variant are you running?

wimb0 commented 11 months ago

I am running the normal beta version, not full-access.


NickM-27 commented 11 months ago

CC @felipecrs

felipecrs commented 11 months ago

@wimb0 what is your HAOS and Supervisor version?

felipecrs commented 11 months ago

Oops, got it. You are using Debian with Supervised.

I am not sure how Supervised installations work: is it possible to upgrade Docker?

felipecrs commented 11 months ago

I believe you need at least Docker v23. Please also confirm which Docker version you have.

wimb0 commented 11 months ago

Docker is already at the latest version:

felipecrs commented 11 months ago

That's weird. Docker's CAP_PERFMON must not be working in your environment for some reason. It works for me though, using HAOS.

felipecrs commented 11 months ago

I don't mind reverting the PR above, but if I were you I would probably dig a little deeper to find out why it doesn't work.

Maybe there is a minimum systemd version required for this to work.

felipecrs commented 11 months ago

What is your Linux kernel version? Can you also check the systemd version?

felipecrs commented 11 months ago

Some more information is provided in https://github.com/intel/media-delivery/blob/3cef91ef32a5dcebb570ae5a3a4f82d339c6b105/doc/howto.rst#id2. The docs there still do not account for the fact that CAP_PERFMON is now supported.

felipecrs commented 11 months ago

PERFMON is listed as a supported capability in Docker docs too: https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities:~:text=listen%20to%20multicasts.-,PERFMON,-Allow%20system%20performance

xconverge commented 11 months ago

FWIW, I don't run HAOS or the addon, but I was unable to get CAP_PERFMON working with the Intel GPU stats in Docker, so I reverted to running my container in privileged mode.

Debian 12, kernel 6.1.0-13-amd64, Docker v24, systemd 252 (252.17-1~deb12u1)

Snippet from my docker compose file. The cap_add is "valid", but I still don't get stats; I had to add privileged to get them working:

    frigate:
        container_name: frigate
        image: ghcr.io/blakeblackshear/frigate:dev-14c89c9
        privileged: true
        cap_add:
            - CAP_PERFMON

This also works, without the privileged flag:

        cap_add:
          - SYS_ADMIN

felipecrs commented 11 months ago

@xconverge can you confirm the name of the capability with docker inspect? This is how it looks for me (HAOS), which is working:

[screenshot of docker inspect output, not preserved]

xconverge commented 11 months ago

First (with the snippet I sent unchanged) I saw

            "CapAdd": [
                "CAP_PERFMON"
            ],

so then I took the hint and changed my docker compose to "PERFMON" and now I get:

            "CapAdd": [
                "PERFMON"
            ],

and it still doesn't work
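
For reference, the two spellings should name the same capability, since Docker accepts capability names with or without the CAP_ prefix. Below is a minimal sketch of checking the CapAdd list from docker inspect output regardless of spelling. The helper names (normalize_cap, container_has_cap) and the sample JSON are illustrative assumptions, not part of Docker or Frigate.

```python
import json


def normalize_cap(name: str) -> str:
    """Return a capability name in upper case without the CAP_ prefix."""
    name = name.strip().upper()
    return name[4:] if name.startswith("CAP_") else name


def container_has_cap(inspect_json: str, cap: str) -> bool:
    """Check HostConfig.CapAdd in `docker inspect <container>` output.

    `docker inspect` prints a JSON array with one object per container;
    CapAdd may be null when no capability was added.
    """
    containers = json.loads(inspect_json)
    wanted = normalize_cap(cap)
    for container in containers:
        cap_add = container.get("HostConfig", {}).get("CapAdd") or []
        if wanted in {normalize_cap(c) for c in cap_add}:
            return True
    return False


# Hypothetical, trimmed-down inspect output for illustration only.
sample = '[{"HostConfig": {"CapAdd": ["CAP_PERFMON"], "CapDrop": null}}]'
print(container_has_cap(sample, "PERFMON"))
```

In practice the JSON string would come from running docker inspect against the container. A matching entry only confirms that the capability was requested, not that the kernel honors it for the PMU.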

I also checked this

# capsh --print | grep cap_perfmon
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore
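
Note that the grep above matches the bounding set, which only limits what a process may acquire. To see what a process actually holds, the effective mask in /proc/<pid>/status (the CapEff line) can be decoded. The following is a rough sketch under that assumption; the bit numbers come from the kernel's capability.h, while the function names are made up for illustration. It would need to run inside the container to say anything about Frigate.

```python
from pathlib import Path

# Bit numbers from the kernel's include/uapi/linux/capability.h.
CAP_SYS_ADMIN = 21
CAP_PERFMON = 38


def parse_cap_eff(status_text: str) -> int:
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")


def has_cap(mask: int, bit: int) -> bool:
    """Return True if the given capability bit is set in the mask."""
    return bool((mask >> bit) & 1)


status_file = Path("/proc/self/status")
if __name__ == "__main__" and status_file.exists():
    mask = parse_cap_eff(status_file.read_text())
    print("CAP_PERFMON:", has_cap(mask, CAP_PERFMON))
    print("CAP_SYS_ADMIN:", has_cap(mask, CAP_SYS_ADMIN))
```

If CAP_PERFMON shows as set and the stats still fail, the capability itself is probably not the missing piece and a host-level restriction is worth checking.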

wimb0 commented 11 months ago

> What is your Linux kernel version? Can you also check the systemd version?

uname -a: Linux optiplex 5.10.0-26-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64 GNU/Linux

systemd --version:

systemd 247 (247.3-7+deb11u4)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified

> @xconverge can you confirm the name of the capability with docker inspect? This is how it looks for me (HAOS), which is working:

I ran docker inspect as well:

            "CapAdd": [
                "PERFMON"
            ],
            "CapDrop": null,

felipecrs commented 11 months ago

> Debian 12, kernel 6.1.0-13-amd64, Docker v24, systemd 252 (252.17-1~deb12u1)

HAOS is also systemd 252, so that must not be the problem.

felipecrs commented 11 months ago

@NickM-27 I am out of ideas. Here are the options, I believe:

Let me know if you'd like any action on my side.

NickM-27 commented 11 months ago

let's revert for now

xconverge commented 11 months ago

https://github.com/home-assistant/operating-system/discussions/2319#discussioncomment-5666111
https://github.com/blakeblackshear/frigate/pull/6166

This might explain it for me

# cat  /proc/sys/kernel/perf_event_paranoid
3

Changing it to 2 did indeed work for me. I suspect this is what tripped up @wimb0's system too!

wimb0 commented 11 months ago

> home-assistant/operating-system#2319 (comment) #6166
>
> This might explain it for me
>
> # cat  /proc/sys/kernel/perf_event_paranoid
> 3
>
> Changing it to 2 did indeed work for me. I suspect this is what tripped up @wimb0's system too!

Indeed, that works for me too.

What I did: ran sudo sysctl kernel.perf_event_paranoid=2 (the default was 3), then restarted Frigate.
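
For anyone who wants to check a host before touching it, here is a small sketch that reads the same sysctl and compares it against what worked in this thread. The threshold of 2 is taken only from the reports above (3 failed, 2 worked with PERFMON added), and the function names are hypothetical. As far as I know, level 3 is not in the upstream kernel's documented range, which stops at 2; some distribution kernels, Debian's among them, add it as a stricter level.

```python
from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")

# Assumption from this thread: 3 failed, 2 worked with PERFMON added.
MAX_WORKING_LEVEL = 2


def read_paranoid_level(path: Path = PARANOID_PATH) -> int:
    """Read the current perf_event_paranoid level as an integer."""
    return int(path.read_text().strip())


def pmu_likely_blocked(level: int) -> bool:
    """Return True if the level is above what worked in this thread."""
    return level > MAX_WORKING_LEVEL


if __name__ == "__main__" and PARANOID_PATH.exists():
    level = read_paranoid_level()
    if pmu_likely_blocked(level):
        print(f"perf_event_paranoid={level}: PERFMON alone may not be enough")
    else:
        print(f"perf_event_paranoid={level}: matches the working setups here")
```

A value set with sysctl on the command line does not survive a reboot. The usual way to persist it is a drop-in file under /etc/sysctl.d containing the line kernel.perf_event_paranoid=2.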

felipecrs commented 11 months ago

@NickM-27 how about I add a note to the docs for this case (when running supervised but not HAOS), rather than reverting the PR?

NickM-27 commented 11 months ago

I think that's fine. I'd suggest adding it to the hwaccel docs, which already cover Intel GPU stats.

wimb0 commented 11 months ago

Maybe, if it is a default setting in HAOS, it should be added to the HA Supervised Install docs. I'll create an issue there, and see what they think about it.
