Joshua-Riek / ubuntu-rockchip

Ubuntu for Rockchip RK35XX Devices
https://joshua-riek.github.io/ubuntu-rockchip-download/
GNU General Public License v3.0

Frigate H.264 acceleration broken in linux-6.1-stan-rkr3 #906

Closed jonadis closed 2 months ago

jonadis commented 3 months ago

I'm using an orangepi5b with 16GB of RAM to run Frigate 0.14b3. Everything was working fine on 6.1.0-1016, but after upgrading to 6.1.0-1018 Frigate would no longer start up correctly, saying

[ERROR:0@75.042] global cap_ffmpeg_impl.hpp:1309 open Could not open codec h264, error: -11
[ERROR:0@75.042] global cap_ffmpeg_impl.hpp:1317 open VIDEOIO/FFMPEG: Failed to initialize VideoCapture
[ERROR:0@75.042] global cap.cpp:164 open VIDEOIO(CV_IMAGES): raised OpenCV exception: OpenCV(4.9.0) /io/opencv/modules/videoio/src/cap_images.cpp:274: error: (-215:Assertion failed) number < max_number in function 'icvExtractPattern'

in the logs. I was able to boot the older 1016 to confirm that it worked correctly again, so something must have been broken in either 1017 or 1018. I don't know much about kernel development but I do still have both versions 1016 and 1018 installed if there's anything I can gather to help in troubleshooting.

Joshua-Riek commented 3 months ago

I don't know much about frigate. Do you have a direct example of h.264 failing with ffmpeg?

jonadis commented 3 months ago

I can include the entire docker startup logs, but that's about all I know: 4a4313e55569c4003afeb8ec8996d60b3a5135b990a26e47aae36f360f1ba064-json.log. Maybe someone smarter will run into this issue and be of more help. This is the flavor of Ubuntu recommended by the Frigate devs, so I assume someone else will run into this sooner or later.

Joshua-Riek commented 3 months ago

It may be smart to open an issue with Frigate just in case. I recently updated the kernel to the latest SDK from Rockchip, linux-6.1-stan-rkr3 (source), and will start to look through logs shortly.

Also tagging @nyanmisaka and @hbiyik; do you see any problems with ffmpeg and the latest kernel SDK? From my initial evaluation I did not encounter any notable issues, so this is interesting.

nyanmisaka commented 3 months ago

From the video samples I have, no regression has been found in rkr3. I have been on 6.1.57 and just upgraded to 6.1.76 yesterday. There are many changes in the MPP kernel driver between rkr1~rkr3, and a video sample is needed, otherwise it is difficult to bisect.

https://github.com/armbian/linux-rockchip/commits/rk-6.1-rkr3/drivers/video/rockchip/mpp?since=2023-12-29&until=2024-07-03

ubuntu@ubuntu:~$ uname -a
Linux ubuntu 6.1.0-1019-rockchip #19-Ubuntu SMP Mon Jul  1 12:27:26 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
ubuntu@ubuntu:~$ ./ffmpeg -hwaccel rkmpp -hwaccel_output_format drm_prime -afbc rga -i ~/jellyfish-120-mbps-4k-uhd-h264.mkv -an -sn -f null -
ffmpeg version 342fe8368c-20240628 Copyright (c) 2000-2024 the FFmpeg developers
  built with gcc 14.1.0 (crosstool-NG 1.26.0.93_a87bf7f)
  configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=aarch64-ffbuild-linux-gnu- --arch=aarch64 --target-os=linux --enable-gpl --enable-version3 --disable-debug --enable-iconv --enable-zlib --enable-libfreetype --enable-libfribidi --enable-gmp --enable-libxml2 --enable-openssl --enable-fontconfig --enable-libharfbuzz --enable-libvorbis --enable-opencl --enable-libpulse --enable-libvmaf --enable-libxcb --enable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-chromaprint --enable-libdav1d --disable-libdavs2 --enable-libdvdread --enable-libdvdnav --disable-libfdk-aac --enable-ffnvcodec --enable-cuda-llvm --enable-frei0r --enable-libgme --enable-libkvazaar --enable-libaribcaption --enable-libass --enable-libbluray --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --disable-libvpx --enable-libwebp --enable-lv2 --disable-libvpl --enable-openal --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-rkmpp --enable-rkrga --enable-librubberband --disable-schannel --enable-sdl2 --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --enable-libdrm --disable-vaapi --enable-libvidstab --enable-vulkan --enable-libshaderc --enable-libplacebo --enable-libx264 --enable-libx265 --disable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-libs='-ldl -lstdc++ -lstdc++ -lgomp' --extra-ldflags=-pthread --extra-ldexeflags=-pie --cc=aarch64-ffbuild-linux-gnu-gcc --cxx=aarch64-ffbuild-linux-gnu-g++ --ar=aarch64-ffbuild-linux-gnu-gcc-ar --ranlib=aarch64-ffbuild-linux-gnu-gcc-ranlib --nm=aarch64-ffbuild-linux-gnu-gcc-nm --extra-version=20240628
  libavutil      59.  8.100 / 59.  8.100
  libavcodec     61.  3.100 / 61.  3.100
  libavformat    61.  1.100 / 61.  1.100
  libavdevice    61.  1.100 / 61.  1.100
  libavfilter    10.  1.100 / 10.  1.100
  libswscale      8.  1.100 /  8.  1.100
  libswresample   5.  1.100 /  5.  1.100
  libpostproc    58.  1.100 / 58.  1.100
Input #0, matroska,webm, from '/home/ubuntu/jellyfish-120-mbps-4k-uhd-h264.mkv':
  Metadata:
    encoder         : libebml v1.2.0 + libmatroska v1.1.0
    creation_time   : 2016-02-06T04:01:06.000000Z
  Duration: 00:00:30.03, start: 0.000000, bitrate: 120490 kb/s
  Stream #0:0(eng): Video: h264 (High), yuv420p(tv, bt709, progressive), 3840x2160 [SAR 1:1 DAR 16:9], 29.97 fps, 29.97 tbr, 1k tbn (default)
rga_api version 1.10.0_[8]
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (h264_rkmpp) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf61.1.100
  Stream #0:0(eng): Video: wrapped_avframe, drm_prime(tv, bt709, progressive), 3840x2160 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 29.97 fps, 29.97 tbn (default)
      Metadata:
        encoder         : Lavc61.3.100 wrapped_avframe
[out#0/null @ 0xaaab15f6ef90] video:387KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: unknown
frame=  900 fps=248 q=-0.0 Lsize=N/A time=00:00:30.02 bitrate=N/A speed=8.26x

nyanmisaka commented 3 months ago

{"log":"2024-07-01 13:15:50.384543912 [ERROR:0@48.137] global cap_ffmpeg_impl.hpp:1309 open Could not open codec h264, error: -11\n","stream":"stdout","time":"2024-07-01T17:15:50.384769741Z"}

https://github.com/blakeblackshear/frigate/discussions/12228 From the log, OpenCV is trying to open the h264 software decoder. IIRC Frigate doesn't use ffmpeg hardware acceleration this way, so I suspect it is not related to rkmpp. Please test rkmpp with ffmpeg and the command in this Wiki.

jonadis commented 3 months ago

The ffmpeg tests all seem to work on both kernel versions. In fact, I have an ffmpeg task that runs daily and assembles a folder full of JPEGs into a daily timelapse, and it works fine on both kernel versions. I have no idea whether it's related, but one thing I found that's different is...

localadmin@orangepi5b:~$ uname -r
6.1.0-1016-rockchip
localadmin@orangepi5b:~$ sudo cat /sys/kernel/debug/rknpu/version
RKNPU driver: v0.9.6
localadmin@orangepi5b:~$ uname -r
6.1.0-1019-rockchip
localadmin@orangepi5b:~$ sudo cat /sys/kernel/debug/rknpu/version
RKNPU driver: v0.9.7

According to the frigate docs all that's needed is 0.9.2 or later but perhaps there was some breaking change between 0.9.6 and 0.9.7 that is incompatible with frigate.

nyanmisaka commented 3 months ago

At least the ffmpeg test passed to prove that the issue is not related to video hardware acceleration.

@MarcA711 do you know more about the error in frigate?

MarcA711 commented 3 months ago

I just upgraded my system to 6.1.0-1019-rockchip and I can't reproduce this issue.

@jonadis I see you have lots of cameras and use go2rtc. You have more than 30 ffmpeg processes, I think. Maybe this is causing issues. Could you back up your config.yml and maybe start with a fresh one? Just use one reliable camera and try to get detection and recording working. No go2rtc for now. If this works, you can use go2rtc and add more cams (one by one).

jonadis commented 3 months ago

No go2rtc and a single camera does indeed seem to work on the latest kernel.

cameras:
  front_driveway:
    enabled: True
    ffmpeg:
      inputs:
        - path: rtsp://admin:redacted@10.0.71.155:554/Streaming/Channels/102
          roles:
            - detect
        - path: rtsp://admin:redacted@10.0.71.155:554/Streaming/Channels/101
          roles:
            - record

I'll work on adding back go2rtc and adding cameras back in one at a time. Can you help me understand what this means? I do indeed have a fair number of ffmpeg processes, but that doesn't pose an issue with the older kernel, only the latest few builds. What does that indicate?

localadmin@orangepi5b:~$ ps -ef | grep -i ffmpeg | wc -l
39
localadmin@orangepi5b:~$ uname -r
6.1.0-1016-rockchip

Joshua-Riek commented 3 months ago

If a single camera works, can you try adding cameras until you encounter the error? Maybe there is a memory or buffer limit / bug in the new kernel.

nrpetonr commented 3 months ago

no problem here, running 3 cameras and go2RTC configured

jonadis commented 3 months ago

When using go2rtc, 4 (main+substream) + 1 single-stream cameras (5 total cameras, 9 total streams, 19 total ffmpeg processes) seems to be the limit on 1019. When I add one more camera I start getting these errors:

2024-07-04 15:41:39.198315157  [ERROR:0@19.741] global cap_ffmpeg_impl.hpp:1309 open Could not open codec h264, error: -11
2024-07-04 15:41:39.198328574  [ERROR:0@19.741] global cap_ffmpeg_impl.hpp:1317 open VIDEOIO/FFMPEG: Failed to initialize VideoCapture
2024-07-04 15:41:39.199331037  [ERROR:0@19.742] global cap.cpp:164 open VIDEOIO(CV_IMAGES): raised OpenCV exception:
2024-07-04 15:41:39.199364287  
2024-07-04 15:41:39.199369537  OpenCV(4.9.0) /io/opencv/modules/videoio/src/cap_images.cpp:274: error: (-215:Assertion failed) number < max_number in function 'icvExtractPattern'
2024-07-04 15:41:39.199371287  
2024-07-04 15:41:39.199373329  
2024-07-04 15:41:39.284213708  [ERROR:0@19.827] global cap_ffmpeg_impl.hpp:1309 open Could not open codec h264, error: -11
2024-07-04 15:41:39.284392500  [ERROR:0@19.828] global cap_ffmpeg_impl.hpp:1317 open VIDEOIO/FFMPEG: Failed to initialize VideoCapture
2024-07-04 15:41:39.284950461  [ERROR:0@19.828] global cap.cpp:164 open VIDEOIO(CV_IMAGES): raised OpenCV exception:
2024-07-04 15:41:39.284958628  
2024-07-04 15:41:39.284963586  OpenCV(4.9.0) /io/opencv/modules/videoio/src/cap_images.cpp:274: error: (-215:Assertion failed) number < max_number in function 'icvExtractPattern'
2024-07-04 15:41:39.284965336  

11 total cameras (20 total streams, 39 total ffmpeg processes) run fine on the older kernel.
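The process counts quoted above probably understate the load. Each ffmpeg process usually runs several threads, and Linux task limits (the cgroup pids controller, for example) count every thread, not just every process. A minimal sketch that counts both by reading /proc; the function name `count_tasks` is made up for illustration:

```bash
#!/bin/bash
# Count processes whose name contains a string, plus the threads they own.
# Reads /proc directly, so it needs no extra tools. Linux only.
count_tasks() {
    local name=$1 procs=0 tasks=0 dir task
    for dir in /proc/[0-9]*; do
        # comm holds the executable name; skip processes that do not match
        # or that exited while we were scanning.
        grep -qiF -- "$name" "$dir/comm" 2>/dev/null || continue
        procs=$((procs + 1))
        # Every entry under task/ is one thread of that process.
        for task in "$dir"/task/[0-9]*; do
            if [ -e "$task" ]; then
                tasks=$((tasks + 1))
            fi
        done
    done
    # Output format: "<process count> <thread count>"
    echo "$procs $tasks"
}

# Example: count_tasks ffmpeg
```

Run on the host, `count_tasks ffmpeg` prints two numbers, and the second is the one that task limits see. Note that `ps -ef | grep -i ffmpeg | wc -l` typically also counts the grep process itself, so its result runs one high.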

MarcA711 commented 3 months ago

Could you post the output of sudo dmesg after the error occurred?

jonadis commented 3 months ago

Here is all the dmesg output after starting frigate ... the error usually occurs within about 15 seconds of starting the container...

[  159.101269] veth3a5f78c: renamed from eth0
[  159.114846] br-957798a14698: port 5(veth33596eb) entered disabled state
[  159.144850] br-957798a14698: port 5(veth33596eb) entered disabled state
[  159.146684] device veth33596eb left promiscuous mode
[  159.146699] br-957798a14698: port 5(veth33596eb) entered disabled state
[  163.210309] br-957798a14698: port 5(vethb0d1d95) entered blocking state
[  163.210323] br-957798a14698: port 5(vethb0d1d95) entered disabled state
[  163.210551] device vethb0d1d95 entered promiscuous mode
[  163.796004] eth0: renamed from veth7d2093f
[  163.823444] IPv6: ADDRCONF(NETDEV_CHANGE): vethb0d1d95: link becomes ready
[  163.823603] br-957798a14698: port 5(vethb0d1d95) entered blocking state
[  163.823618] br-957798a14698: port 5(vethb0d1d95) entered forwarding state
[  173.929214] cgroup: fork rejected by pids controller in /system.slice/docker-6664804fe39845e2d871c21d0abb967b0716eeeb2365151fa6c1360637384b41.scope

jonadis commented 3 months ago

For what it's worth, the last line above (the 'cgroup' one) is not present when running the older kernel, and Frigate runs happily.

Joshua-Riek commented 3 months ago

That cgroup message looks to be the problem. Can you try https://serverfault.com/questions/1032747/cgroup-fork-rejected-by-pids-controller
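The kernel message names the cgroup that hit its task limit, so the limit and the current usage can be read straight from that cgroup's directory. A minimal sketch, assuming a cgroup v2 host mounted at /sys/fs/cgroup; `pids_usage` is a hypothetical helper name. The `error: -11` in the OpenCV log would also fit this picture, since errno 11 on Linux is EAGAIN, which is what fork and thread creation return when a task limit is reached, though nothing in the thread confirms that link.

```bash
#!/bin/bash
# Print "<current> <max>" task counts for a cgroup directory.
# Assumes cgroup v2, where a cgroup with the pids controller enabled
# exposes pids.current and pids.max ("max" means no limit).
pids_usage() {
    local dir=$1
    if [ ! -r "$dir/pids.max" ] || [ ! -r "$dir/pids.current" ]; then
        echo "no pids controller files in $dir" >&2
        return 1
    fi
    echo "$(cat "$dir/pids.current") $(cat "$dir/pids.max")"
}

# Hypothetical usage on the host, with the scope name copied from dmesg:
#   pids_usage /sys/fs/cgroup/system.slice/docker-<container id>.scope
```

If the first number sits at or near the second while Frigate starts, the container is running into the pids limit rather than a video decoding problem.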

jonadis commented 3 months ago

I edited /usr/lib/systemd/system/user-.slice.d/10-defaults.conf from TasksMax=33% to TasksMax=infinity, but it did not make any difference. I still get the same errors in Frigate, and the same cgroup error in dmesg.

Joshua-Riek commented 3 months ago

Try to modify DefaultTasksMax= directive in /etc/systemd/system.conf to something like 38035 then reboot? Sounds like we are on the right track though.
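One way to apply this suggestion without editing /etc/systemd/system.conf itself is a drop-in file, which systemd reads from /etc/systemd/system.conf.d/. A minimal sketch under that assumption; the helper name `write_tasks_max_dropin` and the file name 90-tasks-max.conf are made up for illustration, and the value 38035 is simply the one proposed above:

```bash
#!/bin/bash
# Write a systemd manager drop-in that sets DefaultTasksMax.
# Arguments: the limit, then the drop-in directory (defaults to the
# system-wide one). The file name 90-tasks-max.conf is a made-up example.
write_tasks_max_dropin() {
    local limit=$1 dir=${2:-/etc/systemd/system.conf.d}
    mkdir -p "$dir" || return 1
    printf '[Manager]\nDefaultTasksMax=%s\n' "$limit" > "$dir/90-tasks-max.conf"
}

# Hypothetical usage as root, followed by a reboot as suggested above:
#   write_tasks_max_dropin 38035
#   systemctl show --property=DefaultTasksMax   # check the value afterwards
```

A drop-in keeps the change in one small file that is easy to remove again if the underlying cause is fixed.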

jonadis commented 3 months ago

I left TasksMax set to infinity in /usr/lib/systemd/system/user-.slice.d/10-defaults.conf and also set DefaultTasksMax in /etc/systemd/system.conf to 38035, and that seems to have cured it. Can you help an ignorant person understand how this is tied to the kernel upgrade?

Joshua-Riek commented 3 months ago

I suspect some resource shenanigans going on in the new Rockchip kernel. Needs to be researched further.

If a reason cannot be found, I may include those tweaks you mentioned in the ubuntu-rockchip-settings package. However, this would be a hack, and I would like to avoid implementing system-wide hacks when possible.

Joshua-Riek commented 2 months ago

@jonadis can you please provide exact steps so I can reproduce your problem? I think there is a larger issue going on and I need to do some testing.

MarcA711 commented 2 months ago

I experienced this issue as well. I will try to provide more info later.

Joshua-Riek commented 2 months ago

I'm leaning toward implementing a user-space hotfix with https://github.com/Joshua-Riek/ubuntu-rockchip/issues/906#issuecomment-2211842830 as a temporary workaround. But I do not want to make it permanent.

MarcA711 commented 2 months ago

@Joshua-Riek not sure if it helps for debugging purposes, but you can start some docker image using docker run --rm -it debian:12 bash

And run this script from a bash file to create lots of forks and get the error:

#!/bin/bash

# Number of forks
n=1000

fork_process() {
    sleep infinity &
}

for ((i=0; i<n; i++)); do
    fork_process
done

wait

Joshua-Riek commented 2 months ago

I found the problem here https://github.com/Joshua-Riek/ubuntu-rockchip/issues/919#issuecomment-2225695982, the max user processes limit is very low. I need to figure out what kernel commit caused this change, but nonetheless it's progress.
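For anyone comparing kernels, the relevant limits can be printed side by side. A minimal sketch; `show_process_limits` is a hypothetical helper name, and the link to the cgroup error is an inference rather than something stated in this thread: systemd-system.conf(5) describes the default DefaultTasksMax as a percentage of the smallest of kernel.pid_max, kernel.threads-max and the root cgroup pids.max, so a low kernel ceiling would presumably shrink the default per-unit task limit as well.

```bash
#!/bin/bash
# Print the limits that bound how many processes and threads can exist.
show_process_limits() {
    local nproc f
    # Per-user cap on processes and threads (RLIMIT_NPROC). Shells without
    # "ulimit -u" fall back to the soft limit listed in /proc/self/limits.
    nproc=$(ulimit -u 2>/dev/null) ||
        nproc=$(awk '/^Max processes/ {print $3}' /proc/self/limits)
    echo "max user processes: $nproc"
    # System-wide ceilings exposed by the kernel (Linux only).
    for f in /proc/sys/kernel/threads-max /proc/sys/kernel/pid_max; do
        if [ -r "$f" ]; then
            echo "${f##*/}: $(cat "$f")"
        fi
    done
}

# Example: run show_process_limits once per kernel and compare the output.
```

Running it under 6.1.0-1016 and again under a newer build should show whether the ceilings themselves changed between kernels.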

Joshua-Riek commented 2 months ago

Fixed, please see: https://github.com/Joshua-Riek/ubuntu-rockchip/issues/919#issuecomment-2227037115