blakeblackshear / frigate

NVR with realtime local object detection for IP cameras
https://frigate.video
MIT License

Allow hardware acceleration of video scaling for the detect/rtmp streams #2383

Closed · brujoand closed this 1 year ago

brujoand commented 2 years ago

**Describe what you are trying to accomplish and why in non technical terms**
I want to have one high resolution stream as my only input and have it scaled for the detect stream (and optionally the rtmp stream) using hardware acceleration.

**Describe the solution you'd like**
Currently the detect stream gets `-r <fps> -s <width>x<height>` prepended to its output arguments, which makes it impossible to do the resizing with hardware acceleration, as these options are incompatible with the options for hardware-accelerated scaling. If we had variables like the existing ffmpeg ones, I could replace the scaling with something like `-vf 'fps=5,scale_vaapi=w=1280:h=720' -c:v h264_vaapi`. This would significantly reduce the load when using a single high resolution stream. As a side note, the reason for wanting to use a single stream is to make setup easier, and to avoid being blocked by crappy substreams that are not good enough for object detection.

**Describe alternatives you've considered**
I've already set up a transcoding service in front of Frigate, but this feels awfully redundant and clunky.

brujoand commented 2 years ago

One option, I guess, would be to simply move those parameters to the default scaling arguments, but I suspect that would make it hard to template in the correct fps and resolution values.

Maybe it could simply be disabled by a flag, so that users who want to can add their own scaling arguments to the output args, which seems like the best place for it?
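
Something like this, say (the flag name is entirely made up):

```yaml
detect:
  width: 1280
  height: 720
ffmpeg:
  # hypothetical flag to skip the hardcoded -r/-s injection
  manual_detect_scaling: true
  output_args:
    detect: -vf 'fps=5,scale_vaapi=w=1280:h=720' -c:v h264_vaapi -f rawvideo
```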

brujoand commented 2 years ago

For this to be useful, though, there is one more obstacle. When the video is encoded to h264 via vaapi, it ends up in the pix_fmt `vaapi_vld`, which we can't use afaik. Somehow it needs to be converted into yuv420p. Any idea how to go about that?

blakeblackshear commented 2 years ago

It doesn't use the pix_fmt parameter to convert it?

brujoand commented 2 years ago

If I use `-pix_fmt yuv420p` or set `format=yuv420p` in the `-vf` flags, the vaapi driver says the format isn't supported and that it will use `vaapi_vld` instead. That format doesn't seem suitable for detection or the live stream, though.

I'm sure there's a way to do it, but I haven't quite understood how yet. I suspect the approach involves either setting the vaapi output format to vaapi and appending `hwdownload` to the end of the `-vf` chain, or setting the vaapi output format to yuv420p and using `hwupload` at the beginning of the `-vf` chain.
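
Something in this direction, maybe (untested on my end, so the exact filter syntax may well be off):

```
# keep frames on the GPU for scaling, then pull them back into system memory
-hwaccel vaapi -hwaccel_output_format vaapi ... \
  -vf 'fps=5,scale_vaapi=w=1280:h=720,hwdownload,format=nv12,format=yuv420p' \
  -f rawvideo pipe:
```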

brujoand commented 2 years ago

I might be confusing myself here though. From reading more on this it seems vaapi_vld is the hw surface format and the output video uses yuv420p anyway. I just don't understand exactly how this happens.

brujoand commented 2 years ago

Okay, so I finally got somewhere. I simply removed the hardcoding of the resolution and fps:

```diff
diff --git a/frigate/config.py b/frigate/config.py
index 0de01a7..f127d22 100644
--- a/frigate/config.py
+++ b/frigate/config.py
@@ -538,13 +538,7 @@ class CameraConfig(FrigateBaseModel):
                 else self.ffmpeg.output_args.detect.split(" ")
             )
             ffmpeg_output_args = (
-                [
-                    "-r",
-                    str(self.detect.fps),
-                    "-s",
-                    f"{self.detect.width}x{self.detect.height}",
-                ]
-                + detect_args
+                detect_args
                 + ffmpeg_output_args
                 + ["pipe:"]
             )
```
I set the cameras up like so:
```yaml
ffmpeg:
  hwaccel_args:
    - -hwaccel
    - vaapi
    - -hwaccel_device
    - /dev/dri/renderD128
    - -hwaccel_output_format
    - vaapi
  input_args: -avoid_negative_ts make_zero -fflags +genpts+discardcorrupt -rtsp_transport tcp -stimeout 5000000 -use_wallclock_as_timestamps 1 -r 20/1 -vsync vfr
  output_args:
    detect: "-vf fps=5,scale_vaapi=w=1280:h=720 -c:v h264_vaapi -an -pix_fmt yuv420p -f rawvideo"
  inputs:
    - path: rtsp://admin:password@10.20.1.14:554
      roles:
        - record
        - detect
```

Basically adding `-r 20/1` and `-vsync vfr` to the input args, in addition to the detect output args.

The ffmpeg command executed by Frigate looks like this:

```
ffmpeg -hide_banner -loglevel warning -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -avoid_negative_ts make_zero -fflags +genpts+discardcorrupt -rtsp_transport tcp -stimeout 5000000 -use_wallclock_as_timestamps 1 -r 20/1 -vsync vfr -i rtsp://admin:password@10.20.1.14:554 -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 -strftime 1 -c copy /tmp/cache/stua-%Y%m%d%H%M%S.mp4 -vf fps=5,scale_vaapi=w=1280:h=720 -c:v h264_vaapi -an -pix_fmt yuv420p -f rawvideo pipe:
```

There are no errors in the frigate logs, but the camera now looks like this in Frigate:

[Screenshot 2021-12-05 at 17 55 45]

Since ffmpeg complains about the pix format, I tried removing that flag. Then the image looks like this:

[Screenshot 2021-12-05 at 17 33 57]

If I run the ffmpeg command manually, with or without the `-pix_fmt` flag, and send the output to a test.mp4 file instead of the pipe, ffprobe tells me: `Stream #0:0: Video: h264, yuv420p(progressive), 1280x720, 5 tbr, 1200k tbn, 10 tbc`

So basically, I have no idea what I'm doing xD

blakeblackshear commented 2 years ago

I'm actually surprised that didn't work. If I had to guess, it doesn't look like the output is actually yuv420p even though it says it is...

brujoand commented 2 years ago

Yeah, I feel like there is a part of the process here I'm not understanding. When I do the transcoding outside of the container I do basically the exact same thing, and the resulting stream works fine in Frigate and shows up in VLC as yuv420p. Decoding in hardware in Frigate and resizing in software works perfectly, except for the CPU hit. But when doing both decode and encode in hardware in the Frigate process, the output to the pipe is vaapi_vld from what I can gather, and it seems that this is the only option with hardware encoding. Could this be related to the output being raw video and not in a container? The muxer only packs the streams, right? It doesn't change anything like the pix_fmt?
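
One way I can think of to check what's actually coming out of the pipe, since raw video has no header and the reader just has to be told what to expect, would be to dump a few frames and try both interpretations (sizes here match my detect stream):

```
# dump a few raw frames to a file instead of the pipe
ffmpeg <same args as above> -f rawvideo -frames:v 5 frames.raw

# view them with each assumed pixel format; the wrong one should look garbled
ffplay -f rawvideo -pixel_format yuv420p -video_size 1280x720 frames.raw
ffplay -f rawvideo -pixel_format nv12 -video_size 1280x720 frames.raw
```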

brujoand commented 2 years ago

For science, I tried re-implementing this attempt using qsv instead of vaapi, as these libraries both seem to have weird bugs. So I ended up with the following ffmpeg command for detect and record:

```
ffmpeg -hide_banner -loglevel warning -hwaccel qsv -c:v hevc_qsv -use_wallclock_as_timestamps 1 -fflags nobuffer -rtsp_transport tcp -stimeout 5000000 -i rtsp://admin:password@10.20.1.14:554 -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 -strftime 1 -c copy /tmp/cache/stua-%Y%m%d%H%M%S.mp4 -vf fps=fps=5,vpp_qsv=w=1280:h=720 -c:v h264_qsv -g 25 -profile:v main -b:v 1M -an -f rawvideo -pix_fmt yuv420p pipe:
```

So this produces the exact same image as the first screenshot above. Since both vaapi and qsv use their respective formats in hw memory, and both use nv12 in software memory, I can only conclude that the data we are sending to the pipe is in fact nv12 and not yuv420p. I'm not familiar enough with color encoding to understand what this means or what the potential fix would be, though.

So basically, would it be possible to add a configuration option for this, so that the next step in the pipe knows that this is in fact nv12 and not yuv420p? Because it seems it is not feasible to encode h264 using qsv or vaapi with yuv420p.

blakeblackshear commented 2 years ago

The problem is that everything in Frigate is built around the yuv420p pixel format currently. NV12 is similar, but it stores raw pixel values differently. It would require implementing the same set of operations (cropping, resizing, etc) in NV12. It's possible, but non-trivial.
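
For illustration only (this is not Frigate code): both formats are 4:2:0 at 12 bits per pixel, they just lay out the chroma differently, so a repack in numpy would look something like this:

```python
import numpy as np

def nv12_to_i420(frame: bytes, width: int, height: int) -> np.ndarray:
    """Repack an NV12 frame (Y plane + interleaved UV plane) into
    I420/yuv420p (Y plane + separate U plane + separate V plane)."""
    buf = np.frombuffer(frame, dtype=np.uint8)
    y_size = width * height
    y = buf[:y_size]                  # luma plane is identical in both formats
    uv = buf[y_size:].reshape(-1, 2)  # NV12 chroma: UVUVUV... interleaved
    u, v = uv[:, 0], uv[:, 1]         # de-interleave into planar U and V
    return np.concatenate([y, u, v])  # I420 order: all Y, all U, all V
```

But doing that copy for every frame would give back some of the CPU savings, which is why operating on NV12 natively would be the real fix.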

blakeblackshear commented 2 years ago

Several hardware decoders output NV12, so I do think it's worthwhile.

ozett commented 2 years ago

maybe also a benefit for running it (at the end) on the Jetson Nano hardware? +1

idomp commented 2 years ago

+1 for this idea!

Can you please explain how you achieved this? "I've already set up a transcoding service in front of Frigate but this feels awfully redundant and clunky."

I would really like to take the 4K stream and transcode it down to 720p for detection on my Dahua 4K cams. They have only one main stream and one sub stream, which sends the image at roughly 4:3 resolution.

brujoand commented 2 years ago

@idomp sure, I wrote down the gist of it here

Lots of trial and error with the ffmpeg parameters, but it's been holding up really well.

brujoand commented 2 years ago

So I finally figured this out, thanks to a comment on Reddit.

Basically I had to change this:

```
-vf fps=fps=5,vpp_qsv=w=1280:h=720 -c:v h264_qsv -g 25 -profile:v main -b:v 1M -an -f rawvideo -pix_fmt yuv420p pipe:
```

into this:

```
-vf fps=fps=5,vpp_qsv=w=1280:h=720:format=nv12,hwdownload,format=nv12,format=yuv420p -an -f rawvideo pipe:
```

The problem was (from my understanding) that when we use hardware-accelerated operations, the frames are kept in a hardware-dependent format, in this case nv12, and converting while they reside in GPU memory isn't possible. But by using the `hwdownload` instruction, the frames are moved into normal memory and we can convert to yuv420p. I had tried this previously, but it seems I hadn't been able to get the format instructions correct. I had also included encoding to h264 as an in-between step, which wasn't necessary either.
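
Spelled out, my understanding of what each filter in the chain does:

```
-vf fps=fps=5,vpp_qsv=w=1280:h=720:format=nv12,hwdownload,format=nv12,format=yuv420p

# fps=fps=5                          drop to 5 fps before doing anything else
# vpp_qsv=w=1280:h=720:format=nv12   scale on the GPU; frames stay in hw memory as nv12
# hwdownload                         copy the frames from GPU memory into system memory
# format=nv12                        declare what the downloaded frames actually are
# format=yuv420p                     plain software conversion to what Frigate expects
```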

Now, though, it works perfectly, and my Intel Xeon E-2224G (which is a fairly modest CPU) isn't even breaking a sweat running 5 4k streams this way. Each stream eats about 6% CPU, and the GPU is strolling along at around 40% utilization at the lowest clock speed. Great success :D

blakeblackshear commented 2 years ago

Thanks for coming back and updating.

brujoand commented 2 years ago

@billimek your change doesn't work because you're mixing qsv in the output and vaapi in the input. They use different formats internally, so it's a bit tricky to mix and match. I'm guessing that changing your hwaccel_args to this would have worked: `hwaccel_args: -hwaccel qsv -c:v hevc_qsv`. Also, not sure if it matters, but I've removed the quotes too. So this is what I'm currently doing:

```yaml
inngang:
  ffmpeg:
    output_args:
      record: -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 -strftime 1 -c:v h264_qsv -global_quality 30 -c:a copy
      detect: -vf fps=fps=5,vpp_qsv=w=1280:h=720:format=nv12,hwdownload,format=nv12,format=yuv420p -an -f rawvideo
    input_args: -rtsp_transport tcp -avoid_negative_ts make_zero -fflags nobuffer -flags low_delay -strict experimental
    hwaccel_args: -hwaccel qsv -c:v hevc_qsv
    inputs:
      - path: rtsp://rtsp-proxy.automation.svc.cluster.local:8554/inngang
        roles:
          - detect
          - record
```

So basically decoding h265, writing to disk as h264 and forwarding a downscaled yuv420p stream to detection.

I'm picking this thread up again because now I'm facing the third, and hopefully final, challenge: how do I get an RTMP stream out of this without encoding to h264 again? I'm hoping to use the same encoded stream both for storing to disk and for RTMP, but I haven't found an elegant approach yet.

NickM-27 commented 1 year ago

this has been added in 0.12 with ffmpeg presets

kirsch33 commented 1 year ago

@NickM-27

which preset would apply for having an Nvidia GPU resize the main stream from 4k to 1080p, for example?

I am referencing the beta documentation but am a bit confused about which output preset to use for this case. Or would simply using a restream and then setting the detect width/height automatically take care of it (assuming I set preset-nvidia-h264)?

NickM-27 commented 1 year ago

@kirsch33 yes, if you set `hwaccel_args: preset-nvidia-h264` AND you set

```yaml
cameras:
  4k_camera:
    detect:
      width: 1920
      height: 1080
```

then it will use the GPU to do the resizing.
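
Putting the two pieces together, the relevant bits of the camera config would look roughly like this (camera name and RTSP path are placeholders):

```yaml
cameras:
  4k_camera:
    ffmpeg:
      hwaccel_args: preset-nvidia-h264
      inputs:
        - path: rtsp://user:password@camera-ip:554/main
          roles:
            - detect
    detect:
      width: 1920
      height: 1080
```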