blakeblackshear / frigate

NVR with realtime local object detection for IP cameras
https://frigate.video
MIT License

[DOCS] Update hardware-acceleration pages - NVIDIA GPU no great benefit - make use of NVIDIA GPU for TensorFlow #2179

Closed ozett closed 2 years ago

ozett commented 3 years ago

Describe what you are trying to accomplish and why in non technical terms
I wanted to know whether GPU hardware acceleration for decoding h264/h265 streams helps compared to decoding on the CPU.

Describe the solution you'd like
Update the docs (for newbies?) so they can see that they may not need any special effort fiddling with hwaccel parameters.

Describe alternatives you've considered
Make use of the existing NVIDIA GPU and CUDA for TensorFlow to go beyond the Coral? It is powerful for TensorFlow. Why stick to 8-bit Coral models and not use fancier TensorFlow models?

Additional context
I have two nearly identical machines running Frigate in Docker. Both have one Coral Edge M.2 TPU and both use the same 13 RTSP cams. One has an NVIDIA 1660 GPU, and Frigate is configured to do ffmpeg hardware decoding of the h264/h265 RTSP streams with it. The other is configured to decode on the CPU.

NVIDIA (snippet)

input_args:
        - -c:v
        #- hevc_cuvid    # h265
        - h264_cuvid
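
For comparison, the same NVDEC decoder selection can also live under Frigate's hwaccel_args key instead of per-input input_args, so it applies to every input of that camera. A minimal sketch, assuming a 0.9-era config layout; the camera name and RTSP URL are placeholders:

cameras:
  example_cam:                # placeholder camera name
    ffmpeg:
      hwaccel_args:           # applies to every input of this camera
        - -c:v
        - h264_cuvid          # NVDEC h264 decoder; use hevc_cuvid for h265 streams
      inputs:
        - path: rtsp://192.168.1.10:554/stream   # placeholder URL
          roles:
            - detect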

CPU (snippet)

input_args:
        # use vainfo to check the VAAPI version and platform -> https://trac.ffmpeg.org/wiki/Hardware/QuickSync
        - -hwaccel      # check utilization with intel_gpu_top
        #- qsv          # QSV for Intel CPU generation >= 10
        - vaapi         # VAAPI for Intel CPU generation < 10; works only with h264? not for modern codecs, check capability with qsv
        - -hwaccel_device
        - /dev/dri/renderD128
        - -hwaccel_output_format
        - yuv420p
        #- -tag:v hvc1
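
The VAAPI flags above can likewise be moved under hwaccel_args so they stay separate from the per-input args; a sketch under the same config-layout assumption, with the render node path taken from the snippet:

ffmpeg:
  hwaccel_args:
    - -hwaccel
    - vaapi
    - -hwaccel_device
    - /dev/dri/renderD128
    - -hwaccel_output_format
    - yuv420p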

On both machines the overall CPU load is nearly identical. The NVIDIA GPU seems nearly useless to me for RTSP decode offloading.

Why not use it for TensorFlow AI instead? That would be great! To go beyond the Coral?

[screenshots: system load of the two machines, left and right]

xSkate commented 3 years ago

"Load Average" is not a percentage. This article might help make sense of it: https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html The machine on the left actually has a CPU load around 30% higher than the machine on the right, so it looks like hardware acceleration is doing its job quite well.

ozett commented 3 years ago

Interesting reading, thanks. I don't know if I get all of it, but I'm glad that it is still on the net and that you pointed to it.

If I got it right, then CPU% is what to compare, not the load average.

[screenshot]

As an addition, I put the Double Take container and CompreFace with a CPU model on the .159 system, and there still seems to be some idle time left on the old CPU.

[screenshot]

But overall, in my setup I could stay with my 10-year-old CPU only. It seems it is not worth the effort and money to offload ~30% of the load for 13 cams to the NVIDIA GPU.

Still worth the effort and money to try lots of other TensorFlow models on the GPU 😄

ozett commented 3 years ago

I forgot about the Intel GPU stats (-fri5:/etc/munin/plugins# /usr/bin/intel_gpu_top), and maybe I also forgot to compare the CPU/GPU load of an NVIDIA Jetson with a Coral.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.