ZoneMinder / zoneminder

ZoneMinder is a free, open-source closed-circuit television (CCTV) software application developed for Linux that supports IP, USB and analog cameras.
http://www.zoneminder.com/
GNU General Public License v2.0

GPU processing for ffmpeg and jpeg manipulations [$50] #209

Closed: nzbtuxnews closed this issue 3 years ago

nzbtuxnews commented 11 years ago

This is really more a feature request/enhancement than a bug or issue:

Eventually, ZoneMinder should allow compilation with flags that let GPUs perform the ffmpeg or JPEG transformations. This would give much better performance than the CPU while freeing the CPU for other tasks (Perl and PHP scripts, the SQL database, etc.).

I am not sure if this should be done at the component level (ffmpeg, libjpeg-turbo, etc) or at the top level (zm daemons)

There is a $50 open bounty on this issue. Add to the bounty at Bountysource.

chriswiggins commented 11 years ago

Depends on how much you want to optimise, I suppose. Most of the resizing and colour transforms are done by swscale and, to be honest, @mastertheknife is the best one to ask about the motion detection. I agree that we need to do GPU enhancements, but what sort of gains can we expect from our efforts?

nzbtuxnews commented 11 years ago

I agree with you, this would be a rather low-priority feature. Nevertheless, implementing GPU support would allow throwing more FPS at ZoneMinder without choking it, wouldn't it?

The gain would be a transfer of resources to the core components, freeing the CPU from tasks that a GPU is physically optimized for. I can see a tremendous gain in overall performance for high-end cameras ...

IMO the days of 320x240 at 5FPS are over... :)

mastertheknife commented 11 years ago

I did a few tests with the ZM motion detection algorithm and OpenCL 1.1 about a year ago. I agree, it was about 10x faster doing it on the GPU than on the CPU, but copying the data to the GPU's RAM and back takes a great amount of time, so it ended up being slower overall. For best efficiency, the frame should be kept in the GPU's RAM during the entire process. OpenCL 2.0 is said to improve transfers between host and GPU. Time will tell.
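To put the trade-off above into rough numbers, here is a back-of-the-envelope sketch. It is a model, not a measurement: the 10x speed-up comes from the comment, while the per-frame CPU time and the copy times are hypothetical values chosen only for illustration.

```python
def gpu_frame_time_ms(cpu_ms, speedup, upload_ms, download_ms):
    """Per-frame cost when a frame is copied to the GPU, processed, and copied back."""
    return upload_ms + cpu_ms / speedup + download_ms


def gpu_is_faster(cpu_ms, speedup, upload_ms, download_ms):
    """True when the GPU path, including both copies, beats the CPU path."""
    return gpu_frame_time_ms(cpu_ms, speedup, upload_ms, download_ms) < cpu_ms


# Hypothetical numbers: 2 ms of motion detection per frame on the CPU,
# 10x faster on the GPU, 1.5 ms to upload a frame and 1.5 ms to read it back.
cpu_ms = 2.0
with_copies = gpu_frame_time_ms(cpu_ms, 10, 1.5, 1.5)  # copies dominate the total
resident = gpu_frame_time_ms(cpu_ms, 10, 0.0, 0.0)     # frame stays in GPU RAM
```

With these assumed figures the copies alone cost more than the whole CPU computation, so the GPU path loses despite the faster kernel. Only the case where the frame stays resident in GPU memory keeps the full speed-up.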

connortechnology commented 11 years ago

I wonder if it would work better if it was processing larger chunks of h264 or whatever, plus maybe we could implement far more interesting/intensive algorithms. Fun stuff to work on long term.

Plus wasn't nextime doing license plate detection? Facial recognition? Very fun stuff.

nzbtuxnews commented 11 years ago

If I could get my setup to even work, I'd be more than interested in experimenting with this... I totally agree, a lot of (powerful) algorithms could be implemented, especially if the GPU is more or less dedicated to ZoneMinder.

I wouldn't mind adding a small GPU to my server... My guess is that it would be better than the current setup.

wtfrank commented 10 years ago

I had some thoughts on this topic a couple of years ago, after observing that two cameras emitting h264 in HD at 30fps were more than a 3GHz Core 2 Duo CPU could comfortably handle.

I had a brief look into the VDPAU API and if I recall correctly it was possible to hardware accelerate h264 decoding into a buffer. It would also be useful to perform motion detection with CUDA or OpenCL. This left two questions for me: 1) is h264 decompression or motion detection a greater load on the CPU? 2) Is it possible to combine VDPAU and OpenCL?

There was also a 3rd question... will I change jobs any time in the near future and get a chance to look at this in between :)
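For readers wondering why motion detection is a candidate for CUDA or OpenCL at all, here is a minimal sketch of generic frame differencing. This is not ZoneMinder's actual algorithm, and the function names and threshold values are hypothetical. It only shows that each pixel is compared independently of the others, which is the kind of work that parallelises well.

```python
def changed_pixels(reference, frame, threshold):
    """Count pixels whose absolute difference from the reference exceeds threshold.

    Both images are flat sequences of 8-bit greyscale values of equal length.
    """
    if len(reference) != len(frame):
        raise ValueError("images must have the same size")
    return sum(1 for r, f in zip(reference, frame) if abs(r - f) > threshold)


def motion_detected(reference, frame, threshold=25, min_changed=3):
    """Report motion when enough pixels differ from the reference image."""
    return changed_pixels(reference, frame, threshold) >= min_changed


# A 4x4 grey image, and a copy in which four pixels became much brighter.
reference = [10] * 16
frame = list(reference)
frame[5:9] = [200] * 4
alarm = motion_detected(reference, frame)
```

On a GPU, the comparison inside `changed_pixels` would run once per pixel in parallel, followed by a reduction to get the count.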

deweydb commented 7 years ago

Can we add $ to this bounty? This is something we would like to see as well.

connortechnology commented 7 years ago

Feel free. Click on any of the bountysource links in this thread to go to bountysource where you can do just that.

ajtalbot1 commented 7 years ago

Just adding my notes and tests. I hope this can help someone connect all the dots and give us hardware acceleration. https://forums.zoneminder.com/viewtopic.php?f=36&t=25899

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

mfu-mcosys commented 5 years ago

Still no vaapi support (via ffmpeg)? Shame :)

How big does the bounty have to be? Seriously.

ajtalbot1 commented 5 years ago

Ran out of CPU resources. Had to move over to BlueIris, which has great hardware acceleration. I have been VERY pleased with the move.

connortechnology commented 5 years ago

Master now contains support for hwaccels in decoding. I have tested vaapi and cuda on Intel, NVIDIA and AMD chipsets. All work well.

mfu-mcosys commented 5 years ago

thx, will test it later :)

mfu-mcosys commented 5 years ago

Hm, it seems I'm missing something?

zmc_m3[8846].DB1-zm_ffmpeg_camera.cpp/524 [HWACCEL not in use]

This tells me that hwaccel is compiled in but not active. Changing the camera's Source settings for "DecoderHWAccelName" (vaapi) and/or "DecoderHWAccelDevice" (/dev/dri/renderD128) doesn't help. So, is something wrong here?

WereCatf commented 5 years ago

I just compiled Zoneminder and, when enabling cuda on an NVIDIA-card, I'm only getting "Unable to create conversion context for rtsp://insertcameraurlhere from cuda to rgba"

H/W-acceleration works fine from command-line with ffmpeg, so I'm not sure what's going on.

Could @connortechnology shed some light on this?

connortechnology commented 5 years ago

The cuda format is what you get from the hw decoder; we then transfer it from the GPU to another frame that is in the nv12 format. So that line tells us that the hwtransfer step didn't happen. This would happen if something went wrong when selecting the format... There should be a line at debug level 1 starting with "Selected gw_pix_fmt" that tells us what is expected, which should be cuda.
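The sequence described above (hardware decode, transfer to a software frame, then colour conversion) can be illustrated with a toy model. This is not ZoneMinder's code or the libav API: the `Frame` class, the function names and the format sets are all hypothetical stand-ins. They only show why skipping the transfer step leads to a "from cuda to rgba" conversion failure.

```python
from dataclasses import dataclass

HW_FORMATS = {"cuda", "vaapi"}        # assumed: frames that live in GPU memory
SW_CONVERTIBLE = {"nv12", "yuv420p"}  # assumed: formats this toy converter accepts


@dataclass
class Frame:
    pix_fmt: str


def hw_transfer(frame):
    """Copy a GPU frame into system memory; in this toy the result is always nv12."""
    if frame.pix_fmt not in HW_FORMATS:
        return frame  # already a software frame, nothing to transfer
    return Frame("nv12")


def convert_to_rgba(frame):
    """Stand-in for the software colour conversion step."""
    if frame.pix_fmt not in SW_CONVERTIBLE:
        raise ValueError(
            f"Unable to create conversion context from {frame.pix_fmt} to rgba"
        )
    return Frame("rgba")


decoded = Frame("cuda")                       # what the hardware decoder hands back
rgba = convert_to_rgba(hw_transfer(decoded))  # transfer first, then convert
```

Calling `convert_to_rgba(decoded)` directly, without `hw_transfer`, raises the error. That mirrors the reported symptom in spirit only.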

WereCatf commented 5 years ago

@connortechnology Well, I did pastebin the portion of the log that I imagine is the most relevant at https://pastebin.com/duY7fVPW

In ZoneMinder I have set DecoderHWAccelName to "cuda" and DecoderHWAccelDevice to "0" (without quotes). The ffmpeg I am using is the same as in Ubuntu 19.04's repos, except that I rebuilt it with the additional flags "--enable-libnpp --enable-opencl --enable-nonfree --enable-libmfx --enable-nvenc --enable-cuda" which, to my understanding, should be enough. At least it works just fine with everything else I've thrown at it, including hardware-accelerated encoding and decoding.

I do not know what else I should be mentioning that might be relevant here.
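One quick sanity check in a situation like this is to ask the ffmpeg binary which hardware acceleration methods it was built with, using `ffmpeg -hwaccels`. The sketch below wraps that call; the helper names are hypothetical. It only reports on the command-line binary found on the PATH, which is not necessarily the same build as the libav libraries that ZoneMinder links against.

```python
import shutil
import subprocess


def parse_hwaccels(output):
    """Extract method names from the text printed by `ffmpeg -hwaccels`."""
    lines = [line.strip() for line in output.splitlines()]
    # Skip blank lines and the "Hardware acceleration methods:" heading.
    return [line for line in lines if line and not line.endswith(":")]


def list_hwaccels(ffmpeg="ffmpeg"):
    """Return the hwaccel methods the given ffmpeg binary reports, or [] if it is missing."""
    path = shutil.which(ffmpeg)
    if path is None:
        return []
    result = subprocess.run(
        [path, "-hide_banner", "-hwaccels"],
        capture_output=True, text=True, check=False,
    )
    return parse_hwaccels(result.stdout)


# Example of the output format, parsed without needing ffmpeg installed.
sample = "Hardware acceleration methods:\nvdpau\ncuda\nvaapi\n\n"
methods = parse_hwaccels(sample)
```

If `"cuda"` is missing from `list_hwaccels()`, the problem is in the ffmpeg build rather than in the monitor settings. If it is present, as it was here, the problem lies further down the pipeline.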

connortechnology commented 5 years ago

Um, don't put 0 in the Device. Just leave it blank. This is all new, so I'm sure there is a lot to learn. Will have to write a howto

WereCatf commented 5 years ago

@connortechnology I did try that, didn't change anything.

connortechnology commented 5 years ago

Well how about a debug level 3 zmc log?

WereCatf commented 5 years ago

@connortechnology Take a look at https://pastebin.com/sUsCK9jb

I'm not particularly familiar with libavcodec or ZoneMinder's codebase, but at a glance the only thing that looks relevant would be:

2019-08-20 01:46:00.693885 zmc_m1[6274].WAR-zm_ffmpeg.cpp/67 [cuda is not supported as input pixel format]

connortechnology commented 5 years ago

Please update to the latest. I don't see lines in that pastebin that I would expect to see.

WereCatf commented 5 years ago

@connortechnology I just compiled latest source, not much changed. "Selected gw_pix_fmt 119 cuda" now reads "Selected hw_pix_fmt 119 cuda", but that's pretty much it -- it still says that cuda is not supported as input pixel format and that's where the whole thing seems to stop.

https://pastebin.com/3pwT9pZT

connortechnology commented 5 years ago

Ok that was dumb of me. I hadn't actually tested the non-passthrough case.

Fixed

WereCatf commented 5 years ago

I can confirm that it's working now, both with passthrough and not.

The performance improvement is rather underwhelming, though: going from ~41% CPU usage to ~35% isn't really much to brag about. It looks like something else is constantly eating up CPU, even with an empty timestamp string (I didn't notice any change; does ZM insist on inserting an overlay even when there's nothing in it?), blending disabled, recording disabled and the camera in either Nodect or Monitor mode. Decoding the incoming video seems to be a rather small part of the load, though even a small improvement is better than nothing.

Random observations (not necessarily even related to HW accel):

Using 32-bit RGB with cuda hwaccel is faster than 24-bit by quite a large margin.

Using 8-bit greyscale with cuda hwaccel is faster still than 32-bit RGB.

It might be a good idea to mention these cases in the documentation for any cuda users. Logic would lead me to believe the same applies to other HW-accel methods, but I can't test that.

scale_cuda or scale_npp could possibly be used to speed up scaling of video/images when using cuda, and scale_qsv or scale_vaapi on Intel and AMD.

HW encoding could possibly be useful in some places, though NVENC is limited to only two simultaneous encodes. I believe QuickSync isn't, and any limits come simply from how much workload there is on it, but my Xeon has no iGPU, so I can't play around with any qsv stuff.
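For context on the colour-depth observations, the sketch below works out how much data each format means per frame and per second. The format labels and helper names are hypothetical, and the arithmetic says nothing about why 32-bit was measured as faster than 24-bit. A 32-bit frame is larger, so buffer size alone does not explain it. One plausible factor, stated here as an assumption, is that 4-byte pixels are naturally aligned.

```python
BYTES_PER_PIXEL = {"grey8": 1, "rgb24": 3, "rgba32": 4}


def frame_bytes(width, height, fmt):
    """Size in bytes of one uncompressed frame in the given pixel format."""
    return width * height * BYTES_PER_PIXEL[fmt]


def bandwidth_mib_per_s(width, height, fmt, fps):
    """Uncompressed data rate in MiB/s for a stream of such frames."""
    return frame_bytes(width, height, fmt) * fps / (1024 * 1024)


# Per-frame sizes for a Full HD (1920x1080) camera.
sizes = {fmt: frame_bytes(1920, 1080, fmt) for fmt in BYTES_PER_PIXEL}
```

For Full HD this gives 2,073,600 bytes for 8-bit greyscale, 6,220,800 for 24-bit RGB and 8,294,400 for 32-bit RGB. At least part of the greyscale advantage may simply be that there is a quarter of the data to move.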

scooter75 commented 5 years ago

@WereCatf

Thanks for keeping this thread going. I have also been looking to get Nvidia GPU support for ZM

I tried compiling using ./do_debian_package.sh --snapshot=NOW --branch=master --type=local but ended up with the error below at 79%.

I have also compiled FFmpeg with the following options: --enable-nvenc --enable-nonfree --enable-libnpp --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64

Do you have any ideas that could help?

[ 79%] Linking CXX executable zma
cd "/home/scotth/zoneminder_1.33.14~20190911231433.orig/dbuild/src" && /usr/bin/cmake -E cmake_link_script CMakeFiles/zma.dir/link.txt --verbose=1
/usr/bin/c++ -g -O2 -fdebug-prefix-map=/home/scotth/zoneminder_1.33.14~20190911231433.orig=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -std=c++11 -Wall -D__STDC_CONSTANT_MACROS -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -rdynamic CMakeFiles/zma.dir/zma.cpp.o -o zma -L"/home/scotth/zoneminder_1.33.14~20190911231433.orig/src/libbcrypt" -Wl,-rpath,"/home/scotth/zoneminder_1.33.14~20190911231433.orig/src/libbcrypt:" libzm.a -lrt -lz -lcurl -ljpeg -lssl -lcrypto -lpthread -lpcre -lgcrypt -lgnutls-openssl -lmysqlclient -lx264 -lmp4v2 /usr/local/lib/libavformat.a /usr/local/lib/libavcodec.a /usr/local/lib/libavdevice.a /usr/local/lib/libavutil.a /usr/local/lib/libswscale.a /usr/local/lib/libswresample.a -lvlc -ldl
/usr/local/lib/libavutil.a(mathematics.o): In function `av_rescale_delta':
/home/scotth/ffmpeg/libavutil/mathematics.c:168: multiple definition of `av_rescale_delta'
libzm.a(zm_ffmpeg.cpp.o):./dbuild/src/./src/zm_ffmpeg.cpp:197: first defined here
collect2: error: ld returned 1 exit status
src/CMakeFiles/zma.dir/build.make:117: recipe for target 'src/zma' failed
make[3]: *** [src/zma] Error 1
make[3]: Leaving directory '/home/scotth/zoneminder_1.33.14~20190911231433.orig/dbuild'
CMakeFiles/Makefile2:350: recipe for target 'src/CMakeFiles/zma.dir/all' failed
make[2]: *** [src/CMakeFiles/zma.dir/all] Error 2
make[2]: Leaving directory '/home/scotth/zoneminder_1.33.14~20190911231433.orig/dbuild'
Makefile:154: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/home/scotth/zoneminder_1.33.14~20190911231433.orig/dbuild'
dh_auto_build: cd dbuild && make -j1 returned exit code 2
debian/rules:15: recipe for target 'build' failed
make: *** [build] Error 2
dpkg-buildpackage: error: debian/rules build subprocess returned exit status 2
debuild: fatal error at line 1152:
dpkg-buildpackage -rfakeroot -us -uc -ui -i -b failed
Error status code is: 0
Build failed.

Thanks again for your efforts so far.

kenthinson commented 5 years ago

I believe this issue might benefit from being split into multiple tasks(issues). What do you all think?

  1. GPU Decode acceleration
  2. GPU encode acceleration
  3. GPU motion detection acceleration (probably tapping an open-source TensorFlow library for person detection).
  4. Build scripts that allow an end user to build the project with these turned on or off via flags.
  5. Documentation.
  6. ???

Feedback about the idea welcome.

connortechnology commented 5 years ago

1 is done.

2 I have code for, but refuse to merge until 1.35.

Rayn0r commented 4 years ago

For what it's worth... I'm running ZoneMinder 1.33.15 built from source under Ubuntu 18.04 with a cheap GeForce GT 740 and two FullHD cameras. After setting DecoderHWAccelName to "cuda-hwaccel" for both cameras, I could clearly see that the server load went down, while the CPU usage pretty much stayed the same.

[Screenshots: Grafana server load; Grafana CPU usage]

connortechnology commented 4 years ago

Bountysource is going to take the money in this bounty on July 1st unless the funds are redirected. Alternatively I can close this out, collect the bounty and ZoneMinder will hold the funds until it is properly closed.

mfu-mcosys commented 4 years ago

From my point of view: do what you want, but please solve this problem if you get in the mood :)

connortechnology commented 3 years ago

At long last I have put the infrastructure in place. 1.35.28 contains support for vaapi and nvenc for encoding. OpenCL will come soon.