Closed nzbtuxnews closed 3 years ago
Depends on how much you want to optimise, I suppose. Most of the resizing and colour transforms are done by swscale, and to be honest @mastertheknife is the best one to ask about the motion detection. I agree that we need to do GPU enhancements, but what sort of gains can we expect to get from our efforts?
On 10/10/2013, at 7:42 am, "lpallard1" notifications@github.com wrote:
This is really more a feature request/enhancement than a bug or issue:
eventually, ZoneMinder should allow compilation with flags that permit using GPUs to perform ffmpeg or jpeg transformations, providing much better performance than the CPU while freeing the CPU for other tasks (perl & php scripts, SQL database, etc)
I am not sure if this should be done at the component level (ffmpeg, libjpeg-turbo, etc) or at the top level (zm daemons)
I agree with you, this would be a rather low-priority feature, but wouldn't implementing GPU support allow throwing more FPS at ZoneMinder without choking it?
The gain would be moving work off the CPU, freeing it for the core components, since these are tasks a GPU is physically optimized to do. I can see a tremendous gain in overall performance for high-end cameras ...
IMO the days of 320x240 at 5FPS are over... :)
I did a few tests with the ZM motion detection algorithm and OpenCL 1.1 about a year ago. I agree, it was about 10x faster doing it on the GPU than on the CPU, but copying the data to the GPU's RAM and back takes a great amount of time, so it ended up being slower overall. For best efficiency, the frame should be kept in the GPU's RAM during the entire process. OpenCL 2.0 is said to improve transfers between host and GPU. Time will tell.
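The trade-off described above can be put into a small back-of-the-envelope model. This is only a sketch: the timings are made-up placeholders, not measurements from ZoneMinder, and `gpu_total_ms` and `offload_pays_off` are hypothetical helper names.

```python
def gpu_total_ms(cpu_ms: float, speedup: float, upload_ms: float, download_ms: float) -> float:
    """Wall time for one frame on the GPU, including both host<->GPU copies."""
    return upload_ms + cpu_ms / speedup + download_ms


def offload_pays_off(cpu_ms: float, speedup: float, upload_ms: float, download_ms: float) -> bool:
    """True when the GPU path, copies included, beats doing the work on the CPU."""
    return gpu_total_ms(cpu_ms, speedup, upload_ms, download_ms) < cpu_ms


# Hypothetical numbers, chosen only to illustrate the effect.
cpu_ms = 5.0      # motion detection on the CPU, per frame
speedup = 10.0    # GPU kernel is 10x faster than the CPU
copy_ms = 3.0     # one-way copy between host RAM and GPU RAM

per_frame_copy = gpu_total_ms(cpu_ms, speedup, copy_ms, copy_ms)
resident = gpu_total_ms(cpu_ms, speedup, 0.0, 0.0)

print(f"CPU only:               {cpu_ms} ms")
print(f"GPU, copy every frame:  {per_frame_copy} ms")
print(f"GPU, frame stays there: {resident} ms")
```

With these placeholder values the 10x faster kernel still loses once both copies are counted, and only wins when the frame stays in GPU memory.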
I wonder if it would work better if it was processing larger chunks of h264 or whatever, plus maybe we could implement far more interesting/intensive algorithms. Fun stuff to work on long term.
Plus wasn't nextime doing license plate detection? Facial recognition? Very fun stuff.
If I could get my setup to even work, I'd be more than interested in experimenting with this... I totally agree, a lot of powerful algorithms could be implemented, especially if the GPU is more or less dedicated to ZoneMinder.
I wouldn't mind adding a small GPU to my server... My guess is that it would be better than the current setup
I had some thoughts on this topic a couple of years ago after observing that 2 cameras emitting h264 in HD at 30fps were more than a 3GHz Core 2 Duo CPU could comfortably handle.
I had a brief look into the VDPAU API and if I recall correctly it was possible to hardware accelerate h264 decoding into a buffer. It would also be useful to perform motion detection with CUDA or OpenCL. This left two questions for me: 1) is h264 decompression or motion detection a greater load on the CPU? 2) Is it possible to combine VDPAU and OpenCL?
There was also a 3rd question... will I change jobs any time in the near future and get a chance to look at this in between :)
Can we add $ to this bounty? Something we would like to see as well.
Feel free. Click on any of the bountysource links in this thread to go to bountysource where you can do just that.
Just adding my notes and tests. I hope this can help someone connect all the dots and give us hardware acceleration. https://forums.zoneminder.com/viewtopic.php?f=36&t=25899
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Still no vaapi (via ffmpeg) support? Shame :)
How big does the bounty have to be? seriously.
Ran out of CPU resources. Had to move over to BlueIris, which has great hardware acceleration. I have been VERY pleased with the move.
Master now contains support for hwaccels in decoding. I have tested vaapi and cuda on intel, nvidia and amd chipsets. All work well.
thx, will test it later :)
Hm, it seems I'm missing something?
I just compiled Zoneminder and, when enabling cuda on an NVIDIA-card, I'm only getting "Unable to create conversion context for rtsp://insertcameraurlhere from cuda to rgba"
H/W-acceleration works fine from command-line with ffmpeg, so I'm not sure what's going on.
Could @connortechnology shed some light on this?
The cuda format is what you get from the hw decoder, we then transfer it from the gpu to another frame that is the nv12 format. So that line tells us that the hwtransfer step didn't happen. This would happen if something went wrong when selecting the format... there should be a line at debug level 1 starting with Selected gw_pix_fmt that tells us what is expected, which should be cuda.
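To make the intermediate nv12 step more concrete, here is a minimal sketch of how an NV12 buffer is laid out in system memory and how one pixel maps to RGBA. It is not ZoneMinder's code (the real conversion presumably goes through swscale); the function names are hypothetical, and the BT.601 full-range coefficients are an assumption made to keep the arithmetic simple.

```python
def nv12_size(width: int, height: int) -> int:
    """Bytes in an NV12 frame: a full-size Y plane plus a half-height
    plane of interleaved U,V pairs. Assumes even width and height."""
    return width * height + (width * height) // 2


def clamp(value: float) -> int:
    """Round and clip to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, int(round(value))))


def nv12_pixel_to_rgba(frame: bytes, width: int, height: int, x: int, y: int):
    """Convert one pixel of an NV12 frame to an (R, G, B, A) tuple."""
    y_val = frame[y * width + x]
    # Each 2x2 block of pixels shares one U,V pair in the second plane.
    uv_base = width * height + (y // 2) * width + (x // 2) * 2
    u = frame[uv_base] - 128
    v = frame[uv_base + 1] - 128
    r = clamp(y_val + 1.402 * v)
    g = clamp(y_val - 0.344136 * u - 0.714136 * v)
    b = clamp(y_val + 1.772 * u)
    return (r, g, b, 255)


# A tiny 2x2 frame: four luma samples, then one neutral U,V pair.
frame = bytes([50, 100, 150, 200, 128, 128])
print(nv12_size(1920, 1080))                  # bytes per Full HD NV12 frame
print(nv12_pixel_to_rgba(frame, 2, 2, 1, 1))  # grey pixel from the last luma sample
```

A frame still tagged as cuda has no such host-side layout, which fits the explanation above that the conversion to rgba fails when the transfer step is skipped.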
@connortechnology Well, I did pastebin the portion of the log that I imagine is the most relevant at https://pastebin.com/duY7fVPW
In zoneminder I have set DecoderHWAccelName as "cuda" and DecoderHwAccelDevice as "0" (without quotes). The ffmpeg I am using is the same as in Ubuntu 19.04's repos, except I rebuilt it with the additional flags of "--enable-libnpp --enable-opencl --enable-nonfree --enable-libnpp --enable-libmfx --enable-nvenc --enable-cuda" which, to my understanding, should be enough. At least it works with everything else I've thrown at it just fine, including hardware-accelerated encoding and decoding.
I do not know what else I should be mentioning that might be relevant here.
Um, don't put 0 in the Device. Just leave it blank. This is all new, so I'm sure there is a lot to learn. Will have to write a howto
@connortechnology I did try that, didn't change anything.
Well how about a debug level 3 zmc log?
@connortechnology Take a look at https://pastebin.com/sUsCK9jb
I'm not particularly familiar with libavcodec or zoneminder's codebase, but at a glance the only thing that looks relevant would be 2019-08-20 01:46:00.693885 zmc_m1[6274].WAR-zm_ffmpeg.cpp/67 [cuda is not supported as input pixel format
Please update to the latest. I don't see lines in that pastebin that I would expect to see.
@connortechnology I just compiled latest source, not much changed. "Selected gw_pix_fmt 119 cuda" now reads "Selected hw_pix_fmt 119 cuda", but that's pretty much it -- it still says that cuda is not supported as input pixel format and that's where the whole thing seems to stop.
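For anyone hitting the same symptom, a small script can pick the two relevant messages out of a zmc debug log. This is a rough sketch: the patterns are built only from the log fragments quoted in this thread, `scan_log` is a hypothetical helper, and real log lines may carry prefixes or wording not shown here.

```python
import re

# Matches both the older "gw_pix_fmt" spelling and the later "hw_pix_fmt" one.
SELECTED_RE = re.compile(r"Selected [gh]w_pix_fmt (\d+) (\w+)")
UNSUPPORTED_RE = re.compile(r"(\w+) is not supported as input pixel format")


def scan_log(lines):
    """Return (selected_format, unsupported_format); either may be None."""
    selected = None
    unsupported = None
    for line in lines:
        match = SELECTED_RE.search(line)
        if match:
            selected = match.group(2)
        match = UNSUPPORTED_RE.search(line)
        if match:
            unsupported = match.group(1)
    return selected, unsupported


# Sample input assembled from the fragments quoted in this thread.
sample = [
    "Selected hw_pix_fmt 119 cuda",
    "2019-08-20 01:46:00.693885 zmc_m1[6274].WAR-zm_ffmpeg.cpp/67 "
    "[cuda is not supported as input pixel format",
]

selected, unsupported = scan_log(sample)
if selected is not None and selected == unsupported:
    print(f"'{selected}' frames reached the software converter; "
          "the GPU-to-host transfer was probably skipped")
```

If the selected hardware format is also the one rejected as an input pixel format, the frame most likely never went through the transfer step described earlier.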
Ok that was dumb of me. I hadn't actually tested the non-passthrough case.
Fixed
I can confirm that it's working now, both with passthrough and not.
The performance improvement is rather underwhelming, though: going from ~41% CPU usage to ~35% isn't much to brag about. Something else seems to be constantly eating CPU, even with an empty timestamp string (I didn't notice any change; does ZM insist on inserting an overlay even when there's nothing in it?), blending disabled, recording disabled and the camera in either Nodect or Monitor mode. Decoding the incoming video seems to be a rather small part of the load, though even a small improvement is better than nothing.
Random observations (not necessarily even related to HW-accel):
Using 32bit RGB with cuda-hwaccel is faster than 24bit by quite a large margin.
Using 8bit greyscale with cuda-hwaccel is faster still than 32bit RGB.
It might be a good idea to mention these cases in the documentation for any cuda users. Logic would lead me to believe the same applies to other HW-accel methods, but I can't test that.
scale_cuda or scale_npp could possibly be used to speed up scaling of video/images when using cuda, and scale_qsv or scale_vaapi on Intel and AMD.
HW-encoding could possibly be useful in some places, though NVENC is limited to only two simultaneous encodes. I believe QuickSync isn't, and any limits simply come from how much workload is on it, but my Xeon has no iGPU, so I can't play around with any qsv stuff.
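As a rough aid for reading the colour-depth observations, the per-frame buffer sizes can be worked out directly. This is only a sketch under stated assumptions: a 1920x1080 frame at 30 fps, with `frame_bytes` as a hypothetical helper. Buffer size alone would explain why greyscale is cheapest, but not why 32bit beats 24bit; pixel alignment is one plausible reason, though nothing in this thread confirms it.

```python
# Bytes per pixel for the three colour depths mentioned above.
BYTES_PER_PIXEL = {
    "8bit greyscale": 1,
    "24bit RGB": 3,
    "32bit RGB": 4,
}


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one tightly packed frame in bytes (no row padding assumed)."""
    return width * height * bytes_per_pixel


WIDTH, HEIGHT, FPS = 1920, 1080, 30  # assumed Full HD camera at 30 fps

for name, bpp in BYTES_PER_PIXEL.items():
    size = frame_bytes(WIDTH, HEIGHT, bpp)
    print(f"{name}: {size} bytes per frame, {size * FPS / 1e6:.1f} MB/s")
```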
@WereCatf
Thanks for keeping this thread going. I have also been looking to get Nvidia GPU support for ZM
I tried compiling by using
./do_debian_package.sh --snapshot=NOW --branch=master --type=local
but end up with this error at 79%
I have also compiled FFmpeg with the following options --enable-nvenc --enable-nonfree --enable-libnpp --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64
Do you have any ideas that could help ?
[ 79%] Linking CXX executable zma
cd "/home/scotth/zoneminder_1.33.14~20190911231433.orig/dbuild/src" && /usr/bin/cmake -E cmake_link_script CMakeFiles/zma.dir/link.txt --verbose=1
/usr/bin/c++ -g -O2 -fdebug-prefix-map=/home/scotth/zoneminder_1.33.14~20190911231433.orig=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -std=c++11 -Wall -D__STDC_CONSTANT_MACROS -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -rdynamic CMakeFiles/zma.dir/zma.cpp.o -o zma -L"/home/scotth/zoneminder_1.33.14~20190911231433.orig/src/libbcrypt" -Wl,-rpath,"/home/scotth/zoneminder_1.33.14~20190911231433.orig/src/libbcrypt:" libzm.a -lrt -lz -lcurl -ljpeg -lssl -lcrypto -lpthread -lpcre -lgcrypt -lgnutls-openssl -lmysqlclient -lx264 -lmp4v2 /usr/local/lib/libavformat.a /usr/local/lib/libavcodec.a /usr/local/lib/libavdevice.a /usr/local/lib/libavutil.a /usr/local/lib/libswscale.a /usr/local/lib/libswresample.a -lvlc -ldl
/usr/local/lib/libavutil.a(mathematics.o): In function `av_rescale_delta':
/home/scotth/ffmpeg/libavutil/mathematics.c:168: multiple definition of `av_rescale_delta'
libzm.a(zm_ffmpeg.cpp.o):./dbuild/src/./src/zm_ffmpeg.cpp:197: first defined here
collect2: error: ld returned 1 exit status
src/CMakeFiles/zma.dir/build.make:117: recipe for target 'src/zma' failed
make[3]: *** [src/zma] Error 1
make[3]: Leaving directory '/home/scotth/zoneminder_1.33.14~20190911231433.orig/dbuild'
CMakeFiles/Makefile2:350: recipe for target 'src/CMakeFiles/zma.dir/all' failed
make[2]: *** [src/CMakeFiles/zma.dir/all] Error 2
make[2]: Leaving directory '/home/scotth/zoneminder_1.33.14~20190911231433.orig/dbuild'
Makefile:154: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/home/scotth/zoneminder_1.33.14~20190911231433.orig/dbuild'
dh_auto_build: cd dbuild && make -j1 returned exit code 2
debian/rules:15: recipe for target 'build' failed
make: *** [build] Error 2
dpkg-buildpackage: error: debian/rules build subprocess returned exit status 2
debuild: fatal error at line 1152:
dpkg-buildpackage -rfakeroot -us -uc -ui -i -b failed
Error status code is: 0
Build failed.
Thanks again for your efforts so far.
I believe this issue might benefit from being split into multiple tasks(issues). What do you all think?
Feedback about the idea welcome.
For what it's worth... I'm running Zoneminder 1.33.15 built from source under Ubuntu 18.04 with a cheap GeForce GT 740 and two FullHD cameras. After setting DecoderHWAccelName to "cuda-hwaccel" for both cameras, I could clearly see that the server load went down, while the CPU usage pretty much stayed the same:
Bountysource is going to take the money in this bounty on July 1st unless the funds are redirected. Alternatively I can close this out, collect the bounty and ZoneMinder will hold the funds until it is properly closed.
From my point: do what u want, but plz solve this problem if u get in mood :)
At long last I have put the infrastructure in place. 1.35.28 contains support for vaapi and nvenc for encoding. opencl will come soon.
There is a $50 open bounty on this issue. Add to the bounty at Bountysource.