Support deprecated PTX architectures in CUDA builds

sundeepgoel72 commented 1 year ago

The current CUDA build of DVR-Scan does not support older NVIDIA GPUs (I have a GTX 750 Ti).

Could support for these GPUs be added? I would be happy to try building the required libraries/DLLs myself, but will need some guidance along the way.

See https://github.com/Breakthrough/DVR-Scan/issues/94 for additional details.

Q: What kind of performance uplift should I expect over the 300 FPS I currently achieve using MOG (with downsampling and skipping 4 frames) without CUDA? This will help determine whether the effort of building the DLLs/libraries is good value, or whether it would be better to invest in a new GPU.

Breakthrough commented 1 year ago

For anyone who wants to give this a try, I followed James Bowley's guide here: https://jamesbowley.co.uk/build-opencv-with-cuda-in-windows/ Note that compiling OpenCV with CUDA support may take several hours, and will require tens of gigabytes of space due to the number of dependencies and temporary files involved.

Regarding performance, I have yet to run any exhaustive benchmarks, but a quick test on my system (RTX 2070) using a 1080p video yields:

| Subtractor | Downsampling | Frame Skip | Processing Speed (Effective, FPS) | Speedup |
|---|---|---|---|---|
| MOG2 | None | None | 60 | |
| MOG2_CUDA | None | None | 140 | 2.33x |
| MOG2 | 2 | 2 | 330 | |
| MOG2_CUDA | 2 | 2 | 460 | 1.4x |
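The speedup column is the CUDA speed divided by the CPU speed for the same settings. A minimal sketch of that arithmetic, using only the figures quoted above (the dictionary layout and the `speedup` helper are illustrative names, not part of DVR-Scan):

```python
# Effective processing speeds copied from the quick test above.
# Keys are (downsampling, frame skip); names here are illustrative only.
results = {
    ("None", "None"): {"MOG2": 60, "MOG2_CUDA": 140},
    ("2", "2"): {"MOG2": 330, "MOG2_CUDA": 460},
}


def speedup(cpu_speed, cuda_speed):
    """Return how many times faster the CUDA run was than the CPU run."""
    return cuda_speed / cpu_speed


for (downsample, frame_skip), speeds in results.items():
    ratio = speedup(speeds["MOG2"], speeds["MOG2_CUDA"])
    print(f"downsample={downsample}, frame_skip={frame_skip}: {ratio:.2f}x")
```

The second ratio works out to roughly 1.39, which is the 1.4x reported after rounding. The gain shrinks once downsampling and frame skipping are enabled.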

Using CUDA with DVR-Scan is not particularly well optimized yet (see #12 for some background). Typically I see less than 10% GPU utilization, so it will likely perform just as well on older GPU models.

As per #12, there are some significant technical issues limiting better GPU utilization, much of which I suspect comes down to using Python to interact with OpenCV/CUDA via FFI/ctypes. There are also opportunities to use the GPU for video decoding, but I am not sure how robust the GPU decoder is given the variety of video formats most users have.

sundeepgoel72 commented 1 year ago

I think I may have managed to get OpenCV working with CUDA using binaries from https://jamesbowley.co.uk/downloads/#OpenCV4.5.0.

A further install of cuDNN was needed, which I did from here: https://developer.nvidia.com/rdp/cudnn-download

I am able to run most of the pre-built exes, with the GTX 750 Ti being detected and GPU usage spiking on execution.

How do I get DVR-Scan to use this version of OpenCV?

Breakthrough commented 1 year ago

First, uninstall any version of DVR-Scan and install it using pip (e.g. pip install dvr_scan). Next, ensure you have the OpenCV Python module installed, e.g. python -c "import cv2" should execute without failing. Lastly, ensure you built OpenCV correctly, and that -b MOG2_CUDA will be supported:

python -c "import dvr_scan; assert dvr_scan.motion_detector.MotionDetectorCudaMOG2.is_available()"

If those commands execute correctly, then you can run dvr-scan with -b MOG2_CUDA. Hope this helps!
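As an extra check before involving DVR-Scan at all, the OpenCV module itself can be asked how many CUDA devices it sees. This is a rough sketch, assuming the installed cv2 build exposes `cv2.cuda.getCudaEnabledDeviceCount()` (the helper name `cuda_device_count` is made up for illustration):

```python
def cuda_device_count():
    """Return the CUDA device count reported by OpenCV, or None if cv2 cannot be imported."""
    try:
        import cv2
    except ImportError:
        # Covers both a missing module and a failed DLL load.
        return None
    cuda = getattr(cv2, "cuda", None)
    if cuda is None or not hasattr(cuda, "getCudaEnabledDeviceCount"):
        return 0
    return cuda.getCudaEnabledDeviceCount()


count = cuda_device_count()
if count is None:
    print("cv2 could not be imported in this interpreter")
elif count <= 0:
    print("cv2 imported, but no usable CUDA device was reported")
else:
    print(f"cv2 reports {count} CUDA device(s)")
```

A result of zero or less usually points at the OpenCV build or the driver rather than at DVR-Scan.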

sundeepgoel72 commented 1 year ago

I validated the OpenCV CUDA install by running opencv_perf_cudaarithm.exe --gtest_filter=Sz_Type_Flags_GEMM.GEMM/29 as per the validation step at https://jamesbowley.co.uk/accelerate-opencv-4-5-0-on-windows-build-with-cuda-and-python-bindings/#cuda_performance

Result:

Device 0: "NVIDIA GeForce GTX 750 Ti"
CUDA Driver Version / Runtime Version 11.70 / 11.10
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 2048 MBytes (2147221504 bytes)
GPU Clock Speed: 1.11 GHz

.....................

Compute Mode: Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.70, CUDA Runtime Version = 11.10, NumDevs = 1

Based on the above, OpenCV with CUDA seems to be properly installed and running fine.

I also followed the other steps for the Python bindings as per https://jamesbowley.co.uk/accelerate-opencv-4-5-0-on-windows-build-with-cuda-and-python-bindings/#python_bindings but suspect I am missing a step to install the bindings.

I have Python 3.10 installed on Windows. I did have an old non-CUDA version of cv2 installed, which I removed. After that I copied cv2.cp38-win_amd64.pyd to C:\Users\sunde\AppData\Roaming\Python\Python310\site-packages (as per step 5 of the guide), but I keep getting a ModuleNotFoundError: No module named 'cv2' error.

I think the issue is around copying the right file to the right site-packages folder for my Python install. It seems the pyd file is not being recognised as the cv2 package. Any advice on how to resolve this?

Breakthrough commented 1 year ago

When you compile a Python module, it must match the CPython runtime. The filename of the module you have, cv2.cp38-win_amd64.pyd, is for Python 3.8 (cp38) on Windows 64-bit (win_amd64). Try using Python 3.8 x64 instead and see if that resolves the issue.
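The mismatch can be checked mechanically by reading the tags out of the filename and comparing them with the running interpreter. A small sketch using only the standard library (the function names are hypothetical, and the pattern only handles names shaped like the one in this thread):

```python
import re
import sys
from importlib.machinery import EXTENSION_SUFFIXES


def parse_pyd_tags(filename):
    """Split e.g. 'cv2.cp38-win_amd64.pyd' into ('cp38', 'win_amd64')."""
    match = re.fullmatch(r"[^.]+\.(cp\d+)-([A-Za-z0-9_]+)\.pyd", filename)
    if match is None:
        raise ValueError(f"unrecognized extension module name: {filename}")
    return match.group(1), match.group(2)


def matches_interpreter(filename, version=sys.version_info[:2]):
    """Return True if the module's CPython tag matches the given (major, minor) version."""
    python_tag, _platform_tag = parse_pyd_tags(filename)
    return python_tag == f"cp{version[0]}{version[1]}"


print("Tags in filename:", parse_pyd_tags("cv2.cp38-win_amd64.pyd"))
print("Matches this interpreter:", matches_interpreter("cv2.cp38-win_amd64.pyd"))
# Suffixes the running interpreter will actually accept for extension modules.
print("Accepted suffixes:", EXTENSION_SUFFIXES)
```

Under Python 3.10 the `cp38` tag does not match any accepted suffix, so the file is ignored and the import fails with ModuleNotFoundError.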

sundeepgoel72 commented 1 year ago

Try using Python 3.8 x64 instead and see if that resolves the issue.

Uninstalled 3.10 and did a fresh install of 3.8. Now getting the following error after copying the pyd to the site-packages folder:

Traceback (most recent call last):
File "", line 1, in
ImportError: DLL load failed while importing cv2: The specified module could not be found.

At least the error has changed :-)
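One common cause of this kind of error, not confirmed as the cause in this thread, is that Python 3.8 and later on Windows no longer search PATH for the DLLs an extension module depends on. Directories have to be registered with `os.add_dll_directory()` before the import. The sketch below assumes the CUDA installer has set the `CUDA_PATH` environment variable; `OPENCV_BIN_DIR` is a hypothetical variable used here only as a placeholder for wherever the OpenCV DLLs live:

```python
import os
import sys


def register_dll_dirs(candidates):
    """Add each existing directory to the DLL search path (Windows, Python 3.8+ only)."""
    added = []
    if sys.platform != "win32" or not hasattr(os, "add_dll_directory"):
        return added  # Nothing to do on other platforms or older Pythons.
    for path in candidates:
        if path and os.path.isdir(path):
            os.add_dll_directory(path)
            added.append(path)
    return added


cuda_root = os.environ.get("CUDA_PATH")  # Assumed to be set by the CUDA installer.
candidates = [
    os.path.join(cuda_root, "bin") if cuda_root else None,
    os.environ.get("OPENCV_BIN_DIR"),  # Hypothetical placeholder, not a real OpenCV setting.
]
print("Registered DLL directories:", register_dll_dirs(candidates))
# `import cv2` can then be attempted in the same process.
```

If the import still fails after this, a dependency is missing from the registered directories or the module was built against different library versions.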

sundeepgoel72 commented 1 year ago

Tried many so-called "solutions" from the net, nothing worked! :-( Now trying Anaconda, since most examples of a working setup seem to use it.

sundeepgoel72 commented 1 year ago

Voila, it finally WORKED!
The fix was to:

1. Install Anaconda.
2. Create an environment for Python 3.8.
3. Copy cv2.cp38-win_amd64.pyd into the site-packages directory of the 3.8 environment.
4. Install dvr-scan.
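The copy step is where the earlier attempts went wrong, so it can help to ask the active interpreter where its site-packages directory is rather than guessing. A minimal sketch, run from inside the 3.8 environment (the `install_extension` helper is a hypothetical name, not part of any tool mentioned here):

```python
import shutil
import sysconfig
from pathlib import Path


def install_extension(pyd_path, site_packages=None):
    """Copy a compiled extension module into site-packages and return the new path."""
    source = Path(pyd_path)
    if not source.is_file():
        raise FileNotFoundError(source)
    # 'platlib' is where the running interpreter keeps platform-specific modules.
    target_dir = Path(site_packages or sysconfig.get_paths()["platlib"])
    return Path(shutil.copy2(source, target_dir / source.name))


print("Active site-packages:", sysconfig.get_paths()["platlib"])
```

Calling `install_extension("cv2.cp38-win_amd64.pyd")` from the directory containing the file would then place it where that interpreter looks for it.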

It's finally working, though I'm a bit disappointed. I was expecting much improved performance, but it seems slower than the normal Python install: MOG2 on normal Python gets slightly better FPS than MOG2_CUDA under Anaconda. It is probably a performance issue on the Anaconda side, since it seems very "heavy".

More investigation tomorrow!
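To separate the effect of the environment from the effect of the subtractor, the same clip can be timed with both backends inside one environment. A rough sketch, assuming `dvr-scan` is on PATH and accepts `-i` for the input video alongside the `-b` option mentioned earlier in the thread (all function names are hypothetical):

```python
import subprocess
import sys
import time


def time_command(args):
    """Run a command to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(args, check=True)
    return time.perf_counter() - start


def dvr_scan_command(video, subtractor):
    """Build a dvr-scan invocation; -i as the input flag is an assumption here."""
    return ["dvr-scan", "-i", video, "-b", subtractor]


def compare_backends(video, subtractors=("MOG2", "MOG2_CUDA")):
    """Time one full scan of the same video per subtractor, in the current environment."""
    return {name: time_command(dvr_scan_command(video, name)) for name in subtractors}
```

Running `compare_backends("clip.mp4")` once in each environment gives four timings, which shows whether the slowdown follows Anaconda or follows MOG2_CUDA.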

Breakthrough / DVR-Scan

Support deprecated PTX architectures in CUDA builds #97