dlandon / zoneminder.machine.learning

Zoneminder Docker
GNU General Public License v2.0

Add Nvidia GPU Support to Docker #65

Closed dlandon closed 3 years ago

dlandon commented 4 years ago

Add Nvidia GPU support to Docker.

dlandon commented 4 years ago

@sic79 The cuDNN has to be added. I've updated the opencv.sh compile script to handle installing that. There are instructions in the script on how to get the cuDNN package. Because it is not available without an Nvidia account, you have to sign up and download it manually. I will be generating a new docker today that changes the way ES is updated. I create a tar that is downloaded and ES is updated from that whenever the docker is started. It will no longer require a docker build to update ES. I can then issue updates to the compile script and it will update the docker when it is restarted. The opencv.sh script will be placed at /config for you to modify and execute. The newest one is copied to opencv.sh.default so your working script is not overwritten. You can then make your changes to the default script and execute that one.

I'm expecting some more changes to the opencv.sh script as we learn more about each individual need. I'm going to set up the script to be customized based on each user's specific GPU.
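
The update scheme described above (newest script goes to opencv.sh.default, the working opencv.sh is never overwritten) could look roughly like the sketch below. This is only an illustration in Python, not the docker's actual startup code; the function name `install_script` and its arguments are hypothetical.

```python
import shutil
from pathlib import Path


def install_script(new_script: Path, config_dir: Path) -> Path:
    """Copy the newest script to opencv.sh.default; seed opencv.sh only if absent.

    Hypothetical helper: the real docker startup logic is not shown in this thread.
    """
    config_dir.mkdir(parents=True, exist_ok=True)
    default_copy = config_dir / "opencv.sh.default"
    working_copy = config_dir / "opencv.sh"

    # The .default file always tracks the newest upstream script.
    shutil.copyfile(new_script, default_copy)

    # The user's working script is created once and never overwritten afterwards.
    if not working_copy.exists():
        shutil.copyfile(new_script, working_copy)
    return default_copy
```

With this layout a restart can refresh opencv.sh.default freely, while any edits made to /config/opencv.sh survive.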

dlandon commented 4 years ago

@pliablepixels I'm working on adding the cuDNN to the script. I've downloaded and installed the cuDNN package, but I get this when I run cmake:

Could NOT find CUDNN (missing: CUDNN_LIBRARY CUDNN_INCLUDE_DIR) (Required is at least version "7.5")

It looks like a path setting issue. Suggestions?
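
One way to narrow down a "Could NOT find CUDNN" message is to check where cudnn.h and libcudnn.so actually live and hand those locations to cmake. The sketch below is an assumption-laden helper, not part of opencv.sh: the search roots are guesses, and passing CUDNN_INCLUDE_DIR and CUDNN_LIBRARY (the two variables named in the error) as `-D` options is the usual cmake mechanism but has not been verified against this build.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Tuple

# Assumed search locations; adjust them for your image.
DEFAULT_ROOTS = ("/usr/local/cuda", "/usr/local", "/usr")


def find_cudnn(roots: Iterable[str] = DEFAULT_ROOTS) -> Tuple[Optional[Path], Optional[Path]]:
    """Return (include_dir, library_path) for the first cudnn.h and libcudnn.so* found."""
    include_dir: Optional[Path] = None
    library: Optional[Path] = None
    for root in map(Path, roots):
        if not root.is_dir():
            continue
        if include_dir is None:
            header = next(root.rglob("cudnn.h"), None)
            if header is not None:
                include_dir = header.parent
        if library is None:
            library = next(iter(sorted(root.rglob("libcudnn.so*"))), None)
    return include_dir, library


def cmake_flags(include_dir: Optional[Path], library: Optional[Path]) -> List[str]:
    """Build the -D options for whichever of the two locations were found."""
    flags = []
    if include_dir is not None:
        flags.append(f"-D CUDNN_INCLUDE_DIR={include_dir}")
    if library is not None:
        flags.append(f"-D CUDNN_LIBRARY={library}")
    return flags
```

Calling `cmake_flags(*find_cudnn())` prints nothing by itself; it returns the options so they can be compared with what the script passes to cmake. An empty list would suggest that the header (shipped in the -dev package) is not installed at all.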

sic79 commented 4 years ago

Sounds like a pretty nice solution for this complex install, good work! I'll hold on a bit until you have the script ready for testing. Then I'll take it for a spin again. In the meantime I'll create an Nvidia dev account.

dlandon commented 4 years ago

I'm building a new docker to update ES without a docker rebuild. Once you update, you will see opencv.sh at /config. I have updated the script and added cuDNN, but I don't think it's being compiled in. As I make changes to the opencv script, I'll push them to GitHub and all you need to do is restart your docker to acquire the new script. Feel free to try it out, but like I said, I don't think cuDNN is working yet.

sic79 commented 4 years ago

@dlandon Tried the new docker build and I get the same error just as you thought.

sic79 commented 4 years ago

I saw now that you had commented the script. Gonna try with "libcudnn7_7.6.0.64-1%2Bcuda10.1_amd64.deb" in /config soon.

drtaul commented 4 years ago

I have mine running albeit with manually installing cudnn and opencv. I have tested by continuously monitoring nvidia-smi. This is on UNRAID v6.8.2, unraid-nvidia plugin with driver v440.40 on a Quadro M4000 (CC=5.2). Testing included building and running the tests provided with cudnn-doc samples.

I had to manually install opencv from github given the current v4.2.0 of opencv will not allow building opencv with cuda on a GPU of compute capability less than 5.3.

I reviewed your latest version of opencv.sh and I like what you are doing. My only suggestion would be to expand the capability to use local files, similar to what you are doing for cudnn. This would be helpful in my case of building opencv pulled from tip.

dlandon commented 4 years ago

@drtaul Can you modify the script to make it work with your setup and I will then work on making it generic to cover other uses? Then post it for me?

drtaul commented 4 years ago

sure, more than happy to do that.

sic79 commented 4 years ago

So, tried with libcudnn in /config now and got the same error as before, "switching to CPU". Let's wait and see what @drtaul comes up with.

dlandon commented 4 years ago

I'm building a new docker to fix the hook install with the new ES update scheme. I also updated the opencv.sh that will now compile cuDNN into opencv. Please give it a try. Remember that the /config/opencv.sh script will not be changed. The new script will be installed at opencv.sh.default. That's the one you will want to try.

drtaul commented 4 years ago

I made some changes to opencv.sh you can view at: https://github.com/drtaul/zoneminder

My first key test is to stop the script and look at the report from the cmake command. This clearly states whether cuda and cudnn will be built or not. Only by adding the libcudnn...-dev package could I get it to report that it will build cudnn.

Currently I am running a pass after commenting out the last part of the script that does a clean up by removing the -dev packages. This seems to be breaking the opencv install given the cv2.getBuildInformation() reports no cuda nor cudnn.
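
The check described above can be automated by scanning the build-information text for the two lines of interest. The following is a minimal sketch; the function name `gpu_support` is made up for illustration, and it only assumes the "NVIDIA CUDA: YES/NO" and "cuDNN: YES/NO" line format that appears in the reports quoted in this thread.

```python
import re
from typing import Dict


def gpu_support(build_info: str) -> Dict[str, bool]:
    """Report whether the build-information text lists CUDA and cuDNN as YES."""
    result = {}
    for key, label in (("cuda", "NVIDIA CUDA"), ("cudnn", "cuDNN")):
        # Matches e.g. "NVIDIA CUDA:      YES (ver 10.1, ...)" or "cuDNN: NO".
        match = re.search(rf"{re.escape(label)}:\s*(YES|NO)\b", build_info)
        result[key] = match is not None and match.group(1) == "YES"
    return result
```

Inside the container it would be called as `gpu_support(cv2.getBuildInformation())` after `import cv2`; both values need to be True for the GPU path to have a chance of working.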

dlandon commented 4 years ago

I've already worked out the cuDNN dev package issue. Cmake reports both cuda and cuDNN enabled. I am unsure of the dev packages being needed. I assumed they were for compiling only.

There is a new script uploaded.

dlandon commented 4 years ago

I'm going to change the script to not uninstall anything. I guess all that stuff needs to stay and is not just needed for compiling.

sic79 commented 4 years ago

Tried the new script now and still got "/io/opencv/modules/dnn/src/dnn.cpp (1363) setUpNet DNN module was not built with CUDA backend; switching to CPU". I have downloaded the files needed and updated their filenames in the script, but still no difference. I think I will have to wait for someone else to test on their server, since I don't know why it won't work on mine.

Edit: I did a rerun of the script again and saved the output to a log. https://filebin.net/ud91g05fintibgzu

Under "General configuration for OpenCV 4.2.0" it says:

NVIDIA CUDA: NO
cuDNN: NO

Could that be of any help to you?

dlandon commented 4 years ago

@sic79 Yes, that is some help. Try the script I just uploaded. Be sure you have the correct cuDNN packages for your GPU version. Also, there is a setting in the cmake that I had to add to get mine to work, but I don't have a GPU card. This is the setting:

-D CUDA_ARCH_BIN=7.5 \

I think it needs to be set to your GPU version.

Place an exit in the script just before the make and see if the cmake shows CUDA and cuDNN as 'YES'.
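
As a rough illustration of how the CUDA_ARCH_BIN value relates to a card, the sketch below turns a compute capability string into the cmake option and rejects values below 5.3, the limit that drtaul mentions above for OpenCV 4.2.0. The helper names are hypothetical, and the compute capability itself has to be looked up for your card.

```python
from typing import Tuple

# Minimum compute capability for building OpenCV 4.2.0 with CUDA,
# as reported earlier in this thread.
MIN_CC = (5, 3)


def parse_cc(text: str) -> Tuple[int, int]:
    """Turn a string such as '6.1' into the tuple (6, 1)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def arch_bin_flag(compute_capability: str) -> str:
    """Return the cmake option for one GPU, or raise if the card is too old."""
    if parse_cc(compute_capability) < MIN_CC:
        raise ValueError(
            f"compute capability {compute_capability} is below "
            f"{MIN_CC[0]}.{MIN_CC[1]}"
        )
    return f"-D CUDA_ARCH_BIN={compute_capability}"
```

For example, `arch_bin_flag("7.5")` gives the setting quoted above, while `arch_bin_flag("5.2")` raises, which matches the Quadro M4000 case that needed a newer OpenCV.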

dlandon commented 4 years ago

@pliablepixels I think the ball is in your court. I've done all I can do.

pliablepixels commented 4 years ago

Yup, no worries. I'll work on it today.

drtaul commented 4 years ago

My build last night was successful. I ran a quick set of tests this morning to confirm, using the cudnn7 samples in the doc package. Also, cv2.getBuildInformation() reports CUDA and cuDNN built in. This is using the opencv.sh script I just pushed to my fork of dlandon/zoneminder on GitHub. As I indicated, I commented out the cleanup code at the end and also the pip3 install.

BTW, pliablepixels has indicated that the -D CUDA_ARCH_BIN=7.5 is not needed and I am able to confirm.

Finally, I am using a later unreleased version of opencv to enable me to use my 5.2 card.

drtaul commented 4 years ago

Forgot to attach my getbldinfo from cv2.

Also note that ffmpeg is not built. It seems I have seen it built in the past, but I have done too many one-offs to be sure.

oot@f83194403b11:/config# python3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import cv2
>>> print(cv2.getBuildInformation())

General configuration for OpenCV 4.2.0-dev =====================================
  Version control: unknown

  Extra modules:
    Location (extra): /root/opencv_contrib/modules
    Version control (extra): unknown

  Platform:
    Timestamp: 2020-02-18T03:21:43Z
    Host: Linux 4.19.98-Unraid x86_64
    CMake: 3.10.2
    CMake generator: Unix Makefiles
    CMake build tool: /usr/bin/make
    Configuration: RELEASE

  CPU/HW features:
    Baseline: SSE SSE2 SSE3
      requested: SSE3
    Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (16 files): + SSSE3 SSE4_1
      SSE4_2 (2 files): + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (5 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (29 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (6 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?: YES
    C++ Compiler: /usr/bin/c++ (ver 7.4.0)
    C++ flags (Release): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
    C++ flags (Debug): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
    C Compiler: /usr/bin/cc
    C flags (Release): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
    C flags (Debug): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
    Linker flags (Release): -Wl,--gc-sections
    Linker flags (Debug): -Wl,--gc-sections
    ccache: NO
    Precompiled headers: NO
    Extra dependencies: m pthread cudart_static -lpthread dl rt nppc nppial nppicc nppicom nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cudnn cufft -L/usr/local/cuda/lib64 -L/usr/lib/x86_64-linux-gnu
    3rdparty dependencies:

  OpenCV modules:
    To be built: aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hfs highgui img_hash imgcodecs imgproc line_descriptor ml objdetect optflow phase_unwrapping photo plot python3 quality reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab xfeatures2d ximgproc xobjdetect xphoto
    Disabled: world
    Disabled by dependency: -
    Unavailable: cnn_3dobj cvv freetype hdf java js matlab ovis python2 sfm viz
    Applications: tests perf_tests apps
    Documentation: NO
    Non-free algorithms: YES

  GUI:
    GTK+: NO
    VTK support: NO

  Media I/O:
    ZLib: build (ver 1.2.11)
    JPEG: libjpeg-turbo (ver 2.0.2-62)
    WEBP: build (ver encoder: 0x020e)
    PNG: build (ver 1.6.37)
    TIFF: build (ver 42 - 4.0.10)
    JPEG 2000: build (ver 1.900.1)
    OpenEXR: build (ver 2.3.0)
    HDR: YES
    SUNRASTER: YES
    PXM: YES
    PFM: YES

  Video I/O:
    DC1394: NO
    FFMPEG: NO
      avcodec: NO
      avformat: NO
      avutil: NO
      swscale: NO
      avresample: NO
    GStreamer: NO
    v4l/v4l2: YES (linux/videodev2.h)

  Parallel framework: pthreads

  Trace: YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP: 2019.0.0 Gold [2019.0.0]
      at: /root/opencv/build/3rdparty/ippicv/ippicv_lnx/icv
    Intel IPP IW: sources (2019.0.0)
      at: /root/opencv/build/3rdparty/ippicv/ippicv_lnx/iw
    Lapack: NO
    Eigen: NO
    Custom HAL: NO
    Protobuf: build (3.5.1)

  NVIDIA CUDA: YES (ver 10.1, CUFFT CUBLAS FAST_MATH)
    NVIDIA GPU arch: 30 35 37 50 52 60 61 70 75
    NVIDIA PTX archs:

  cuDNN: YES (ver 7.6.5)

  OpenCL: YES (no extra features)
    Include path: /root/opencv/3rdparty/include/opencl/1.2
    Link libraries: Dynamic load

  Python 3:
    Interpreter: /usr/bin/python3 (ver 3.6.9)
    Libraries: /usr/lib/x86_64-linux-gnu/libpython3.6m.so (ver 3.6.9)
    numpy: /usr/local/lib/python3.6/dist-packages/numpy/core/include (ver 1.18.1)
    install path: lib/python3.6/dist-packages/cv2/python-3.6

  Python (for build): /usr/bin/python3

  Java:
    ant: NO
    JNI: NO
    Java wrappers: NO
    Java tests: NO

  Install to: /usr/local

sic79 commented 4 years ago

@dlandon OK, I tried again and it ended with the same issue, CUDA and cuDNN being NO.

It also complained about "-D CUDA_ARCH_BIN". I put 6.1 there, as that seems right for my GPU according to this page: https://developer.nvidia.com/cuda-gpus (my GPU is a Quadro P4000).

Here is the end of the cmake output:

CMake Warning: Manually-specified variables were not used by the project:

CUDA_ARCH_BIN

Build files have been written to: /root/opencv/build

dlandon commented 4 years ago

The CUDA_ARCH_BIN doesn't appear to be necessary. You can remove it. It doesn't affect the outcome if your card is detected automatically.

@sic79 You probably have an issue with the cuda and cuDNN versions for your GPU. If cmake is saying that CUDA_ARCH_BIN is not necessary, it is detecting your card. Take another look at the versions.

@drtaul I find it hard to believe that none of the cleanup can be done. I think cuda can be removed once opencv is compiled. You are free to skip that step, but the image ends up being about 10GB. Once cuda is removed, the image goes to about 5GB. I know you probably don't care as long as everything works, but that is an incredible waste of docker image space. The installed libraries might have to stay.

pliablepixels commented 4 years ago

@dlandon do you need me to look at anything? I was out all day. Looks like from the thread above with @drtaul 's inputs, we have a working version?

One note:

  1. We should not install face recognition before the GPU process. This is because, as part of face_recognition, it also sets up dlib face recognition from source, which looks for GPUs and disables GPU support if none is found. Therefore, we ideally defer the face recognition install until after all the cuda stuff is done, or we remove it and reinstall it. (Face recognition takes a lot of time, as you know.)
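
To confirm after the reinstall that dlib actually picked up the GPU, its Python module exposes a `DLIB_USE_CUDA` attribute that can be inspected. The sketch below wraps that check; the function name `dlib_uses_cuda` is hypothetical, and the optional `module` argument exists only so the logic can be exercised without dlib installed.

```python
import importlib
from types import ModuleType
from typing import Optional


def dlib_uses_cuda(module: Optional[ModuleType] = None) -> bool:
    """Return True if the given (or installed) dlib module was built with CUDA."""
    if module is None:
        try:
            module = importlib.import_module("dlib")
        except ImportError:
            # dlib is not installed at all, so it certainly has no CUDA support.
            return False
    return bool(getattr(module, "DLIB_USE_CUDA", False))
```

If this returns False after the cuda install has finished, reinstalling face_recognition (and with it dlib) is the step suggested above.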

drtaul commented 4 years ago

@dlandon I don't mean to suggest that cleanup should not be done. I tend to use an iterative approach on this kind of thing, so my initial shot was to comment out everything. I understand the desire to reduce the footprint and appreciate the intent. My only conclusion is that something in the cleanup and/or the pip3 install is causing the new cv2 to break. I will try to do another iteration later today while also looking at pliablepixels' note on when to install face recognition.

LMK if there is anything specific you would like me to try/test... and thanks again.

dlandon commented 4 years ago

I just pushed a new script. I adjusted it to what @drtaul says is working, but I left the cleanup at the end. The rm -r opencv* just removes the opencv source after compiling and should have no impact on the compiled opencv. I agree with the iterative approach to getting issues resolved. Run the cleanup one line at a time and see if opencv gets broken. You don't have to recompile. My suspicion is that the pip3 install... line was the issue.

I also added some pauses in the compile process so a user can check if things are working properly. This will help them find and correct issues as they go along, rather than having to run the compile and then find it is broken.

@pliablepixels I think we are on our way to getting a solid working script.

drtaul commented 4 years ago

Just finished another build and test with success. I synced up with @dlandon's changes from this morning, but not this last set. I moved to cuda 10.2 since that is what is set up on my UNRAID box, i.e. 10.2 + v440 driver. After initial verification, I proceeded to walk through the cleanup lines. They do not break the tests I am running. I will note that for each apt remove I get a warning like

You might want to run 'apt --fix-broken install' to correct these.

I did make a change to add the pin for the local repo because I thought there was an error in setting it up but I think it was only this warning:

Setting up cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01 (1.0-1) ...

The public CUDA GPG key does not appear to be installed.

I also added a pip3 uninstall face-recognition and reinstall after the make-install. I pushed my most recent opencv.sh to my github.

Cheers

dlandon commented 4 years ago

Setting up cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01 (1.0-1) ...

The public CUDA GPG key does not appear to be installed.

Correct, this is just a warning that occurs when run in a docker. Doesn't mean anything.

I'll incorporate your changes.

drtaul commented 4 years ago

@dlandon The nvidia-smi -l command loops until Ctrl-C. I suspect that is not your intent? The only way I can get out of nvidia-smi is to terminate the opencv.sh script.

dlandon commented 4 years ago

No, it should not loop. It worked for me, but only because it fails when there is no GPU. Let me check.

dlandon commented 4 years ago

Found it. I did the -l instead of -L. The -l does a loop.
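
For reference, `nvidia-smi -L` prints one line per GPU and exits, while `-l` keeps looping. A small sketch of calling the list form from Python and parsing its output is shown below; the helper names are made up, and the line format assumed by the regular expression is the one that appears in sic79's log later in this thread ("GPU 0: Quadro P4000 (UUID: GPU-...)").

```python
import re
import subprocess
from typing import List, Tuple

# One line of `nvidia-smi -L` output, e.g. "GPU 0: Quadro P4000 (UUID: GPU-...)".
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-fA-F-]+)\)$")


def parse_gpu_list(output: str) -> List[Tuple[int, str, str]]:
    """Parse `nvidia-smi -L` output into (index, name, uuid) tuples."""
    gpus = []
    for line in output.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus


def list_gpus() -> List[Tuple[int, str, str]]:
    """Run `nvidia-smi -L` once (upper-case L lists the GPUs and exits)."""
    completed = subprocess.run(
        ["nvidia-smi", "-L"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_gpu_list(completed.stdout)
```

An empty list from `list_gpus()` would mean the container does not see any GPU, which is the condition the script's first pause asks the user to verify.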

dlandon commented 4 years ago

Just pushed a new opencv.sh script. @sic79 Give the new script a try. I've added some pauses in the process so you can evaluate your progress and take measures to fix issues, rather than just compiling and trying it out without knowing if it is going to work.

sic79 commented 4 years ago

@dlandon Here are some logs from my test. I copied the errors I saw in the log and excluded the rest at each stop. I stopped the install at the second stop.

First stop:

Errors were encountered while processing:
 /tmp/apt-dpkg-install-o4aFj9/132-nvidia-compute-utils-440_440.33.01-0ubuntu1_amd64.deb
 /tmp/apt-dpkg-install-o4aFj9/137-nvidia-utils-440_440.33.01-0ubuntu1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
##################################################################################

GPU 0: Quadro P4000 (UUID: GPU-f5f3f253-1643-9f17-93aa-a8dd0c127154)
##################################################################################
Verify your Nvidia GPU is seen. If not stop the script and fix the problem.
Press any key to continue, or ctrl-C to stop.

Second stop:


Found PythonInterp: /usr/bin/python3 (found suitable version "3.6.9", minimum required is "2.7")
CMake Warning at cmake/OpenCVDetectPython.cmake:81 (message):
  CMake's 'find_host_package(PythonInterp 2.7)' founds wrong Python version:

  PYTHON_EXECUTABLE=/usr/bin/python3

  PYTHON_VERSION_STRING=3.6.9

  Consider specify 'PYTHON2_EXECUTABLE' variable via CMake command line or environment variables

Call Stack (most recent call first):
  cmake/OpenCVDetectPython.cmake:271 (find_python)
  CMakeLists.txt:585 (include)

Consider using CMake 3.12+ for better Python support
Found PythonInterp: /usr/bin/python3 (found suitable version "3.6.9", minimum required is "3.2")
Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.6m.so (found suitable exact version "3.6.9")


And some more Errors before the stop


Registering hook 'INIT_MODULE_SOURCES_opencv_dnn': /root/opencv/modules/dnn/cmake/hooks/INIT_MODULE_SOURCES_opencv_dnn.cmake
CMake Error at modules/dnn/CMakeLists.txt:99 (message):
  CUDA backend for DNN module requires CC 5.3 or higher. Please remove unsupported architectures from CUDA_ARCH_BIN option.

Configuring incomplete, errors occurred!
See also "/root/opencv/build/CMakeFiles/CMakeOutput.log".
See also "/root/opencv/build/CMakeFiles/CMakeError.log".
######################################################################################
Verify that CUDA and cuDNN are both enabled in the cmake output above.
Look for the lines with CUDA and cuDNN. You may have to scroll up the page to see them.
If those lines don't show 'YES', then stop the script and fix the problem.
Check that you have the correct versions of CUDA ond cuDNN for your GPU.
Press any key to continue, or ctrl-C to stop.


There is also some nagging to fix broken install on the 440 nvidia driver, see below:


You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 cuda-drivers : Depends: nvidia-compute-utils-440 (>= 440.33.01) but it is not installed
                Depends: nvidia-utils-440 (>= 440.33.01) but it is not installed
 nvidia-driver-440 : Depends: nvidia-compute-utils-440 (= 440.33.01-0ubuntu1) but it is not installed
                     Depends: nvidia-utils-440 (= 440.33.01-0ubuntu1) but it is not installed
                     Recommends: libnvidia-compute-440:i386 (= 440.33.01-0ubuntu1) but it is not installable
                     Recommends: libnvidia-decode-440:i386 (= 440.33.01-0ubuntu1) but it is not installable
                     Recommends: libnvidia-encode-440:i386 (= 440.33.01-0ubuntu1) but it is not installable
                     Recommends: libnvidia-ifr1-440:i386 (= 440.33.01-0ubuntu1) but it is not installable
                     Recommends: libnvidia-fbc1-440:i386 (= 440.33.01-0ubuntu1) but it is not installable
                     Recommends: libnvidia-gl-440:i386 (= 440.33.01-0ubuntu1) but it is not installable


Hope the logs can help to troubleshoot

sic79 commented 4 years ago

I can add that this happens if I try to fix the broken install:


apt --fix-broken install
Reading package lists... Done
Building dependency tree
Reading state information... Done
Correcting dependencies... Done
The following additional packages will be installed:
  nvidia-compute-utils-440 nvidia-utils-440
The following NEW packages will be installed:
  nvidia-compute-utils-440 nvidia-utils-440
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
200 not fully installed or removed.
Need to get 0 B/418 kB of archives.
After this operation, 1298 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 file:/var/cuda-repo-10-2-local-10.2.89-440.33.01 nvidia-compute-utils-440 440.33.01-0ubuntu1 [73.8 kB]
Get:2 file:/var/cuda-repo-10-2-local-10.2.89-440.33.01 nvidia-utils-440 440.33.01-0ubuntu1 [345 kB]
(Reading database ... 108851 files and directories currently installed.)
Preparing to unpack .../nvidia-compute-utils-440_440.33.01-0ubuntu1_amd64.deb ...
Unpacking nvidia-compute-utils-440 (440.33.01-0ubuntu1) ...
dpkg: error processing archive /var/cuda-repo-10-2-local-10.2.89-440.33.01/./nvidia-compute-utils-440_440.33.01-0ubuntu1_amd64.deb (--unpack):
 unable to make backup link of './usr/bin/nvidia-cuda-mps-control' before installing new version: Invalid cross-device link
Preparing to unpack .../nvidia-utils-440_440.33.01-0ubuntu1_amd64.deb ...
Unpacking nvidia-utils-440 (440.33.01-0ubuntu1) ...
dpkg: error processing archive /var/cuda-repo-10-2-local-10.2.89-440.33.01/./nvidia-utils-440_440.33.01-0ubuntu1_amd64.deb (--unpack):
 unable to make backup link of './usr/bin/nvidia-debugdump' before installing new version: Invalid cross-device link
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
Errors were encountered while processing:
 /var/cuda-repo-10-2-local-10.2.89-440.33.01/./nvidia-compute-utils-440_440.33.01-0ubuntu1_amd64.deb
 /var/cuda-repo-10-2-local-10.2.89-440.33.01/./nvidia-utils-440_440.33.01-0ubuntu1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)


sic79 commented 4 years ago

@dlandon OK, I got a bit further now. I added -D CUDA_ARCH_BIN=6.1 \ to cmake and I got the following:


NVIDIA CUDA: YES (ver 10.2, CUFFT CUBLAS FAST_MATH)
NVIDIA GPU arch: 61
cuDNN: YES (ver 7.6.5)


So it seems you must have "CUDA_ARCH_BIN" included to make cmake happy, at least on my GPU.

BUT in the end it fails with the cuda-drivers dependencies error:


You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 cuda-drivers : Depends: nvidia-compute-utils-440 (>= 440.33.01) but it is not installed
                Depends: nvidia-utils-440 (>= 440.33.01) but it is not installed
 nvidia-driver-440 : Depends: nvidia-compute-utils-440 (= 440.33.01-0ubuntu1) but it is not installed
                     Depends: nvidia-utils-440 (= 440.33.01-0ubuntu1) but it is not installed
                     Recommends: libnvidia-compute-440:i386 (= 440.33.01-0ubuntu1) but it is not installable
                     Recommends: libnvidia-decode-440:i386 (= 440.33.01-0ubuntu1) but it is not installable
                     Recommends: libnvidia-encode-440:i386 (= 440.33.01-0ubuntu1) but it is not installable
                     Recommends: libnvidia-ifr1-440:i386 (= 440.33.01-0ubuntu1) but it is not installable
                     Recommends: libnvidia-fbc1-440:i386 (= 440.33.01-0ubuntu1) but it is not installable
                     Recommends: libnvidia-gl-440:i386 (= 440.33.01-0ubuntu1) but it is not installable


So I think that is the remaining issue here to solve.

dlandon commented 4 years ago

@sic79 It looks to me like your Unraid Nvidia is not set correctly. The nvidia-smi is not showing the driver loaded. You should see something like this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:02:00.0 Off |                  N/A |
| 37%   35C    P8    N/A /  75W |   3366MiB /  4039MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

That is why the CUDA_ARCH_BIN is required in your case. The driver is not found.

sic79 commented 4 years ago

This is what I see on nvidia-smi

nvidia-smi
Wed Feb 19 11:13:30 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P4000        Off  | 00000000:03:00.0 Off |                  N/A |
| 48%   44C    P0    27W / 105W |     10MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

dlandon commented 4 years ago

Ok. I need to change the script to show this.

dlandon commented 4 years ago

@sic79 Looks like your issue starts here:

First stop:
Errors were encountered while processing:
 /tmp/apt-dpkg-install-o4aFj9/132-nvidia-compute-utils-440_440.33.01-0ubuntu1_amd64.deb
 /tmp/apt-dpkg-install-o4aFj9/137-nvidia-utils-440_440.33.01-0ubuntu1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

I don't understand why this is happening. Do you have enough memory in your system? Have you overcommitted SHM?

sic79 commented 4 years ago

@dlandon Yes, there should be enough RAM. There is around 29GB free of 64GB right now on the server. The SHM setting is at its default, and I'm not sure how it is used.

dlandon commented 4 years ago

The SHM setting is a percentage of total system memory. That is probably too much in your case. At the top of the Zoneminder GUI you'll see /dev/shm. Try to keep that in the 40-50% range with all your cameras running.
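
The same figure that the Zoneminder GUI shows for /dev/shm can be read from inside the container. A minimal sketch, assuming /dev/shm is mounted as usual; the function name `percent_used` is made up for illustration.

```python
import shutil


def percent_used(path: str = "/dev/shm") -> float:
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total
```

Checking `percent_used()` with all cameras running gives a number to compare against the 40-50% range suggested above.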

sic79 commented 4 years ago

@dlandon OK, thanks for the tip :). But I assume this is not the culprit in my case regarding the error I get.

sic79 commented 4 years ago

When I look at the error above I see something that might give a clue. I have "Driver Version: 440.44" according to nvidia-smi but the dpkg error says "440.33", could that cause the dpkg issue?
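
To make the mismatch explicit, the two version strings can be compared numerically. The sketch below only illustrates the comparison with hypothetical helper names; whether the difference actually causes the dpkg failure is not established in this thread.

```python
import re
from typing import Tuple


def version_tuple(text: str) -> Tuple[int, ...]:
    """Extract the leading dotted number: '440.33.01-0ubuntu1' -> (440, 33, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def same_driver_release(host_driver: str, package_version: str) -> bool:
    """Compare only the first two components, e.g. 440.44 versus 440.33."""
    return version_tuple(host_driver)[:2] == version_tuple(package_version)[:2]
```

Here `same_driver_release("440.44", "440.33.01-0ubuntu1")` is False: the host driver reported by nvidia-smi and the packages in the local cuda repo come from different 440 releases.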

dlandon commented 4 years ago

@sic79 Yes. I think it means you have the wrong version of cuda. Are you downloading the 10.2 version?

dlandon commented 4 years ago

@pliablepixels I see this when I try to import cv2:

Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import cv2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/cv2/__init__.py", line 96, in <module>
    bootstrap()
  File "/usr/local/lib/python3.6/dist-packages/cv2/__init__.py", line 86, in bootstrap
    import cv2
ImportError: libcublas.so.10: cannot open shared object file: No such file or directory

Isn't cublas something to do with TensorFlow? Do I need a conditional cmake if TF is to be installed?
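
For context, cuBLAS is part of the CUDA toolkit itself, and the build information posted earlier in this thread lists cublas among OpenCV's extra dependencies, so the import can fail whenever that shared library is missing at runtime. A quick way to check whether the dynamic loader can find it is sketched below; `can_load` is a hypothetical helper name.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the file is missing or one of its dependencies is.
        return False
    return True
```

If `can_load("libcublas.so.10")` is False inside the container, the cleanup step has probably removed a CUDA runtime library that the compiled cv2 still links against.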

sic79 commented 4 years ago

@sic79 Yes. I think it means you have the wrong version of cuda. Are you downloading the 10.2 version?

Yes, it is 10.2. I saw that you now have that by default in the script as well, so I did not change anything.

For the record, the files I use:

cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
libcudnn7_7.6.5.32-1+cuda10.2_amd64.deb
libcudnn7-dev_7.6.5.32-1+cuda10.2_amd64.deb

dlandon commented 4 years ago

What version of Unraid are you using?

sic79 commented 4 years ago

@dlandon Is it possible to downgrade the nvidia version in the docker or upgrade that dpkg package in the install some way?

sic79 commented 4 years ago

What version of Unraid are you using?

6.8.2 and the nvidia plugin for that version

dlandon commented 4 years ago

@sic79 Not sure what you are asking about the nvidia version or dpkg. You need the 10.2 version with Unraid Nvidia.

sic79 commented 4 years ago

I'm talking about the nvidia 440.44 version that is there by default in the docker. The dpkg error says 440.33, so there is a mismatch. It seems to have some dependencies for cuda: "cuda-drivers : Depends: nvidia-compute-utils-440 (>= 440.33.01) but it is not installed"