dlandon / zoneminder.machine.learning

Zoneminder Docker
GNU General Public License v2.0

Add Nvidia GPU Support to Docker #65

Closed dlandon closed 3 years ago

dlandon commented 4 years ago

Add Nvidia GPU support to Docker.

dlandon commented 4 years ago

I've posted the proposed opencv.sh script to compile opencv with GPU support. Right now I'm leaning toward the conditional pip3 install of opencv versus compiling opencv in the background with the opencv.sh script. This would let the docker start more quickly, but GPU support would not be available until the opencv.sh script has completed.

dlandon commented 4 years ago

I don't know how this is working. When I compile opencv I get this error:

```
2020-02-14 22:33:13 (57.1 MB/s) - ‘opencv_contrib.zip’ saved [62527980/62527980]

CMake Warning at cmake/OpenCVFindLibsPerf.cmake:35 (message):
  OpenCV is not able to find/configure CUDA SDK (required by WITH_CUDA).
  CUDA support will be disabled in OpenCV build.
  To eliminate this warning remove WITH_CUDA=ON CMake configuration option.
```

This is from the sample compile web page.

dlandon commented 4 years ago

So it seems that I need to install the cuda toolkit first. Giving that a try.

dlandon commented 4 years ago

Getting opencv to compile took 15GB of space. Once it was done I was able to get the docker back to 4GB. It will take a lot of disk/memory space to support this effort!

pliablepixels commented 4 years ago

Thanks for working on this. The entire CUDA installation process is ugly and does not lend itself to automation, because to get all the parts working you have to go to NVIDIA's site and download architecture-specific drivers that will differ from person to person. I documented my CUDA driver install process here for my specific GPU card.

I do agree that you probably should not get into support questions on why GPU is not working. I don't either and completely leave it to the user to deal with GPU install issues. If they are not at a stage where nvidia-smi shows their GPU, they need to follow the 3rd party articles first.

On to your docker: I'm thinking of a middle ground that makes it simple for you. What if we state our goals as:

  1. You download dlandon's docker, and if you want an automated install it does a CPU install of openCV
  2. If you want GPU support, then:
     - 2.a: There is some script in the image (or a howto) that will tell you the exact steps needed to get nvidia-docker installed
     - 2.b: There is a flag in the docker script, which when enabled will attach the nvidia-docker runtime, so the dlandon docker will detect the GPU card
     - 2.c: There is some script (or a howto) that will tell you the steps needed to install CUDA drivers inside your docker image to use that GPU
     - 2.d: There is a script that will compile, install and replace the CPU version of openCV with the GPU version.
  3. The docker update process will make sure these changes are not overwritten if you upgrade the docker

Steps 2.a, 2.c are the users' responsibility to do right. These are instructional only and support will not be provided.

Steps 2.b, 2.d, 3 are in purview of this repo to maintain/make sure they work (especially step 3 for me)

Thoughts?

dlandon commented 4 years ago

I'm thinking that for now I'll use an environment variable to control the opencv install: 'GPU_SUPPORT', when set to '1', will not install the CPU opencv. Then the user can compile the GPU opencv.

I'll document how to set up the Zoneminder docker with Nvidia support from the Unraid Nvidia plugin - it's not the Nvidia docker, so it is a bit different. The nice thing is that it loads the driver for the GPU so it can be seen by the docker.

I'll put together an opencv script that can be modified for the user's specific situation that they can run to compile opencv.

I can't make the opencv compile persistent when the docker is updated. It's the nature of the beast.

And then wait for all the support that will happen!
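The GPU_SUPPORT gate described above could be sketched roughly like this in the container's init logic (a hypothetical sketch, not the actual dlandon init script; function and message names are illustrative):

```shell
#!/bin/sh
# Hypothetical sketch of the env-variable gate discussed above.
# GPU_SUPPORT=1 skips the CPU opencv wheel so the user can compile
# the CUDA-enabled build with opencv.sh instead.

install_opencv() {
  if [ "${GPU_SUPPORT:-0}" = "1" ]; then
    echo "GPU_SUPPORT=1: skipping CPU opencv; run opencv.sh to build with CUDA"
  else
    echo "installing CPU opencv"
    # pip3 install opencv-contrib-python   # the real CPU install, elided here
  fi
}

GPU_SUPPORT=1
install_opencv
```

Defaulting the variable with `${GPU_SUPPORT:-0}` keeps existing setups (no variable set) on the CPU path.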

dlandon commented 4 years ago

This doesn't make sense in your write up: download cuDNN from NVIDIA runtime. For my system it was

pliablepixels commented 4 years ago

Your comment seems truncated. but in general:

  1. You need to install the cuda-driver (from ubuntu/debian repos)
  2. You need the CuDNN runtime (comes from nvidia's website)

dlandon commented 4 years ago

  1. Got it.
  2. wget it and then install the package?

dlandon commented 4 years ago

Can't download cuDNN. Says I need a membership.

pliablepixels commented 4 years ago

The membership is free. I think that to search for the correct download you need a login, but once you know the exact file to download, the download itself is not authenticated.

pliablepixels commented 4 years ago

Based on what I am reading, if we use nvidia-docker then there is no need to install any of the drivers or the entire CuDNN library in the docker itself. I know you said unraid has its own plugin, but I wonder if we really need any of this if nvidia's docker provides the bridge. https://github.com/NVIDIA/nvidia-docker

Update 1: Just followed instructions on nvidia-docker and I had a container with GPU access. Am now going to compile openCV to see if it detects cuDnn. Note that I had to upgrade my docker version to 19.x

Update 2: NVIDIA provides its own cuda base images (https://hub.docker.com/r/nvidia/cuda/) -> it seems to me if you build your docker on top of those base images, you can completely skip CuDNN install yourself.

Update 3: Note that CUDA is installed, since I used cuda:10.2-base:

```
sudo docker run --gpus all nvidia/cuda:10.2-base nvidia-smi
Sat Feb 15 16:22:50 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:02:00.0 Off |                  N/A |
| 37%   35C    P8    N/A /  75W |   3366MiB /  4039MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```

Therefore, what we need to do in your docker image is start compiling OpenCV; CUDA and the drivers are provided automatically via this route. However, what I don't know is whether you can have multiple FROMs in a Dockerfile, where you use your current base image plus the cuda images, or if there is another way.
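For what it's worth, Docker answers the multiple-FROM question like this: a stage has exactly one base, so one image has to be rebuilt on top of the other; multi-stage builds only let you COPY artifacts between stages. A hedged sketch (tags and paths are illustrative, not the actual dlandon Dockerfile):

```dockerfile
# Hypothetical sketch: rebase the ZoneMinder image on NVIDIA's CUDA base image
# so the CUDA runtime comes from the base layer instead of a manual install.
FROM nvidia/cuda:10.2-base

# ...then repeat the existing image's setup steps here (packages, ZoneMinder,
# zmeventnotification, etc.). A true merge of two FROMs is not supported;
# multi-stage builds only copy files between stages, e.g.:
#   FROM somebase AS build
#   FROM nvidia/cuda:10.2-base
#   COPY --from=build /opt/artifact /opt/artifact
RUN apt-get update && apt-get install -y --no-install-recommends build-essential
```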

dlandon commented 4 years ago

The Unraid Nvidia plugin has the drivers and cuda installed. I'm looking at that now.

dlandon commented 4 years ago

Ok, I think I have something that will work. I have the opencv.sh script ironed out and am doing some final testing. I do not have an Nvidia graphics card so I can't do any GPU testing. I've set up an environment variable 'GPU_SUPPORT' that when set to '1' will compile opencv in the background. The docker will start and run, but GPU support will not be available until the compile is complete. Log messages will show the progress of the compile. It will take a LONG time - over an hour - and use a LOT of memory - about 15GB additional just to compile!

Unfortunately, whenever the docker is updated, opencv will have to be compiled again. This shouldn't be a major issue, as the only docker updates are for ES; Zoneminder is updated by restarting the docker and updating the Zoneminder package.

In Unraid, the Nvidia plugin will provide the drivers and cuda support. In other environments, the Nvidia docker can supply the drivers and cuda support. There are only a few parameters to adjust in the Zoneminder docker to enable the graphics card support. I'll document that.

pliablepixels commented 4 years ago

Sounds good. Look forward to your docs. I have a GPU and will test it out.

sic79 commented 4 years ago

I have an nvidia GPU in my unraid server so I will test when it is ready. And I don't think the 15GB is an issue nowadays for most users. I like your choice to use the nvidia plugin so that other dockers can also share the GPU at the same time in the unraid server. Nice work!

dlandon commented 4 years ago

15GB isn't an issue. The Docker image will have to have enough free space to do the compile. It looks like the image will be about 4GB when done. Not as bad as I originally thought.

I'll start updating the GitHub, but won't create a new Docker image yet.

drtaul commented 4 years ago

I just noticed this ongoing work. I just completed a first pass at building opencv in docker on my unraid server with an Nvidia Quadro M4000. I am able to run a python test application and verify the GPU is being used via nvidia-smi. Here is a link to my work FWIW: https://github.com/drtaul/docker-opencv

dlandon commented 4 years ago

What is your plan for the docker? Do you realize that there is already a Nvidia docker?

drtaul commented 4 years ago

Nvidia docker? Not unless you are referring to the plugin from linuxserver.io that installs the drivers into UNRAID? Also, I know nvidia-docker is now deprecated since docker 19.03?

My plan? None really, just a learning exercise. I have been interacting with pliablepixels online recently as I have been building a home security system using zoneminder and home assistant (I leveraged his pyzm module to write an app for Appdaemon). Once I had ZM running via your docker (UNRAID server is a Dell-T7820 with dual Xeon CPU (16 threads) and a older Quadro M4000 GPU) I realized I wanted faster object detection (visitors were getting to my door before I received the alert, i.e. it is taking 20-30 seconds to complete the object detection processing).

After some searching and finally realizing the that python-opencv is only available without GPU acceleration I used the reference pliablepixels provided to just start an ad-hoc experiment. I guess I must have been working on it at the same time you were.

I have installed the 'Unraid Nvidia' plugin and have the drivers (v440) on UNRAID v6.8.2. With this I was able to observe via nvidia-smi command that my GPU was not being used 'ever'. As I indicated, with the docker I pointed you to, I can observe my GPU being loaded. (Though, currently I am not seeing any real improvement with vs without GPU).

BTW, thanks for all of your work!


dlandon commented 4 years ago

Building the docker now.

sic79 commented 4 years ago

I installed the new docker and reconfigured all .ini files for the new ES version. But when I try to set of a motion alarm I get the following error:

```
02/16/20 12:49:48.133914 zmeventnotification[20568].INF [main:833] [|----> FORK:Inne_Server (7), eid:7612 Invoking hook on event start:'/var/lib/zmeventnotification/bin/zm_event_start.sh' 7612 7 "Inne_Server" "Motion All" "/var/cache/zoneminder/events/7/2020-02-16/7612"]
Traceback (most recent call last):
  File "/var/lib/zmeventnotification/bin/zm_detect.py", line 8, in <module>
    import cv2
ModuleNotFoundError: No module named 'cv2'
```

Maybe @pliablepixels has any thoughts about that? Is it something i may have missed to configure..

dlandon commented 4 years ago

That's the opencv module. Did you set 'GPU_SUPPORT=1' and wait a very LONG time for it to compile? The log will show when the compile is completed.

dlandon commented 4 years ago

Also look at the /config/opencv.log file at the end and see if there were any compile errors.

drtaul commented 4 years ago

Great work! I will update with your latest docker image (I am two updates behind now). I will verify system is operating as before the update before proceeding with the GPU support setting. Probably later today once I do a backup (never backed up Zoneminder). Thanks for all of this!

sic79 commented 4 years ago

Yes, GPU_SUPPORT=1 is added and I saw opencv compiling in the background processes.

The last lines in my log are:

```
tail /config/opencv.log
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Processing triggers for ca-certificates (20180409) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...

updates of cacerts keystore disabled.
done.
Processing triggers for fontconfig (2.12.6-0ubuntu2) ...
Processing triggers for mime-support (3.60ubuntu1) ...
```

So it doesn't look to be 100% finished by the log, but the last lines have not changed in the last 50 min (total running time of the docker is 2h). Maybe I should restart the docker and let it try to compile again.

sic79 commented 4 years ago

Here is the full log: https://filebin.net/c36vy7io4oj9iymw

This is the full error from zmeventnotification.pl:

```
Traceback (most recent call last):
  File "/var/lib/zmeventnotification/bin/zm_detect.py", line 8, in <module>
    import cv2
  File "/usr/local/lib/python3.6/dist-packages/cv2/__init__.py", line 96, in <module>
    bootstrap()
  File "/usr/local/lib/python3.6/dist-packages/cv2/__init__.py", line 86, in bootstrap
    import cv2
ImportError: libgtk-3.so.0: cannot open shared object file: No such file or directory
```

dlandon commented 4 years ago

I get the same problem, and it's worse after I compiled opencv, because when I try to list the python modules I get a segfault.

pliablepixels will have to weigh in on this one.

Restarting the docker won't cause a recompile. It only happens once. You can go into the docker and run these commands:

```
cd ~
./opencv.sh
```

This will recompile opencv.

sic79 commented 4 years ago

Ok, then ill try a recompile and see if there is a difference while we wait on @pliablepixels to check this out.

dlandon commented 4 years ago

I see an issue with the compile. I'm trying it out now. Don't waste your time recompiling.

sic79 commented 4 years ago

@dlandon too late, I already started a new compile ;). But I will cancel it and wait and see if you solve the compile issue.

dlandon commented 4 years ago

I'm recompiling opencv with a correction to be sure the cv2 module is installed. I also changed some cmake parameters to:

```
-D INSTALL_PYTHON_EXAMPLES=OFF \
-D INSTALL_C_EXAMPLES=OFF \
-D BUILD_EXAMPLES=OFF ..
```

I don't think we need all the examples. We don't have a development environment here. It seems to cut down on the compile time. We'll see.
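A hedged sketch of how those flags fit into a full CUDA-enabled OpenCV configure step (only the three example-suppressing flags come from the comment above; the rest are common flags for such builds, assumed here for context and not taken from the actual opencv.sh):

```shell
#!/bin/sh
# Illustrative assembly of a cmake configure line for a CUDA OpenCV build.
# Flags beyond the three EXAMPLES=OFF switches are assumptions for context.
CMAKE_FLAGS="-D CMAKE_BUILD_TYPE=RELEASE \
  -D WITH_CUDA=ON \
  -D OPENCV_DNN_CUDA=ON \
  -D OPENCV_EXTRA_MODULES_PATH=../opencv_contrib/modules \
  -D INSTALL_PYTHON_EXAMPLES=OFF \
  -D INSTALL_C_EXAMPLES=OFF \
  -D BUILD_EXAMPLES=OFF"

# The real script would run:  cmake $CMAKE_FLAGS ..
# Here we only print the composed command line.
echo "cmake $CMAKE_FLAGS .."
```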

dlandon commented 4 years ago

No luck with the import cv2 issue. Need pliablepixels input.

drtaul commented 4 years ago

I just kicked off a couple of attempts. The first thing I notice is that the cudnn SDK is missing, based on:

```
grep -i cudnn /config/opencv.log
-- Could NOT find CUDNN (missing: CUDNN_LIBRARY CUDNN_INCLUDE_DIR) (Required is at least version "7.5")
```

I had to pull this separately after registering on the NVIDIA development site - not saying that is the only way, just the one I used.

I will start looking at the cv2 issue after my build completes... One other thought/question: I noticed you are configuring the cmake command with:

```
-D PYTHON_EXECUTABLE=~/.virtualenvs/opencv_cuda/bin/python
```

I couldn't find this path in the container, but then I am a noob in docker. Anyway, I did the same thing in my testing and had to create a symlink similar to what pyimagesearch documented:

```
$ cd ~/.virtualenvs/opencv_cuda/lib/python3.5/site-packages/
$ ln -s /usr/local/lib/python3.5/site-packages/cv2/python-3.5/cv2.cpython-35m-x86_64-linux-gnu.so cv2.so
```

dlandon commented 4 years ago

Ok I may have an answer. Do this:

pliablepixels commented 4 years ago

No luck with the import cv2 issue. Need pliablepixels input.

Are you still having this issue? Looks like you might have fixed it (cv2 will be installed in the site-packages for the executable in PYTHON_EXECUTABLE, so if your main python binary is different, it won't find it).

dlandon commented 4 years ago

I think I have it. I needed to install the wrapper.

dlandon commented 4 years ago

@drtaul All of those issues should be solved. I thought cuDNN was included in the Unraid plugin. I'll capture cuDNN and install it. I'll probably grab all the Nvidia packages and host them on my GitHub just to make sure they are always available.

sic79 commented 4 years ago

Ok I may have an answer. Do this:

* Remove the Zoneminder docker.

* Add the docker back with `GPU_SUPPORT=0`. Leave `INSTALL_HOOK=1`.

* Let it finish the hook installation. Watch the log for completion.

* Get into the docker cli.

* `pip3 uninstall opencv-contrib-python`

* `cd /config/`

* `wget https://github.com/dlandon/zoneminder/blob/master/zmeventnotification/opencv.sh`

* `chmod +x opencv.sh`

* `./opencv.sh`

I'm gonna try this now. Will report back later.

dlandon commented 4 years ago

I can't include CuDNN in the docker because of it being GPU and application specific, and licensing. I was sure Unraid Nvidia plugin had CuDNN. I'll look into it.

drtaul commented 4 years ago

My build finished; I followed your instructions on running opencv.sh manually. Did a quick check in python by calling cv2.getBuildInformation()... attaching here. Also started ZM and triggered a capture, which worked fine - detected person + unknown face.

cv2-bldinfo.txt
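As an aside, a quick way to check such a build for CUDA support is to scan the cv2.getBuildInformation() text for the "NVIDIA CUDA" line. A hedged sketch with a stand-in string (in a real container you would pass the output of cv2.getBuildInformation() instead; the parsing logic here is illustrative):

```python
# Sketch: detect whether an OpenCV build reports CUDA support.
# Real usage would be:  import cv2; cuda_enabled(cv2.getBuildInformation())
# A stand-in string illustrates the parsing.

def cuda_enabled(build_info: str) -> bool:
    """Return True if the build-info text reports CUDA as enabled."""
    for line in build_info.splitlines():
        line = line.strip()
        if line.startswith("NVIDIA CUDA:"):
            return "YES" in line
    return False

sample = """
  Other third-party libraries:
    NVIDIA CUDA:               YES (ver 10.2)
    NVIDIA GPU arch:           53 60 61 70 75
"""

print(cuda_enabled(sample))  # → True
```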

dlandon commented 4 years ago

I'm backing off the compile of opencv by environment variable when hook is installed. Because of the complexities in this scheme and the fact I can't cleanly automate the whole process, it will be a manual process. I'll sort any issues you find in the opencv.sh script and provide it as a manually run script. The rest will have to be completed by the user manually.

sic79 commented 4 years ago

I have to come back tomorrow with results of my test, as I have no access to my server atm.

pliablepixels commented 4 years ago

@drtaul based on your note above, do you have Dan's docker image working with GPU+CUDA libraries? If so, did you use the nvidia docker images or compile on your own as far as cuda goes?

dlandon commented 4 years ago

He's on Unraid and using the Unraid plugin that provides the drivers and cuda.

drtaul commented 4 years ago

@pliablepixels totally based on dlandon's docker image and the manual intervention he asked me to test. If you look at the cv2.getBuildInformation output, this build lacks:

```
cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect
```

This is due (I think, based on my previous experiments) to not having the cudnn SDK. I will try this again by editing the opencv script to install libcudnn7_7.6.0.64-1%2Bcuda10.1_amd64.deb, which is limited to secure access (so not a way to automate?) and of which I have a private copy. From what I observed, there is an include file (cudnn.h) and a lib (libcudnn.so) that are needed for the cmake in opencv to enable these features.

My question is whether your zm_detect_xx.py leverages this? What needs to be set in hook/objectconfig.ini to do so?

BTW, I am not sure what it means to say 'cuda' is included. It seems obvious to me that the drivers provided in the Unraid plugin support the cuda APIs, but in order to compile code that leverages that capability you would need the cuda toolkit/sdk (typically installed under /usr/local/cuda) and additional features such as the cudnn library/api. Based on my quick perusal of my unraid environment, nvidia-smi is there and I can query the status of the gpu etc. ... there is no trace of the cuda toolkit et al.

pliablepixels commented 4 years ago

@drtaul From what I am reading, the unraid plugin provides both the CUDA toolkit and the drivers. When you run nvidia-smi, if you see "CUDA Version" in the top right (see my nvidia-smi output above), that means your environment has both the driver and CUDA.

My question is whether your zmdetect_xx.py leverages this? What needs to be set in hook/objectconfig.ini to do so?

Not sure I understood your question, but if you are asking whether zmdetect leverages CUDA, then yes. If OpenCV is compiled with CUDA then the GPU will be used (see this). If your environment doesn't have both GPU drivers + CUDA enabled, this will fail. To enable this, set use_opencv_dnn_cuda=yes in objectconfig.ini.
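For reference, that setting is just a key in objectconfig.ini; a minimal fragment might look like this (only the use_opencv_dnn_cuda key comes from this thread; the section placement is an assumption about typical ES configs):

```ini
# Hypothetical objectconfig.ini fragment - section placement is assumed.
[general]
# Use OpenCV's CUDA DNN backend (requires OpenCV compiled WITH_CUDA)
use_opencv_dnn_cuda=yes
```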

drtaul commented 4 years ago

@pliablepixels Ahh, thanks for the pointer on updating objectconfig.ini. Another caveat I just realized is that the opencv being pulled limits use to GPU architecture >= 5.3, and my Maxwell card is 5.2. I found that on the tip of github for opencv there is a commit lowering this limit to 3.0, FWIW. Consequently, my previous build did NOT build in the CUDA backend. I am hacking Don's opencv.sh to try a new build.

dlandon commented 4 years ago

Let me know how you get it to work and I’ll modify the script with some user settings so they can adjust it for their situation.
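One way the script could expose that as a user setting is a variable at the top of opencv.sh that feeds cmake's CUDA_ARCH_BIN (a sketch with illustrative names, not the actual script; the compute-capability values come from NVIDIA's published list, e.g. 5.2 for a Quadro M4000):

```shell
#!/bin/sh
# Illustrative "user settings" block for the top of an opencv build script.
# CUDA_ARCH_BIN must match the card's compute capability,
# e.g. 5.2 for a Quadro M4000 or 6.1 for a GTX 1050 Ti.
CUDA_ARCH_BIN="${CUDA_ARCH_BIN:-5.2}"

# Passed through to cmake in the real script, e.g.:
#   cmake -D WITH_CUDA=ON -D CUDA_ARCH_BIN="$CUDA_ARCH_BIN" ..
echo "configuring OpenCV for CUDA arch $CUDA_ARCH_BIN"
```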

sic79 commented 4 years ago

@dlandon I have followed your steps and the cv2 error is now gone :). Now I have another error when I get a motion alarm:

```
[ WARN:0] global /io/opencv/modules/dnn/src/dnn.cpp (1363) setUpNet DNN module was not built with CUDA backend; switching to CPU
```

Can also add that nvidia-smi reports:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
```

So that part seems to be working great.