ageitgey / face_recognition

The world's simplest facial recognition api for Python and the command line
MIT License
53.38k stars 13.48k forks

Face detect using GPU parallelization #303

Open arith3 opened 6 years ago

arith3 commented 6 years ago

Description

Hi. When I extract faces from a video file, it takes a lot of time. I use dlib with CUDA, but I don't think my system is fully using the GPU: it pegs the CPUs at 100% while GPU utilization is 1%. How can I improve inference speed using the GPU?

What I Did

Writing frame 2655 / 4820
Writing frame 2656 / 4820
Writing frame 2657 / 4820
Writing frame 2658 / 4820
Writing frame 2659 / 4820
Writing frame 2660 / 4820

CUDA version is 8.0, CUDNN version is 5.1.

ageitgey commented 6 years ago

Did you explicitly compile dlib with CUDA support? It won't use CUDA unless you do that.

You can check whether anything is actually using the GPU with the nvidia-smi command-line tool.
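You can also check from Python. A minimal sketch, assuming only that dlib exposes the DLIB_USE_CUDA flag (False for a CPU-only build):

```python
# Report whether the installed dlib was compiled with CUDA support.
# dlib.DLIB_USE_CUDA is False when dlib was built without CUDA, which is
# the usual reason face_recognition stays on the CPU.
import importlib.util

def dlib_cuda_status() -> str:
    if importlib.util.find_spec("dlib") is None:
        return "dlib not installed"
    import dlib
    if not dlib.DLIB_USE_CUDA:
        return "dlib installed, but built without CUDA (recompile with CUDA support)"
    return "dlib installed with CUDA support"

print(dlib_cuda_status())
```

If this reports a CPU-only build, recompiling dlib with CUDA enabled and then reinstalling face_recognition is the usual fix.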

arith3 commented 6 years ago

Sure, I compiled it with CUDA support.

Mon Jan 22 10:34:48 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 44%   56C    P8    18W / 250W |    316MiB / 11172MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 73%   84C    P2   111W / 250W |    452MiB / 11172MiB |     46%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:81:00.0 Off |                  N/A |
| 44%   63C    P2   117W / 250W |    452MiB / 11172MiB |     37%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      8746      C   python                                       153MiB |
|    0     12361      C   python                                       153MiB |
|    1      8746      C   python                                       289MiB |
|    1     12361      C   python                                       153MiB |
|    2      8746      C   python                                       153MiB |
|    2     12361      C   python                                       289MiB |
+-----------------------------------------------------------------------------+

It is not running properly with GPU support. First, I have to reinstall. After that, I will upload the results here.

FiveMaster commented 6 years ago

@namori3 Did you solve this issue? I have it too and don't know how to fix it. I also explicitly compiled dlib with CUDA support, but the GPU doesn't seem to be used.

arith3 commented 6 years ago

@FiveMaster Hello! I recompiled all the dlib files. Now my code uses the GPU a little, but I don't consider this a complete solution: it only uses one GPU and is still slow. I am editing the code, but the work isn't finished yet. Once I solve the problem, I will write up the method here.

FiveMaster commented 6 years ago

@namori3 Oh no! I used this command to compile dlib with CUDA: python setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA -G "Visual Studio 15 2017 Win64". I just checked the log and found that my dlib does not use CUDA; the log shows:

Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0 (found suitable version "8.0", minimum required is "7.5")
-- Looking for cuDNN install...
-- Found cuDNN: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cudnn.lib
-- Building a CUDA test project to see if your compiler is compatible with CUDA...
-- *** CUDA was found but your compiler failed to compile a simple CUDA program so dlib isn't going to use CUDA. ***
-- Disabling CUDA support for dlib.  DLIB WILL NOT USE CUDA

Did you have this issue? I think you should check this first.

arith3 commented 6 years ago

@FiveMaster I use Linux for training and inference, so I think my environment is different from yours.

ExtremeYu commented 6 years ago

@FiveMaster Please follow up on this.

FiveMaster commented 6 years ago

@ExtremeYu How do you know that I am Chinese? :) My operating system is Win10, and I followed this guide to install face_recognition: Windows Installation Tutorial. Note that CUDA does not support VS2017 yet, so you need to install VS2015; that is important. Also, download the newest dlib from its repo, because the newest dlib can be set up without Boost, which is much easier.

chapm250 commented 6 years ago

I am also having this problem. I installed on Ubuntu 17.10 with dlib-19.9 using sudo python3 setup.py install -v. When it compiles I see Enabling CUDA support for dlib, and while running the dlib tests it does use the GPU, as seen through nvidia-smi.

AustinFelipe commented 6 years ago

Do you guys have any news on this?

windforce7 commented 6 years ago

@ageitgey I tried to allocate frame-processing tasks to different GPUs alternately to accelerate the whole process, but a CUDA memory-copy error occurred. It seems the detector model imported in api.py is already bound to device 0 (the default one), and running it on another card causes the memory problem. If you have any ideas about this, please tell me. I already have some ideas and am trying to solve it; I'll keep you updated.

Here's my conclusion: the key is to invoke cuda.set_device() before importing the model into dlib. For example, changing:

face_detector = dlib.get_frontal_face_detector()

to

cuda.set_device(1)
face_detector_1 = dlib.get_frontal_face_detector()
cuda.set_device(0)
face_detector_0 = dlib.get_frontal_face_detector()

would copy the model to the two graphics cards respectively.

ageitgey commented 6 years ago

@windforce7 Are you running multiple processes in parallel?

You might be able to just import face_recognition inside of the function that you run in the separate process (instead of importing it at the top of the program) and call cuda.set_device() in that function. But good point - maybe it's worth having an API to let the user optionally choose a cuda device when they import the library.

windforce7 commented 6 years ago

@ageitgey Yup. I have written my own parallel code to assign dlib tasks to different GPUs.

I can publish a modified api.py after some adjustments soon, if you don't mind.

windforce7 commented 6 years ago

@ageitgey Multi-threading turned out to be a pretty bad idea for assigning tasks to multiple GPUs, due to Python's weak multi-threading mechanism. So a practical way to run dlib on many GPUs is to set up one Python process per GPU and call cuda.set_device() to set the GPU context when each process starts. The multiprocessing module allows us to start many processes, but it caused more problems than it solved for me.
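The one-process-per-GPU idea can be sketched like this. It is a hedged skeleton: detect_chunk and run_on_gpus are illustrative names, not part of face_recognition's API, and the dlib calls are left as comments so the skeleton runs anywhere:

```python
# One worker process per GPU: each worker binds its CUDA device before
# loading any model, then processes its own slice of the frames.
import multiprocessing as mp

def detect_chunk(gpu_id, frames):
    # In a real worker you would bind the CUDA context first, e.g.:
    #   import dlib
    #   dlib.cuda.set_device(gpu_id)
    #   ...then load the detector and run it over `frames`.
    # Placeholder: pretend every frame yields one detection.
    return [(gpu_id, frame) for frame in frames]

def run_on_gpus(frames, n_gpus=2):
    # Deal frames round-robin across GPUs, one chunk per worker process.
    chunks = [frames[i::n_gpus] for i in range(n_gpus)]
    with mp.Pool(processes=n_gpus) as pool:
        per_gpu = pool.starmap(detect_chunk, enumerate(chunks))
    return [det for chunk in per_gpu for det in chunk]

if __name__ == "__main__":
    print(run_on_gpus(list(range(8))))
```

Because the CUDA context is bound inside each child process before any model is loaded, the device-0 binding problem described above does not arise.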

adv-ai-tech commented 6 years ago

I also get the same error. I checked Stack Overflow but still couldn't find any solution.

I get this


-- The C compiler identification is GNU 6.4.0
-- The CXX compiler identification is GNU 6.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Using CMake version: 3.5.1
-- Compiling dlib version: 19.16.99
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Looking for XOpenDisplay in /usr/lib/x86_64-linux-gnu/libX11.so;/usr/lib/x86_64-linux-gnu/libXext.so
-- Looking for XOpenDisplay in /usr/lib/x86_64-linux-gnu/libX11.so;/usr/lib/x86_64-linux-gnu/libXext.so - found
-- Looking for gethostbyname
-- Looking for gethostbyname - found
-- Looking for connect
-- Looking for connect - found
-- Looking for remove
-- Looking for remove - found
-- Looking for shmat
-- Looking for shmat - found
-- Looking for IceConnectionNumber in ICE
-- Looking for IceConnectionNumber in ICE - found
-- Found X11: /usr/lib/x86_64-linux-gnu/libX11.so
-- Searching for BLAS and LAPACK
-- Searching for BLAS and LAPACK
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1") 
-- Checking for module 'cblas'
--   No package 'cblas' found
-- Checking for module 'lapack'
--   Found lapack, version 0.2.18
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of void*
-- Check size of void* - done
-- Found Intel MKL BLAS/LAPACK library
-- Looking for sgesv
-- Looking for sgesv - found
-- Looking for sgesv_
-- Looking for sgesv_ - found
-- Found CUDA: /usr/local/cuda-8.0 (found suitable version "8.0", minimum required is "7.5") 
-- Looking for cuDNN install...
-- Found cuDNN: /home/ee15s055/miniconda3/envs/py36/lib/libcudnn.so
-- Building a CUDA test project to see if your compiler is compatible with CUDA...
-- *****************************************************************************************************************
-- *** CUDA was found but your compiler failed to compile a simple CUDA program so dlib isn't going to use CUDA. 
-- *** The output of the failed CUDA test compile is shown below: 
-- ***   Change Dir: /home/ee15s055/dlib/dlib/cuda_test_build
   ***   
   ***   Run Build Command:"/usr/bin/make"
   ***   [ 50%] Building NVCC (Device) object CMakeFiles/cuda_test.dir/cuda_test_generated_cuda_test.cu.o
   ***   In file included from /usr/local/cuda-8.0/include/cuda_runtime.h:78:0,
   ***                    from <command-line>:0:
   ***   /usr/local/cuda-8.0/include/host_config.h:119:2: error: #error -- unsupported GNU version! gcc versions later than 5 are not supported!
   ***    #error -- unsupported GNU version! gcc versions later than 5 are not supported!
   ***     ^~~~~
   ***   CMake Error at cuda_test_generated_cuda_test.cu.o.cmake:203 (message):
   ***     Error generating
   ***     /home/ee15s055/dlib/dlib/cuda_test_build/CMakeFiles/cuda_test.dir//./cuda_test_generated_cuda_test.cu.o
   ***   
   ***   
   ***   CMakeFiles/cuda_test.dir/build.make:63: recipe for target 'CMakeFiles/cuda_test.dir/cuda_test_generated_cuda_test.cu.o' failed
   ***   make[2]: *** [CMakeFiles/cuda_test.dir/cuda_test_generated_cuda_test.cu.o] Error 1
   ***   CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/cuda_test.dir/all' failed
   ***   make[1]: *** [CMakeFiles/cuda_test.dir/all] Error 2
   ***   Makefile:83: recipe for target 'all' failed
   ***   make: *** [all] Error 2
   ***   
-- *****************************************************************************************************************
-- Disabling CUDA support for dlib.  DLIB WILL NOT USE CUDA
-- C++11 activated.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ee15s055/dlib

@ageitgey @windforce7

Julius-ZCJ commented 5 years ago

@windforce7 I also have a question about dlib not using the GPU. Is CUDA a library? How can you call the cuda.set_device() function? When I call it, I get: No module named 'cuda'.

windforce7 commented 5 years ago

> @windforce7 I also have a question about dlib not using the GPU. Is CUDA a library? How can you call the cuda.set_device() function? When I call it, I get: No module named 'cuda'.

CUDA is a C++ framework created by NVIDIA for GPU computing. You don't need CUDA installed on your PC to run the face_recognition project, since it doesn't depend on the CUDA framework directly; in that scenario, face_recognition simply won't use the GPU. face_recognition uses the dlib package as an intermediate layer between the computing hardware and the Python application: it calls dlib functions without knowing where the computation actually happens (CPU or GPU). If CUDA is installed, dlib can be compiled with the -DDLIB_USE_CUDA=1 flag to support GPU computing, and in that case dlib exposes some of CUDA's interfaces to you, like the cuda.set_device() used here. Short story: install NVIDIA CUDA first, then compile and install dlib with the CUDA support flag, and finally install face_recognition.
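That "short story" can be sketched as a build recipe on Linux. This is hedged: the flag spelling mirrors the setup.py command quoted earlier in this thread, and the paths are illustrative; adjust for your own CUDA install:

```shell
# 1. Install the NVIDIA driver, CUDA toolkit, and cuDNN first.
# 2. Build and install dlib from source with CUDA enabled; watch the build
#    log for "Enabling CUDA support for dlib" (not "DLIB WILL NOT USE CUDA").
git clone https://github.com/davisking/dlib.git
cd dlib
python setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA
# 3. Only then install face_recognition on top of the CUDA-enabled dlib.
pip install face_recognition
```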

wahyubram82 commented 4 years ago

> I also get the same error. I checked Stack Overflow but still couldn't find any solution. [quoting the same CMake log posted by adv-ai-tech above]
>
> @ageitgey @windforce7

It's a general problem. To solve it:

sudo apt update && sudo apt upgrade
sudo apt install build-essential
export CC=gcc
export CXX=g++

After that, try to rebuild.

ghost commented 3 years ago

@windforce7 How can I make use of -DDLIB_USE_CUDA=1? The library works, but it is not using the GPU. What should I do so that dlib uses CUDA?