If you have an NVIDIA GPU, compiling dlib with CUDA support will speed up face recognition a good bit (but it won't make face detection any faster). Compiling dlib with AVX support (if it isn't already) will also help a little.
Using lower resolution images will also speed things up a lot at the cost of missing smaller faces in images. So you can trade off speed for detail.
For example, if you are using the command line script, you can change these lines from 1600 to 800 or something smaller. That would make it a lot faster but miss small faces.
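In your own code, the same trade-off looks roughly like this minimal sketch, downscaling before detection (the 800px cap and the file name are illustrative, not the script's exact code):

```python
# Minimal sketch of the speed/detail trade-off: downscale before detection.
# The 800px cap and file name are illustrative, not the CLI script's code.
import face_recognition
import numpy as np
from PIL import Image

img = Image.open("group_photo.jpg").convert("RGB")
if img.width > 800:
    scale = 800 / img.width
    img = img.resize((800, int(img.height * scale)))

# face_locations expects an RGB numpy array
locations = face_recognition.face_locations(np.array(img))
print("Found", len(locations), "face(s)")
```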
So 1 image/sec is okay for pure CPU (possibly without AVX)? It uses a single core right now, so it's acceptable. Just wondering whether it shouldn't be something like 100 or 1000 images/s.
I'll jump in with a related question. I'm seeing a >2x difference in speed running face_encodings on a new MacBook Pro vs. a Docker container on the same machine (0.2 seconds for 1 face on OSX vs. 0.5 seconds in Docker). I'm running python 3.6 on both and obviously using the same processor. I also compared face_locations, and both run at the same speed (~0.16 seconds). I think this narrows the problem down to how dlib creates the encodings, but I'm not sure why that would differ between OSX and Linux. Thoughts?
@Dingo64 This might also give you some additional datapoints on expected speed with newish hardware.
The Dockerfile I'm using is the exact same as your example with the exception of using python 3.6 instead of 3.4.
Something in the range of .2s to 1s is pretty normal depending on what your cpu is, image size, etc.
There are lots of ways to improve throughput with more work. For example, you are only using one CPU core, but you might have 4 or 8 cores available. So you could run multiple processes in parallel and get a 4-8x throughput boost (see the sketch below).
But in general you shouldn't expect 100 or 1000 images per second.
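As a rough illustration of that multi-process approach, a minimal sketch with Python's multiprocessing.Pool might look like this (the folder path and pool size are assumptions):

```python
# Minimal sketch: spread face encoding across CPU cores with a process pool.
# The folder path and pool size are illustrative.
import glob
from multiprocessing import Pool

import face_recognition

def encode_image(path):
    # Load one image and return the face encodings found in it
    image = face_recognition.load_image_file(path)
    return path, face_recognition.face_encodings(image)

if __name__ == "__main__":
    paths = glob.glob("./unknown_pictures/*.jpg")
    with Pool(processes=4) as pool:  # roughly one worker per core
        for path, encodings in pool.map(encode_image, paths):
            print(path, "->", len(encodings), "face(s)")
```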
@giffordw Are you running Docker for Mac as the docker host? Docker for Mac has various performance issues (see https://github.com/docker/for-mac/issues/668). Not saying that's definitely the cause of your issue, but it might be useful to see if running Linux as the docker host OS performs better on the same or similar hardware.
The other possibility is that maybe dlib has different build options between macOS and Linux. Maybe dlib in the docker image isn't finding a good BLAS library or something? It might be worth looking at the dlib configure / build log you see while building the Docker container to see if anything looks suspicious.
Thanks. It doesn't appear to be an issue with Docker for Mac and rather the dlib build. I'll probably copy this question over on the dlib boards since it is likely an issue with that setup. Figured I would check here first if it's a known issue.
@giffordw Your MacBook Pro has "2.7 GHz quad-core Intel Core i7 Skylake (6820HQ), up to 3.6 GHz" as a CPU? And does it use all your cores?
@Dingo64 Yes, I have a 2.7 GHz CPU, but I believe it's only utilizing 1 core.
@giffordw Thanks. My i7 is 2.3 GHz. Sure, it's slower, but you get 0.16 s and I get something close to 1 s. There shouldn't be that big a difference.
Running it across multiple cores does improve performance relative to the cores available. Thanks for the pointer :)
Hi, I'm using VMware with Ubuntu 16.04, but it is too slow for scanning the 512 pictures in my database: it takes about 0.5 seconds for a 640x480 picture. I use dlib compiled with CUDA, but I think it can get much faster. Currently my processor is an Intel Core i3, with a GTX 1060 GPU. I'm thinking of buying a Core i7 and using pure Ubuntu. Do you think it will get better by doing that? Can I get twice the performance? Thanks.
Are you sure it's actually able to access your GPU and use CUDA from inside the VM? Did you confirm via the nvidia-smi command that dlib is actually using your GPU while the process is running?
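Besides watching nvidia-smi, a quick in-process check is possible, assuming a dlib build recent enough to expose these flags (older builds may not have them):

```python
# Quick sanity check; assumes a dlib build recent enough to expose these.
import dlib

print(dlib.DLIB_USE_CUDA)           # True only if dlib was compiled with CUDA
print(dlib.cuda.get_num_devices())  # how many CUDA devices dlib can see
```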
I don't know, how can I check that? I tried to install the NVIDIA drivers but it got stuck on the login screen, and from what I've searched it seems that VMware does not support CUDA (this reference Link). If I install Ubuntu natively, should I get a huge boost or not? Thanks.
It sounds like you aren't actually using your GPU at all right now. So if you install Ubuntu natively and properly set up CUDA so you are using the NVIDIA GPU, you should get a speed boost. But I can't make any promises about what exact speed you'll get. You'd just have to try it and see.
Thanks, I will try it and report back. What about the CPU? If I upgrade to a Core i7, will it be a big improvement or a small one? Currently the program uses only 40% of the CPU, but I'm not sure if that's related to the CPU.
There are too many variables for me to tell you how much a new CPU would help in your specific case. For example, if your current CPU has 4 cores but you are only using one of them, you could make processing 512 images up to 4x faster by splitting the work across 4 processes. Or, depending on your application, maybe there's a better way to store the processed data so you don't have to process the same images more than once.
So how could I set the processing to use all 4 of my cores? I think the sample uses only one core, which keeps CPU usage lower than it could be and reduces performance.
I have used this and it does improve performance in proportion to the cores available, and the method of storage and data access also improves the time without deploying more advanced hardware. Great tip, btw!
@masoudr I added multi-core support to the command line tool - https://github.com/ageitgey/face_recognition/commit/245160c2f6c741ac7cea6b5fb14b54b8d3cd07a0#diff-db0cd43e07ede47e67fd9317683e62a0
With the latest code in GitHub, you can run face_recognition --cpus 4 ./pictures_of_people_i_know/ ./unknown_pictures/ to use four CPUs with the command line program.
@ageitgey Thanks. I tried to run my example with python web_service_fin.py --cpus 4, but it still uses only 40 percent of my CPU. I installed the new version with python setup.py install on Windows. In my Ubuntu VM it runs faster, but it behaves like the old version. I tried face_recognition --cpus 4 ./pictures_of_people_i_know/ ./unknown_pictures/ and it runs fine with full CPU usage, but I think there is a problem with my own python code.
Edit:
I used the multiprocessing module with Pool() and got a huge boost. That solves my problem for now.
Please add multiprocessing support for python as well
Please add support for multiprocessing for face detection too
@jhcruvinel Multiprocessing is a feature of Python itself. It's not limited to this library. You can look at that diff I linked above for an example of how you might use it in your own programs.
@ageitgey, thanks for the tip. I ended up implementing it in Python 2.7, which also supports multiprocessing.
@jhcruvinel Cool. The issue with Python 2.7 was that multiprocessing might crash on macOS due to a bug in macOS and how Python spawns processes. You can work around it using the forkserver start method, which was only added in Python 3.4.
But if you are using Windows or Linux, Python 2.7 would probably work just fine.
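For reference, selecting that start method on Python 3.4+ looks roughly like this (the worker function is just a placeholder):

```python
# Python 3.4+ sketch: use the 'forkserver' start method to avoid the
# macOS fork() crashes described above. The worker is a placeholder.
import multiprocessing

def work(x):
    return x * x

if __name__ == "__main__":
    multiprocessing.set_start_method("forkserver")
    with multiprocessing.Pool(processes=4) as pool:
        print(pool.map(work, range(8)))
```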
OK @ageitgey, you're right. Thanks a lot.
Hi! What NVIDIA GPU do you recommend to get the best performance? Thanks!
A GeForce GTX 1080 Ti would work well.
Thanks @ageitgey! Would more DDR5 RAM be better?
It lets you run larger models or process bigger batches of images, so more GPU memory is a good investment for deep learning in general. It's the upper bound of what you can run.
Thanks!!!
How can I enable face detection to run with 2 GPUs or more? From my tests, it is only able to run on a single GPU.
@redm0n As far as I know, the time-consuming part is generating the encoding data for every image. You just need to use the multiprocessing module to handle one image file per CPU core. That is all you need. And if you need to compare two images faster, you can use the same method again.
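To make that concrete, a minimal sketch might be: encode once, store the encodings, then compare them cheaply (file names are illustrative, and it assumes one face is found in each image):

```python
# Sketch: encodings are the slow part; comparing stored encodings is cheap.
# File names are illustrative; assumes one face is found in each image.
import face_recognition

known = face_recognition.load_image_file("person_a.jpg")
unknown = face_recognition.load_image_file("person_b.jpg")

known_encoding = face_recognition.face_encodings(known)[0]
unknown_encoding = face_recognition.face_encodings(unknown)[0]

# Returns a list of booleans, one per known encoding passed in
print(face_recognition.compare_faces([known_encoding], unknown_encoding))
```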
@ageitgey Thanks for your work, it really helps me in my job. Could you please tell me whether there are any other flags to add when compiling dlib to make the program faster, besides the NEON flag and multiprocessing?
Hi! What is the recommended size for the images in face recognition? Thanks!
@neumartin It depends on how small the face will be inside the image.
You want to try to make sure each face image is at least, say, 100x100 pixels when you extract it from the larger image, maybe a little bigger.
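A quick way to check that guideline on your own images is a sketch like this (the file name is illustrative):

```python
# Sketch: measure detected face sizes to check the ~100x100px guideline above.
# The file name is illustrative.
import face_recognition

image = face_recognition.load_image_file("group_photo.jpg")
for top, right, bottom, left in face_recognition.face_locations(image):
    print("face is {}x{} px".format(right - left, bottom - top))
```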
@ageitgey I have been trying to get GPU acceleration working on my HP laptop using Linux but have been unable to. Can you please guide me through the process?
@chetu1988 Did you compile dlib with CUDA support? Do you have an NVIDIA card with CUDA? I think you also need to use the CNN model, like: face_locations = face_recognition.face_locations(image, model="cnn")
Thanks for your reply @neumartin. I have an NVIDIA card with CUDA and compiled dlib with CUDA support. Let me confirm: compiling dlib with CUDA support means that in the "cmake .. -DDLIB_USE_CUDA=0 -DUSE_AVX_INSTRUCTIONS=1" command I have to set DLIB_USE_CUDA=1, and in the "python3 setup.py install --yes USE_AVX_INSTRUCTIONS --no DLIB_USE_CUDA" command I have to use "--yes DLIB_USE_CUDA", right?
@chetu1988 try cmake .. -DDLIB_USE_CUDA=1 -DUSE_AVX_INSTRUCTIONS=1; cmake --build . and python3 setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA
After running the cmake .. -DDLIB_USE_CUDA=1 -DUSE_AVX_INSTRUCTIONS=1 command, it says DLIB WILL NOT USE CUDA:
mj@mj:~/dlib/build$ cmake .. -DDLIB_USE_CUDA=1 -DUSE_AVX_INSTRUCTIONS=1
-- Using CMake version: 3.5.1
-- Enabling AVX instructions
-- Looking for cuDNN install...
-- cuDNN V5.0 OR GREATER NOT FOUND.
-- Dlib requires cuDNN V5.0 OR GREATER. Since cuDNN is not found DLIB WILL NOT USE CUDA.
-- If you have cuDNN then set CMAKE_PREFIX_PATH to include cuDNN's folder.
-- Disabling CUDA support for dlib. DLIB WILL NOT USE CUDA
-- C++11 activated.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mj/dlib/build
I don't know... maybe it's the NVIDIA driver version.
Thanks for quick reply @neumartin :)
I have CUDA 9.1 and I have installed the NVIDIA driver:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
It seems I have to install cuDNN.
Yes, try this: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/
Thanks for the link, I will check and update you.
I have successfully installed cuDNN, and now it shows this:
~/dlib/build$ cmake .. -DDLIB_USE_CUDA=1 -DUSE_AVX_INSTRUCTIONS=1
-- Using CMake version: 3.5.1
-- Enabling AVX instructions
-- Looking for cuDNN install...
-- Found cuDNN: /usr/local/cuda-9.1/lib64/libcudnn.so
-- Building a CUDA test project to see if your compiler is compatible with CUDA...
-- Checking if you have the right version of cuDNN installed.
-- Enabling CUDA support for dlib. DLIB WILL USE CUDA
-- C++11 activated.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mj/dlib/build
But after running python3 setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA, it reports DLIB WILL NOT USE CUDA again:
Invoking CMake setup: 'cmake /home/mj/dlib/tools/python -DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/mj/dlib/build/lib.linux-x86_64-3.5 -DPYTHON_EXECUTABLE=/usr/bin/python3 -DUSE_AVX_INSTRUCTIONS=yes -DDLIB_USE_CUDA=yes -DCMAKE_BUILD_TYPE=Release'
-- pybind11 v2.2.2
-- Using CMake version: 3.5.1
-- Enabling AVX instructions
-- Searching for BLAS and LAPACK
-- Searching for BLAS and LAPACK
-- Checking for module 'cblas'
-- No package 'cblas' found
-- Found OpenBLAS library
-- Using OpenBLAS's built in LAPACK
-- Looking for cuDNN install...
-- cuDNN V5.0 OR GREATER NOT FOUND.
-- Dlib requires cuDNN V5.0 OR GREATER. Since cuDNN is not found DLIB WILL NOT USE CUDA.
-- If you have cuDNN then set CMAKE_PREFIX_PATH to include cuDNN's folder.
-- Disabling CUDA support for dlib. DLIB WILL NOT USE CUDA
-- C++11 activated.
-- Found Python with installed numpy package
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mj/dlib/build/temp.linux-x86_64-3.5
Invoking CMake build: 'cmake --build . --config Release -- -j2'
[ 85%] Built target dlib
[100%] Built target dlib_python
Please let me know if you have any idea how to set CMAKE_PREFIX_PATH to include cuDNN's folder.
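One general CMake note that might help: CMake also reads CMAKE_PREFIX_PATH from the environment, so something like export CMAKE_PREFIX_PATH=/usr/local/cuda-9.1 before re-running setup.py may let its own CMake configure step find cuDNN. That path is only a guess based on the "Found cuDNN" line in the earlier log; adjust it to your install.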
See this: http://blog.dlib.net/2017/08/vehicle-detection-with-dlib-195_27.html. But really I have no idea; maybe the CUDA version you use is too new.
@chetu1988 It seems that dlib and CUDA 9.1 have some compatibility issues! Check this stackoverflow question of mine: https://stackoverflow.com/questions/49841147/why-i-get-dlib-isnt-going-to-use-cuda-when-i-compile-dlib-python-interface?noredirect=1#comment86701117_49841147
Davis, the maker of dlib, couldn't help me out! Finally I decided to switch to CUDA 9.0 and it works! So my advice to you would be: remove CUDA 9.1 and install CUDA 9.0. Even TensorFlow has issues with CUDA 9.1, so to spare yourself the headache, just install CUDA 9.0.
I downgraded from CUDA 9.1 to CUDA 9.0. While installing, it showed "Removing nvidia-396" (the latest driver) and installed nvidia-390. It seems the nvidia-396 driver may not support CUDA 9.1, and I also cannot switch from the Intel graphics card to NVIDIA. I tried this link (https://www.linuxbabe.com/desktop-linux/switch-intel-nvidia-graphics-card-ubuntu) to switch graphics cards. Is there any other method? Please help me.
After downgrading to CUDA 9.0 and running "sudo python3 setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA", I got an error like this:
running install
running bdist_egg
running egg_info
writing dlib.egg-info/PKG-INFO
writing top-level names to dlib.egg-info/top_level.txt
writing dependency_links to dlib.egg-info/dependency_links.txt
package init file 'dlib/__init__.py' not found (or not a regular file)
reading manifest file 'dlib.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'dlib.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
Invoking CMake setup: 'cmake /home/mj/dlib/tools/python -DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/mj/dlib/build/lib.linux-x86_64-3.5 -DPYTHON_EXECUTABLE=/usr/bin/python3 -DUSE_AVX_INSTRUCTIONS=yes -DDLIB_USE_CUDA=yes -DCMAKE_BUILD_TYPE=Release'
-- pybind11 v2.2.2
-- Using CMake version: 3.5.1
-- Enabling AVX instructions
-- Searching for BLAS and LAPACK
-- Searching for BLAS and LAPACK
-- Checking for module 'cblas'
-- No package 'cblas' found
-- Found OpenBLAS library
-- Using OpenBLAS's built in LAPACK
-- Looking for cuDNN install...
-- Found cuDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
-- Enabling CUDA support for dlib. DLIB WILL USE CUDA
-- C++11 activated.
-- Found Python with installed numpy package
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mj/dlib/build/temp.linux-x86_64-3.5
Invoking CMake build: 'cmake --build . --config Release -- -j2'
[  1%] Building NVCC (Device) object dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cuda_dlib.cu.o
[  1%] Building NVCC (Device) object dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cusolver_dlibapi.cu.o
CMake Error at dlib_generated_cusolver_dlibapi.cu.o.cmake:207 (message): Error generating /home/mj/dlib/build/temp.linux-x86_64-3.5/dlib_build/CMakeFiles/dlib.dir/cuda/./dlib_generated_cusolver_dlibapi.cu.o
CMake Error at dlib_generated_cuda_dlib.cu.o.cmake:207 (message): Error generating /home/mj/dlib/build/temp.linux-x86_64-3.5/dlib_build/CMakeFiles/dlib.dir/cuda/./dlib_generated_cuda_dlib.cu.o
dlib_build/CMakeFiles/dlib.dir/build.make:70: recipe for target 'dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cusolver_dlibapi.cu.o' failed
make[2]: *** [dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cusolver_dlibapi.cu.o] Error 1
make[2]: Waiting for unfinished jobs....
dlib_build/CMakeFiles/dlib.dir/build.make:63: recipe for target 'dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cuda_dlib.cu.o' failed
make[2]: *** [dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cuda_dlib.cu.o] Error 1
CMakeFiles/Makefile2:140: recipe for target 'dlib_build/CMakeFiles/dlib.dir/all' failed
make[1]: *** [dlib_build/CMakeFiles/dlib.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
File "setup.py", line 249, in
What might be the issue? Please let me know the solution.
First of all, this is awesome! Much better than OpenCV. No false positives so far, while OpenCV had many. However, it does about 1 image per second on my Core i7, and I am just using face detection from this python library (no recognition). Is this a normal result? It feels a tad slow.