If you have an NVIDIA GPU, compiling dlib with CUDA support will speed up face recognition a good bit (but it won't make face detection any faster). Compiling dlib with AVX support (if it isn't already) will also help a little.
Using lower resolution images will also speed things up a lot at the cost of missing smaller faces in images. So you can trade off speed for detail.
For example, if you are using the command line script, you can change these lines from 1600 to 800 or something smaller. That would make it a lot faster but miss small faces.
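In your own code, the same trade-off looks roughly like this minimal sketch, downscaling before detection (the 800px cap and the file name are illustrative, not the script's exact code):

```python
# Minimal sketch of the speed/detail trade-off: downscale before detection.
# The 800px cap and file name are illustrative, not the CLI script's code.
import face_recognition
import numpy as np
from PIL import Image

img = Image.open("group_photo.jpg").convert("RGB")
if img.width > 800:
    scale = 800 / img.width
    img = img.resize((800, int(img.height * scale)))

# face_locations expects an RGB numpy array
locations = face_recognition.face_locations(np.array(img))
print("Found", len(locations), "face(s)")
```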
So 1 image/sec is okay for pure CPU (possibly without AVX)? It uses a single core right now, so it's acceptable. Just wondering whether it shouldn't be something like 100 or 1000 images/s.
I'll jump in with a related question. I'm seeing a >2x difference in speed running face_encodings on a new MacBook Pro vs. a Docker container on the same machine (0.2 seconds for 1 face on OSX vs. 0.5 seconds in Docker). I'm running python 3.6 on both and obviously using the same processor. I also compared face_locations, and both run at the same speed (~0.16 seconds). I think this narrows the problem down to how dlib creates the encodings, but I'm not sure why that would differ between OSX and Linux. Thoughts?
@Dingo64 This might also give you some additional datapoints on expected speed with newish hardware.
The Dockerfile I'm using is the exact same as your example with the exception of using python 3.6 instead of 3.4.
Something in the range of .2s to 1s is pretty normal depending on what your cpu is, image size, etc.
There are lots of ways to improve throughput with more work. For example, you are only using one CPU core, but you might have 4 or 8 cores available. So you could run multiple processes in parallel and get a 4-8x throughput boost (see the sketch below).
But in general you shouldn't expect 100 or 1000 images per second.
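As a rough illustration of that multi-process approach, a minimal sketch with Python's multiprocessing.Pool might look like this (the folder path and pool size are assumptions):

```python
# Minimal sketch: spread face encoding across CPU cores with a process pool.
# The folder path and pool size are illustrative.
import glob
from multiprocessing import Pool

import face_recognition

def encode_image(path):
    # Load one image and return the face encodings found in it
    image = face_recognition.load_image_file(path)
    return path, face_recognition.face_encodings(image)

if __name__ == "__main__":
    paths = glob.glob("./unknown_pictures/*.jpg")
    with Pool(processes=4) as pool:  # roughly one worker per core
        for path, encodings in pool.map(encode_image, paths):
            print(path, "->", len(encodings), "face(s)")
```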
@giffordw Are you running Docker for Mac as the docker host? Docker for Mac has various performance issues (see https://github.com/docker/for-mac/issues/668). Not saying that's definitely the cause of your issue, but it might be useful to see if running Linux as the docker host OS performs better on the same or similar hardware.
The other possibility is that maybe dlib has different build options between macOS and Linux. Maybe dlib in the docker image isn't finding a good BLAS library or something? It might be worth looking at the dlib configure / build log you see while building the Docker container to see if anything looks suspicious.
Thanks. It doesn't appear to be an issue with Docker for Mac and rather the dlib build. I'll probably copy this question over on the dlib boards since it is likely an issue with that setup. Figured I would check here first if it's a known issue.
@giffordw Your MacBook Pro has "2.7 GHz quad-core Intel Core i7 Skylake (6820HQ), up to 3.6 GHz" as a CPU? And does it use all your cores?
@Dingo64 Yes, I have a 2.7 GHz CPU, but I believe it's only utilizing 1 core.
@giffordw Thanks. My i7 is 2.3 GHz. Sure, it's slower, but you get 0.16 s and I get something close to 1 s. There shouldn't be that big a difference.
Running it across multiple cores does improve performance relative to the cores available. Thanks for the pointer :)
Hi, I'm using VMware with Ubuntu 16.04, but it is too slow for scanning the 512 pictures in my database: it takes about 0.5 seconds for a 640x480 picture. I use dlib compiled with CUDA, but I think it can get much faster. Currently my processor is an Intel Core i3, with a GTX 1060 GPU. I'm thinking of buying a Core i7 and using pure Ubuntu. Do you think it will get better by doing that? Can I get twice the performance? Thanks.
Are you sure it's actually able to access your GPU and use CUDA from inside the VM? Did you confirm via the nvidia-smi command that dlib is actually using your GPU while the process is running?
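Besides watching nvidia-smi, a quick in-process check is possible, assuming a dlib build recent enough to expose these flags (older builds may not have them):

```python
# Quick sanity check; assumes a dlib build recent enough to expose these.
import dlib

print(dlib.DLIB_USE_CUDA)           # True only if dlib was compiled with CUDA
print(dlib.cuda.get_num_devices())  # how many CUDA devices dlib can see
```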
I don't know, how can I check that? I tried to install the NVIDIA drivers but it got stuck on the login screen, and from what I've searched it seems that VMware does not support CUDA (this reference Link). If I install Ubuntu natively, should I get a huge boost or not? Thanks.
It sounds like you aren't actually using your GPU at all right now. So if you install Ubuntu natively and properly set up CUDA so you are using the NVIDIA GPU, you should get a speed boost. But I can't make any promises about what exact speed you'll get. You'd just have to try it and see.
Thanks, I will try it and report back. What about the CPU? If I upgrade to a Core i7, will it be a big improvement or a small one? Currently the program uses only 40% of the CPU, but I'm not sure if that's related to the CPU.
There are too many variables for me to tell you how much a new CPU would help in your specific case. For example, if your current CPU has 4 cores but you are only using one of them, you could make processing 512 images up to 4x faster by splitting the work across 4 processes. Or, depending on your application, maybe there's a better way to store the processed data so you don't have to process the same images more than once.
So how could I set the processing to use all 4 of my cores? I think the sample uses only one core, which keeps CPU usage lower than it could be and reduces performance.
I have used this and it does improve performance in proportion to the cores available, and the method of storage and data access also improves the time without deploying more advanced hardware. Great tip, btw!
@masoudr I added multi-core support to the command line tool - https://github.com/ageitgey/face_recognition/commit/245160c2f6c741ac7cea6b5fb14b54b8d3cd07a0#diff-db0cd43e07ede47e67fd9317683e62a0
With the latest code in GitHub, you can run face_recognition --cpus 4 ./pictures_of_people_i_know/ ./unknown_pictures/ to use four CPUs with the command line program.
@ageitgey Thanks. I tried to run my example with python web_service_fin.py --cpus 4, but it still uses only 40 percent of my CPU. I installed the new version with python setup.py install on Windows. In my Ubuntu VM it runs faster, but it behaves like the old version. I tried face_recognition --cpus 4 ./pictures_of_people_i_know/ ./unknown_pictures/ and it runs fine with full CPU usage, but I think there is a problem with my own python code.
Edit:
I used the multiprocessing module with Pool() and got a huge boost. That solves my problem for now.
Please add multiprocessing support for python as well
Please add support for multiprocessing for face detection too
@jhcruvinel Multiprocessing is a feature of Python itself. It's not limited to this library. You can look at that diff I linked above for an example of how you might use it in your own programs.
@ageitgey, thanks for the tip. I ended up implementing it in Python 2.7, which also supports multiprocessing.
@jhcruvinel Cool. The issue with Python 2.7 was that multiprocessing might crash on macOS due to a bug in macOS and how Python spawns processes. You can work around it using the forkserver start method, which was only added in Python 3.4.
But if you are using Windows or Linux, Python 2.7 would probably work just fine.
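For reference, selecting that start method on Python 3.4+ looks roughly like this (the worker function is just a placeholder):

```python
# Python 3.4+ sketch: use the 'forkserver' start method to avoid the
# macOS fork() crashes described above. The worker is a placeholder.
import multiprocessing

def work(x):
    return x * x

if __name__ == "__main__":
    multiprocessing.set_start_method("forkserver")
    with multiprocessing.Pool(processes=4) as pool:
        print(pool.map(work, range(8)))
```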
OK @ageitgey, you're right. Thanks a lot.
Hi! What NVIDIA GPU do you recommend to get the best performance? Thanks!
A GeForce GTX 1080 Ti would work well.
Thanks @ageitgey! Would more DDR5 RAM be better?
It lets you run larger models or process bigger batches of images, so more GPU memory is a good investment for deep learning in general. It's the upper bound of what you can run.
Thanks!!!
How can I enable face detection to run with 2 GPUs or more? From my tests, it is only able to run on a single GPU.
@redm0n As far as I know, the time-consuming part is generating the encoding data for every image. You just need to use the multiprocessing module to handle one image file per CPU core. That is all you need. And if you need to compare two images faster, you can use the same method again.
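To make that concrete, a minimal sketch might be: encode once, store the encodings, then compare them cheaply (file names are illustrative, and it assumes one face is found in each image):

```python
# Sketch: encodings are the slow part; comparing stored encodings is cheap.
# File names are illustrative; assumes one face is found in each image.
import face_recognition

known = face_recognition.load_image_file("person_a.jpg")
unknown = face_recognition.load_image_file("person_b.jpg")

known_encoding = face_recognition.face_encodings(known)[0]
unknown_encoding = face_recognition.face_encodings(unknown)[0]

# Returns a list of booleans, one per known encoding passed in
print(face_recognition.compare_faces([known_encoding], unknown_encoding))
```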
@ageitgey Thanks for your work, it really helps me in my job. Could you please tell me whether there are any other flags to add when compiling dlib to make the program faster, besides the NEON flag and multiprocessing?
Hi! What is the recommended size for the images in face recognition? Thanks!
@neumartin It depends on how small the face will be inside the image.
You want to try to make sure each face image is at least, say, 100x100 pixels when you extract it from the larger image, maybe a little bigger.
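A quick way to check that guideline on your own images is a sketch like this (the file name is illustrative):

```python
# Sketch: measure detected face sizes to check the ~100x100px guideline above.
# The file name is illustrative.
import face_recognition

image = face_recognition.load_image_file("group_photo.jpg")
for top, right, bottom, left in face_recognition.face_locations(image):
    print("face is {}x{} px".format(right - left, bottom - top))
```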
@ageitgey I have been trying to get GPU acceleration working on my HP laptop using Linux but have been unable to. Can you please guide me through the process?
@chetu1988 Did you compile dlib with CUDA support? Do you have an NVIDIA card with CUDA? I think you also need to use the CNN model, like: face_locations = face_recognition.face_locations(image, model="cnn")
Thanks for your reply @neumartin. I have an NVIDIA card with CUDA and compiled dlib with CUDA support. Let me confirm: compiling dlib with CUDA support means that in the "cmake .. -DDLIB_USE_CUDA=0 -DUSE_AVX_INSTRUCTIONS=1" command I have to set DLIB_USE_CUDA=1, and in the "python3 setup.py install --yes USE_AVX_INSTRUCTIONS --no DLIB_USE_CUDA" command I have to use "--yes DLIB_USE_CUDA", right?
@chetu1988 try cmake .. -DDLIB_USE_CUDA=1 -DUSE_AVX_INSTRUCTIONS=1; cmake --build . and python3 setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA
After running the cmake .. -DDLIB_USE_CUDA=1 -DUSE_AVX_INSTRUCTIONS=1 command, it says DLIB WILL NOT USE CUDA:
mj@mj:~/dlib/build$ cmake .. -DDLIB_USE_CUDA=1 -DUSE_AVX_INSTRUCTIONS=1
-- Using CMake version: 3.5.1
-- Enabling AVX instructions
-- Looking for cuDNN install...
-- cuDNN V5.0 OR GREATER NOT FOUND.
-- Dlib requires cuDNN V5.0 OR GREATER. Since cuDNN is not found DLIB WILL NOT USE CUDA.
-- If you have cuDNN then set CMAKE_PREFIX_PATH to include cuDNN's folder.
-- Disabling CUDA support for dlib. DLIB WILL NOT USE CUDA
-- C++11 activated.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mj/dlib/build
I don't know... maybe it's the NVIDIA driver version.
Thanks for quick reply @neumartin :)
I have CUDA 9.1 and I have installed the NVIDIA driver:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
It seems I have to install cuDNN.
Yes, try this: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/
Thanks for the link, I will check and update you.
I have successfully installed cuDNN, and now it shows this:
~/dlib/build$ cmake .. -DDLIB_USE_CUDA=1 -DUSE_AVX_INSTRUCTIONS=1
-- Using CMake version: 3.5.1
-- Enabling AVX instructions
-- Looking for cuDNN install...
-- Found cuDNN: /usr/local/cuda-9.1/lib64/libcudnn.so
-- Building a CUDA test project to see if your compiler is compatible with CUDA...
-- Checking if you have the right version of cuDNN installed.
-- Enabling CUDA support for dlib. DLIB WILL USE CUDA
-- C++11 activated.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mj/dlib/build
But after running python3 setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA, it reports DLIB WILL NOT USE CUDA again:
Invoking CMake setup: 'cmake /home/mj/dlib/tools/python -DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/mj/dlib/build/lib.linux-x86_64-3.5 -DPYTHON_EXECUTABLE=/usr/bin/python3 -DUSE_AVX_INSTRUCTIONS=yes -DDLIB_USE_CUDA=yes -DCMAKE_BUILD_TYPE=Release'
-- pybind11 v2.2.2
-- Using CMake version: 3.5.1
-- Enabling AVX instructions
-- Searching for BLAS and LAPACK
-- Searching for BLAS and LAPACK
-- Checking for module 'cblas'
-- No package 'cblas' found
-- Found OpenBLAS library
-- Using OpenBLAS's built in LAPACK
-- Looking for cuDNN install...
-- cuDNN V5.0 OR GREATER NOT FOUND.
-- Dlib requires cuDNN V5.0 OR GREATER. Since cuDNN is not found DLIB WILL NOT USE CUDA.
-- If you have cuDNN then set CMAKE_PREFIX_PATH to include cuDNN's folder.
-- Disabling CUDA support for dlib. DLIB WILL NOT USE CUDA
-- C++11 activated.
-- Found Python with installed numpy package
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mj/dlib/build/temp.linux-x86_64-3.5
Invoking CMake build: 'cmake --build . --config Release -- -j2'
[ 85%] Built target dlib
[100%] Built target dlib_python
Please let me know if you have any idea how to set CMAKE_PREFIX_PATH to include cuDNN's folder.
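One general CMake note that might help: CMake also reads CMAKE_PREFIX_PATH from the environment, so something like export CMAKE_PREFIX_PATH=/usr/local/cuda-9.1 before re-running setup.py may let its own CMake configure step find cuDNN. That path is only a guess based on the "Found cuDNN" line in the earlier log; adjust it to your install.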
See this: http://blog.dlib.net/2017/08/vehicle-detection-with-dlib-195_27.html. But really I have no idea; maybe the CUDA version you use is too new.
@chetu1988 It seems that dlib and CUDA 9.1 have some compatibility issues! Check this stackoverflow question of mine: https://stackoverflow.com/questions/49841147/why-i-get-dlib-isnt-going-to-use-cuda-when-i-compile-dlib-python-interface?noredirect=1#comment86701117_49841147
Davis, the maker of dlib, couldn't help me out! Finally I decided to switch to CUDA 9.0 and it works! So my advice to you would be: remove CUDA 9.1 and install CUDA 9.0. Even TensorFlow has issues with CUDA 9.1, so to spare yourself the headache, just install CUDA 9.0.
I downgraded from CUDA 9.1 to CUDA 9.0. While installing, it showed "Removing nvidia-396" (the latest driver) and installed nvidia-390. It seems the nvidia-396 driver may not support CUDA 9.1, and I also cannot switch from the Intel graphics card to NVIDIA. I tried this link (https://www.linuxbabe.com/desktop-linux/switch-intel-nvidia-graphics-card-ubuntu) to switch graphics cards. Is there any other method? Please help me.
After downgrading to CUDA 9.0 and running "sudo python3 setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA", I got an error like this:
running install
running bdist_egg
running egg_info
writing dlib.egg-info/PKG-INFO
writing top-level names to dlib.egg-info/top_level.txt
writing dependency_links to dlib.egg-info/dependency_links.txt
package init file 'dlib/__init__.py' not found (or not a regular file)
reading manifest file 'dlib.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'dlib.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
Invoking CMake setup: 'cmake /home/mj/dlib/tools/python -DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/mj/dlib/build/lib.linux-x86_64-3.5 -DPYTHON_EXECUTABLE=/usr/bin/python3 -DUSE_AVX_INSTRUCTIONS=yes -DDLIB_USE_CUDA=yes -DCMAKE_BUILD_TYPE=Release'
-- pybind11 v2.2.2
-- Using CMake version: 3.5.1
-- Enabling AVX instructions
-- Searching for BLAS and LAPACK
-- Searching for BLAS and LAPACK
-- Checking for module 'cblas'
-- No package 'cblas' found
-- Found OpenBLAS library
-- Using OpenBLAS's built in LAPACK
-- Looking for cuDNN install...
-- Found cuDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
-- Enabling CUDA support for dlib. DLIB WILL USE CUDA
-- C++11 activated.
-- Found Python with installed numpy package
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mj/dlib/build/temp.linux-x86_64-3.5
Invoking CMake build: 'cmake --build . --config Release -- -j2'
[  1%] Building NVCC (Device) object dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cuda_dlib.cu.o
[  1%] Building NVCC (Device) object dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cusolver_dlibapi.cu.o
CMake Error at dlib_generated_cusolver_dlibapi.cu.o.cmake:207 (message): Error generating /home/mj/dlib/build/temp.linux-x86_64-3.5/dlib_build/CMakeFiles/dlib.dir/cuda/./dlib_generated_cusolver_dlibapi.cu.o
CMake Error at dlib_generated_cuda_dlib.cu.o.cmake:207 (message): Error generating /home/mj/dlib/build/temp.linux-x86_64-3.5/dlib_build/CMakeFiles/dlib.dir/cuda/./dlib_generated_cuda_dlib.cu.o
dlib_build/CMakeFiles/dlib.dir/build.make:70: recipe for target 'dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cusolver_dlibapi.cu.o' failed
make[2]: *** [dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cusolver_dlibapi.cu.o] Error 1
make[2]: Waiting for unfinished jobs....
dlib_build/CMakeFiles/dlib.dir/build.make:63: recipe for target 'dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cuda_dlib.cu.o' failed
make[2]: *** [dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cuda_dlib.cu.o] Error 1
CMakeFiles/Makefile2:140: recipe for target 'dlib_build/CMakeFiles/dlib.dir/all' failed
make[1]: *** [dlib_build/CMakeFiles/dlib.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
File "setup.py", line 249, in
What might be the issue? Please let me know the solution.
First of all, this is awesome! Much better than OpenCV. No false positives so far, while OpenCV had many. However, it does about 1 image per second on my Core i7, and I am just using face detection from this python library (no recognition). Is this a normal result? It feels a tad slow.