davisking / dlib

A toolkit for making real world machine learning and data analysis applications in C++
http://dlib.net
Boost Software License 1.0

What is the recommended HW setup for real-time face detection? #1004

Closed MyraBaba closed 6 years ago

MyraBaba commented 6 years ago

Hi,

This could be helpful for all of us, I assume :)

What would be the best setup (NVIDIA 1080 Ti, NVIDIA Jetson TX2, etc.) in terms of both performance and price?

The Jetson is ARM-based, so I suspect there may be problems, while a 1080 Ti could be expensive for my demo project.

I need at least 30 fps real-time face detection + recognition (multiple faces in the camera view).

Based on your experience, you may have an idea.

Many thanks for the library. I hope some day there is a solid Java/Scala port :) I started to learn C++ because of the great dlib.

Cheers

davisking commented 6 years ago

I haven't used a Jetson myself, so I don't know. But it really comes down to your specific needs. A 1080 Ti is way better, but it obviously costs more. It's up to you to decide. I would prototype on both and see if you can fit your application into the Jetson. If you can, then there you go. If not, you need to spend more money on better hardware.

Also, if you want to use dlib from java you can easily define an interface between java and C++ using the tooling here https://github.com/davisking/dlib/tree/master/dlib/java. There is also a bit more discussion of this here: http://blog.dlib.net/2014/10/mitie-v03-released-now-with-java-and-r.html. The newest version of the java/C++ tooling is in that dlib/java folder.

MyraBaba commented 6 years ago

I will try both soon and post the results here for comparison.

Meanwhile, how can we use all available cores and CPUs in the server with dlib? (It is only using one core.)

I read about OpenBLAS and Intel MKL (paid) and installed both. I didn't see a significant improvement: still only 1 or 2 cores are busy. How can I check whether a dlib example is using BLAS or Intel MKL? I am using CLion, by the way (macOS).

A blog post explaining how to get dlib's full power out of all the CPU cores with OpenBLAS etc. would be very useful.

Many thanks.

davisking commented 6 years ago

The output of cmake tells you what it's doing with regards to any BLAS or GPU usage.

MyraBaba commented 6 years ago

Yes, I saw the output below. Even though it says it found BLAS, it is still using only one core.

This may be a very stupid question, but I couldn't find a clear explanation of how to use the full CPU power.

Best...

    cmake .. -DUSE_AVX_INSTRUCTIONS=1
    -- The C compiler identification is AppleClang 9.0.0.9000038
    -- The CXX compiler identification is AppleClang 9.0.0.9000038
    -- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
    -- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
    -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Enabling AVX instructions
    -- Looking for pthread.h
    -- Looking for pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - found
    -- Found Threads: TRUE
    -- Looking for png_create_read_struct
    -- Looking for png_create_read_struct - found
    -- Looking for jpeg_read_header
    -- Looking for jpeg_read_header - found
    -- Searching for BLAS and LAPACK
    -- Searching for BLAS and LAPACK
    -- Found PkgConfig: /usr/local/bin/pkg-config (found version "0.29.2")
    -- Checking for module 'cblas'
    -- No package 'cblas' found
    -- Checking for module 'lapack'
    -- No package 'lapack' found
    -- Looking for sys/types.h
    -- Looking for sys/types.h - found
    -- Looking for stdint.h
    -- Looking for stdint.h - found
    -- Looking for stddef.h
    -- Looking for stddef.h - found
    -- Check size of void
    -- Check size of void - done
    -- Found LAPACK library
    -- Found CBLAS library
    -- Looking for cblas_ddot
    -- Looking for cblas_ddot - found
    -- Looking for sgesv
    -- Looking for sgesv - found
    -- Looking for sgesv
    -- Looking for sgesv_ - found
    CUDA_TOOLKIT_ROOT_DIR not found or specified
    -- Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_NVCC_EXECUTABLE CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY) (Required is at least version "7.5")
    -- Disabling CUDA support for dlib. DLIB WILL NOT USE CUDA
    -- Building a C++11 test project to see if your compiler supports C++11
    -- C++11 activated.
    -- Building a C++11 test project to see if your compiler supports C++11
    -- C++11 activated.
    -- Configuring done
    -- Generating done

davisking commented 6 years ago

Either the version of BLAS you are using isn't multi-core aware or you aren't using any part of dlib that benefits from it.

MyraBaba commented 6 years ago

I will investigate.

I am using the face detection, face landmark, and face recognition parts.

Do these benefit from multiple cores?

davisking commented 6 years ago

It depends on what exactly you mean. There are multiple face detectors in dlib.

MyraBaba commented 6 years ago

I am using the code below (Python):

face_locations = face_recognition.face_locations(frame)

or

face_locations = face_recognition.face_locations(frame, number_of_times_to_upsample=0, model="cnn")

face_encodings = face_recognition.face_encodings(frame, face_locations)

and then the encoding comparison...

When I profile it, 56% of the time is consumed by:

face_detector = dlib.get_frontal_face_detector()

Is this caused by Python (not letting dlib benefit from multiple cores, so that I should switch to bare C++), or is BLAS not multicore-aware?

I have 8 cores and can only use 1 of them... (MacBook Pro 2017)

davisking commented 6 years ago

Don't call get_frontal_face_detector() over and over. Call it once.

Anyway, most of this stuff isn't multicore. Only the DNN stuff is. It's up to you to thread the rest appropriately for your application.
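As an illustration of the "call it once" advice, here is a minimal Python sketch. The helper names `load_detector` and `detect_all` are hypothetical and not part of dlib; the only dlib calls assumed are `dlib.get_frontal_face_detector()` and calling the returned detector as `detector(image, upsample_count)`.

```python
def load_detector():
    """Build the detector a single time (assumes dlib is installed)."""
    import dlib  # imported lazily so the rest of the sketch runs without dlib
    return dlib.get_frontal_face_detector()


def detect_all(frames, detector, upsample=0):
    """Run an already-built detector over every frame.

    Only per-frame work lives inside the loop; nothing is constructed
    or loaded here.
    """
    results = []
    for frame in frames:
        results.append(detector(frame, upsample))
    return results
```

A caller would build the detector before the capture loop, for example `detector = load_detector()`, and then pass that same object to `detect_all` (or call it directly) for every frame.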

mcourteaux commented 6 years ago

Technical note: if your application allows latency (like 5 frames delay), you don't need to perform face detection on every frame. Just detect faces every 5 frames, and interpolate face positions between them. Just saying, in case this might be something that will do for your scenario.
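A rough sketch of the interpolation idea, assuming a single face whose box was detected at frame k and again at frame k + step, with boxes given as (left, top, right, bottom) tuples. The function names are hypothetical, and matching several faces between two detections is not covered here.

```python
def lerp_box(box_a, box_b, t):
    """Linearly interpolate two (left, top, right, bottom) boxes for 0 <= t <= 1."""
    return tuple(a + (b - a) * t for a, b in zip(box_a, box_b))


def boxes_between(box_a, box_b, step):
    """Boxes for frames k, k+1, ..., k+step-1.

    box_a is the detection at frame k and box_b the detection at frame
    k+step, so the intermediate frames can only be filled in once the
    later detection is available (this is where the latency comes from).
    """
    return [lerp_box(box_a, box_b, i / step) for i in range(step)]
```

With detections every 5 frames, `boxes_between(box_at_k, box_at_k_plus_5, 5)` would give the boxes for the five frames starting at k.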

davisking commented 6 years ago

That's a decent idea as well. But the deeper issue is that you shouldn't be calling code inside your processing loop that doesn't need to be there. Case in point, model loading code like get_frontal_face_detector has no business being called more than once, let alone on every frame.

MyraBaba commented 6 years ago

Another idea came from a friend of mine: send each frame to a different process (multiprocessing) so we can benefit from all the CPU cores.

We would also need to keep the frames in their original order.
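One way to sketch this in Python is with `concurrent.futures`, whose `Executor.map` returns results in the order the inputs were submitted, so the frame sequence is preserved even when frames finish out of order. The helpers `make_pool` and `process_in_order` are hypothetical names, and `worker` stands in for whatever per-frame detection function the application uses.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def make_pool(workers, use_processes=True):
    """Create a pool; processes sidestep the GIL for CPU-bound work."""
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    return pool_cls(max_workers=workers)


def process_in_order(frames, worker, executor):
    """Apply worker(frame) in parallel and return results in frame order.

    Executor.map yields results in input order, regardless of which
    task finishes first.
    """
    return list(executor.map(worker, frames))
```

With a process pool, `worker` has to be a module-level function so it can be pickled, and each worker process would need to build its own detector once rather than per frame. Whether this reaches 30 fps depends on the cost of shipping frames between processes, which this sketch does not measure.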


xhuvom commented 6 years ago

I have installed dlib with AVX_INSTRUCTIONS and CUDA + cuDNN. But a real-time facial landmark detector (5 point) running on webcam input lags about 1~2 seconds per frame when OpenCV capture is used in the code. In theory the code should run smoothly at about 30 fps on my GTX 1080 GPU, but I am not sure whether dlib is using the GPU at all. Checking GPU memory at runtime shows only 15 MB in use. Any idea what's happening?

ariel415el commented 6 years ago

Hi, any new ideas about how to verify that dlib uses the GPU?

davisking commented 6 years ago

CMake tells you if it's going to use cuda when you install it. I also recently added the dlib.DLIB_USE_CUDA variable that you can look at.
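A small sketch of reading that variable from Python. The `cuda_status` helper is a hypothetical name; it assumes only that `dlib.DLIB_USE_CUDA` exists in recent builds, and it falls back gracefully when dlib or the attribute is missing.

```python
def cuda_status():
    """Report whether the installed dlib Python module was built with CUDA."""
    try:
        import dlib
    except ImportError:
        return "dlib is not installed"
    # Older builds may not define the attribute, so default to None.
    flag = getattr(dlib, "DLIB_USE_CUDA", None)
    if flag is None:
        return "this dlib build does not expose DLIB_USE_CUDA"
    return "CUDA enabled" if flag else "CUDA disabled"


print(cuda_status())
```

A "CUDA disabled" result would match a CMake log containing "DLIB WILL NOT USE CUDA", like the one posted earlier in this thread.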