Hi,
Firstly, thank you for your library. It is very useful.
Expected Behavior
I would like to do face detection in real time.
So I am trying to use Dlib with CUDA.
I compiled Dlib with CUDA and it works fine. The computation of the face descriptors became about 100 times faster.
This operation now takes 5 ms:
std::vector<matrix<float,0,1>> faceDescriptor = net(facesfiltered);
So I am surprised that the running time of the detector returned by get_frontal_face_detector() is not reduced by CUDA. With the same picture, it takes about 500 ms without CUDA and about 500 ms with CUDA.
Current Behavior
I try to find faces in a picture (1280x720 px) with get_frontal_face_detector().
With Dlib and CUDA: 543 ms
With Dlib without CUDA: 542 ms
CUDA has no effect on get_frontal_face_detector().
Steps to Reproduce
1 - I compile dlib 19.14 without CUDA
2 - I try to find faces:
#include <QCoreApplication>
#include <QTime>
#include <QDebug>
#include <dlib/dnn.h>
#include <dlib/gui_widgets.h>
#include <dlib/clustering.h>
#include <dlib/string.h>
#include <dlib/image_io.h>
#include <dlib/image_processing/frontal_face_detector.h>
using namespace dlib;
using namespace std;
template <template <int,template<typename>class,int,typename> class block, int N, template<typename>class BN, typename SUBNET>
using residual = add_prev1<block<N,BN,1,tag1<SUBNET>>>;
template <template <int,template<typename>class,int,typename> class block, int N, template<typename>class BN, typename SUBNET>
using residual_down = add_prev2<avg_pool<2,2,2,2,skip1<tag2<block<N,BN,2,tag1<SUBNET>>>>>>;
template <int N, template <typename> class BN, int stride, typename SUBNET>
using block = BN<con<N,3,3,1,1,relu<BN<con<N,3,3,stride,stride,SUBNET>>>>>;
template <int N, typename SUBNET> using ares = relu<residual<block,N,affine,SUBNET>>;
template <int N, typename SUBNET> using ares_down = relu<residual_down<block,N,affine,SUBNET>>;
template <typename SUBNET> using alevel0 = ares_down<256,SUBNET>;
template <typename SUBNET> using alevel1 = ares<256,ares<256,ares_down<256,SUBNET>>>;
template <typename SUBNET> using alevel2 = ares<128,ares<128,ares_down<128,SUBNET>>>;
template <typename SUBNET> using alevel3 = ares<64,ares<64,ares<64,ares_down<64,SUBNET>>>>;
template <typename SUBNET> using alevel4 = ares<32,ares<32,ares<32,SUBNET>>>;
using anet_type = loss_metric<fc_no_bias<128,avg_pool_everything<
alevel0<
alevel1<
alevel2<
alevel3<
alevel4<
max_pool<3,3,2,2,relu<affine<con<32,7,7,2,2,
input_rgb_image_sized<150>
>>>>>>>>>>>>;
int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);

    // HOG-based frontal face detector
    frontal_face_detector detector = get_frontal_face_detector();

    // Landmark model and face recognition network (loaded but not timed here)
    shape_predictor sp;
    deserialize("C:/Users/jonas.gaudin/Documents/QTProject/DlibSpeedTest/shape_predictor_68_face_landmarks.dat") >> sp;
    anet_type net;
    deserialize("C:/Users/jonas.gaudin/Documents/QTProject/DlibSpeedTest/dlib_face_recognition_resnet_model_v1.dat") >> net;

    matrix<rgb_pixel> img;
    load_image(img, "C:/Users/jonas.gaudin/Pictures/source.jpg");

    // Time only the detector call
    QTime t1;
    t1.start();
    std::vector<matrix<rgb_pixel>> faces;
    auto detections = detector(img);
    qDebug() << t1.elapsed();

    return a.exec();
}
3 - Dlib finds 2 faces in 545 ms (net takes 450 ms)
4 - I compile Dlib with CUDA
5 - I run the same code
6 - Dlib finds 2 faces in 543 ms (but Dlib is using CUDA, because net now takes 5 ms)
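The timings above come from a single call each. To make the comparison between the two builds less sensitive to one-off costs, the measurement could be repeated and averaged. Below is a minimal sketch using only the C++ standard library. The helper name mean_ms and its parameters are hypothetical (they are not part of dlib or Qt), and the warm-up runs are an assumption that the first call may be slower than the following ones.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper, not part of dlib or Qt.
// Calls fn() warmup_runs times without timing, then calls it timed_runs
// times and returns the mean wall-clock time per call in milliseconds.
template <typename Fn>
double mean_ms(Fn&& fn, std::size_t warmup_runs, std::size_t timed_runs)
{
    // Untimed calls, so one-off setup costs are not counted.
    for (std::size_t i = 0; i < warmup_runs; ++i)
        fn();

    // Nothing to average: avoid dividing by zero.
    if (timed_runs == 0)
        return 0.0;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < timed_runs; ++i)
        fn();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(timed_runs);
}
```

With the code from step 2, this could be used as `qDebug() << mean_ms([&]{ detector(img); }, 1, 10);` in both builds, with the same picture.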