Closed: waterball2016 closed this issue 6 years ago
Assuming the code isn't multi-threaded (I think it uses CBLAS by default, which only uses one core?), a 1.6GHz Cortex-A15 armv7 and the 1.2GHz Cortex-A57 armv8 ('cyclone') could have massively different performance.
Probably not as much as you see there, but enough to make up for the clock speed for sure: http://images.anandtech.com/doci/7995/Screen%20Shot%202014-05-06%20at%202.59.56%20AM.png
The poster didn't say what they are running.
Oh, the data above is the result of the face detection demo. The face landmark detection demo (only the shape prediction part), on the other hand, performs similarly on Android and iOS. How could that be?
Which face detection demo?
dlib/examples/face_detection_ex.cpp
I copied the face detection snippet into my own project and measured the time it takes.
frontal_face_detector m_detector = get_frontal_face_detector();
std::vector<rectangle> faces;
faces = m_detector(img);
Are you including the model load time in get_frontal_face_detector()?
Definitely not. This problem has bothered me for a week. I also tried OpenCV face detection, which performs similarly on Android and iOS but is less accurate.
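A minimal sketch of how the detection call can be timed on its own, with detector construction kept outside the measured region. The helper name `time_ms` is hypothetical and not part of dlib; the dlib calls appear only in a comment, since they need the library and an image to run.

```cpp
#include <chrono>
#include <utility>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
// steady_clock is monotonic, so clock adjustments do not distort the result.
template <typename F>
double time_ms(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical usage with the snippet above (requires dlib, not compiled here):
//   frontal_face_detector detector = get_frontal_face_detector();  // untimed setup
//   std::vector<rectangle> faces;
//   double ms = time_ms([&] { faces = detector(img); });           // detection only
```

Running the same helper on both platforms, on the same image and with the same build flags, makes the two numbers directly comparable.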
Well, maybe NEON isn't really enabled on iOS. You should print some #error statement or something similar in the NEON code in (dlib/simd) and see if it triggers to be sure NEON is really being used.
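As a less disruptive alternative to an #error statement, the compiler's own predefined macros can be reported at run time. This is only a sketch: the function name `simd_report` is made up for illustration, and it checks the standard GCC/Clang macros rather than dlib's internal configuration, so it shows what the compiler was told to target, not which dlib code path actually runs.

```cpp
#include <string>

// Reports which SIMD instruction set this translation unit was compiled for.
// __ARM_NEON / __ARM_NEON__ are predefined by GCC and Clang when NEON is enabled;
// __AVX__ and __SSE2__ are the x86 counterparts.
inline std::string simd_report()
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return "NEON";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none";
#endif
}
```

Logging the returned string from both the Android and the iOS build is a quick way to confirm that the two builds were compiled with comparable SIMD settings.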
For both iOS and Android, I've stepped into the simd code in the debugger. NEON is definitely enabled. The thing is that face detection behaves differently on Android and iOS, while landmark detection performs fine on both. The only explanation I can think of is the compilation. However, I checked the make logs for both platforms and they look fine to me. So I'm lost now.
I've hit the same problem. Could it be armv8a architecture optimization? Maybe we could profile some of the computation, but I am not familiar with the dlib source.
Dlib works on Android and iOS, but most development is focused on larger machines.
We can try to help you if you give us additional information.
Please make a minimal test program with time measurement that we can use to reproduce the problem. You can find a sample program and sample measurements here: #557
Haven't really tried on iOS but the only slow part on Android for me has been model generation (300ms per face on new phones).
OK, I will write two demos later, one for Android and one for iOS. I've also profiled the dlib code on x86, where it shows only a slight performance improvement. I will profile dlib on ARM later.
I've written sample code in Qt, since it supports both Android and iOS. The code is here: demo.
I had it detect faces in a big image and measured the time taken. On iOS it takes 345ms and on Android 12367ms, quite an amazing result.
Too many variables. Can you try this on an armv8 Android phone? The slowest phone I have for a single model was about 2000ms on a Cortex-A53. Was actually faster on a Cortex-A15.
I haven't seen anything near that slow on Android but I will test your code later.
I've tried this on an armv8 Android phone. No difference.
Is it possible on iOS that the Accelerate framework could be used to run the BLAS functions? It has a cblas_ prefix but should otherwise be capable.
I am also testing the face detector and face landmark detector on different platforms (using dlib v19.7), and there is a significant difference between iOS and Android. I tried including OpenBLAS in the compilation (for Android), but it made no significant difference (I am not using the Accelerate framework on iOS).
The processors are Qualcomm Snapdragon 808 vs. A9 (PC information is not relevant).
The face detector takes an average of 0.225ms to detect the faces on Android, while iOS takes only 0.043ms. The landmark detector takes an average of 0.014ms to detect the landmarks on Android, while iOS takes only 0.0042ms. I have also seen that the first iteration of the landmark detector takes longer than the rest, but I could not find out what it is initializing.
I have two questions here: 1- Is the SIMD instruction set used for those computations? 2- Is BLAS (even the internal BLAS implementation) used in any of those features?
BLAS isn't really used for these things and only the face detector makes substantive use of SIMD.
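Since the first landmark-detector iteration was reported above to be slower than the rest, averages are easier to compare across platforms when the warm-up runs are excluded. The sketch below assumes the per-iteration timings have already been collected in milliseconds; the function name `mean_after_warmup` and the choice of how many runs to discard are illustrative, not something dlib provides.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Returns the mean of the timing samples after skipping the first `warmup` entries.
// Throws if no samples would remain, so a misconfigured benchmark fails loudly.
double mean_after_warmup(const std::vector<double>& samples_ms, std::size_t warmup)
{
    if (samples_ms.size() <= warmup)
        throw std::invalid_argument("need more samples than warm-up runs");
    const auto first = samples_ms.begin() + static_cast<std::ptrdiff_t>(warmup);
    const double sum = std::accumulate(first, samples_ms.end(), 0.0);
    return sum / static_cast<double>(samples_ms.size() - warmup);
}
```

Reporting the discarded first-run time separately from the steady-state mean keeps one-off initialization cost from being mistaken for per-frame cost.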
I am also seeing horrible performance on iOS. I can post an Instruments profiler sample that shows the calls and the time spent. This is on an iPhone 6, where it takes around 25 seconds to run the frontal face detector on one frame.
@jgoenetxea can you briefly describe what you did on iOS to achieve this?
Nothing special. I am not using any trick. I have compiled the library using Xcode.
Check if you are loading the detection model on each iteration, or using very big images.
Warning: this issue has been inactive for 181 days and will be automatically closed on 2018-09-07 if there is no further activity.
Notice: this issue has been closed because it has been inactive for 185 days. You may reopen this issue if it has been closed in error.
I tried the same piece of code on both Android and iOS. I got the following results:
This is a pretty odd result. iOS is almost five times as efficient as Android. By the way, on both OSes I enabled NEON optimization and shortened the scanner's pyramid from 6 to 3.
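To give a rough sense of what shortening the pyramid from 6 to 3 does to the workload, here is a back-of-the-envelope sketch. It assumes that a pyramid parameter N means each level is scaled by (N-1)/N relative to the previous one, that scanning stops once the image is smaller than the detection window, and that the window is 80 pixels; the function `pyramid_levels` is hypothetical and ignores details of the real scanner such as padding and HOG cell size.

```cpp
// Counts pyramid levels (including the full-size image) that are still at least
// as large as the detection window. Assumption: each level is (n-1)/n times the
// size of the previous one, and the window is `window` pixels on each side.
int pyramid_levels(int width, int height, int n, int window = 80)
{
    int levels = 0;
    double w = width;
    double h = height;
    const double scale = static_cast<double>(n - 1) / n;
    while (w >= window && h >= window) {
        ++levels;
        w *= scale;
        h *= scale;
    }
    return levels;
}
```

Under these assumptions a 640x480 image yields 10 levels with N = 6 and 5 levels with N = 3, so the change roughly halves the number of scanned scales on both platforms, at the cost of coarser scale steps that may miss faces of intermediate size.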