davisking / dlib

A toolkit for making real world machine learning and data analysis applications in C++
http://dlib.net
Boost Software License 1.0

Android and iOS with quite different performance #719

Closed waterball2016 closed 6 years ago

waterball2016 commented 7 years ago

I tried the same piece of code on both Android and iOS. I got the following results:

OS       Phone              CPU      Image Size  Speed
Android  Samsung Galaxy S4  1.6 GHz  320x180     6 fps
iOS      iPhone 5s          1.2 GHz  320x180     30 fps

These are pretty odd results. iOS is almost five times as fast as Android. By the way, on both platforms I enabled NEON optimization and shortened the scanner's pyramid from 6 to 3.
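To get a feel for what shortening the pyramid from 6 to 3 changes, here is a minimal cost sketch. It assumes that dlib's pyramid_down<N> shrinks each level by a factor of (N-1)/N, and that scanning stops once either side of the image falls below a minimum window size. The 80-pixel window used in the note below, the stopping rule, and the names PyramidCost and estimate_pyramid_cost are illustrative assumptions, not dlib's actual scanning code.

```cpp
// Rough cost model for an image-pyramid scan (illustrative only).
struct PyramidCost {
    int levels;           // number of pyramid levels visited
    double total_pixels;  // pixels summed over all visited levels
};

// Assumes every level shrinks both dimensions by (n - 1) / n and that
// scanning stops once either side is smaller than min_window.
PyramidCost estimate_pyramid_cost(double width, double height,
                                  int n, double min_window) {
    PyramidCost cost{0, 0.0};
    while (width >= min_window && height >= min_window) {
        ++cost.levels;
        cost.total_pixels += width * height;
        width  = width  * (n - 1) / n;
        height = height * (n - 1) / n;
    }
    return cost;
}
```

Under these assumptions, estimate_pyramid_cost(320, 180, 6, 80) visits 5 levels and estimate_pyramid_cost(320, 180, 3, 80) visits 3, with roughly 40% fewer pixels in total. The same setting applies on both platforms, so it would not by itself account for a gap between Android and iOS.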

xsacha commented 7 years ago

Assuming the code isn't multi-threaded (I think it uses CBLAS by default, which only uses one core?), a 1.6 GHz Cortex-A15 (ARMv7) and the 1.2 GHz ARMv8 'Cyclone' core (Apple's own design, roughly Cortex-A57 class) could have massively different performance.

Probably not as much as you see there, but enough to make up for the clock speed for sure: http://images.anandtech.com/doci/7995/Screen%20Shot%202014-05-06%20at%202.59.56%20AM.png

davisking commented 7 years ago

The poster didn't say what they are running.

waterball2016 commented 7 years ago

Oh, the above data is from the face detection demo. The face landmark detection demo (only the shape prediction part), on the other hand, has similar performance on Android and iOS. How can that be?

davisking commented 7 years ago

Which face detection demo?

waterball2016 commented 7 years ago

dlib/examples/face_detection_ex.cpp

I copied the face detection snippet into my own project and kept track of the time consumed:

    frontal_face_detector m_detector = get_frontal_face_detector();
    std::vector<rectangle> faces;
    faces = m_detector(img);

davisking commented 7 years ago

Are you including the model load time in get_frontal_face_detector()?
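One way to answer this question with numbers is to time the setup, the first call, and the steady-state calls separately. The sketch below uses only the standard library. TimingReport and time_separately are hypothetical names introduced here, not part of dlib.

```cpp
#include <chrono>
#include <cstddef>

// Timings in milliseconds, reported separately so that one-off costs
// do not get folded into the per-call figure.
struct TimingReport {
    double setup_ms;        // time spent building the state (e.g. a detector)
    double first_call_ms;   // first call, which may include lazy initialization
    double steady_mean_ms;  // mean of the remaining calls
};

template <typename Setup, typename Work>
TimingReport time_separately(Setup make_state, Work run_once,
                             std::size_t repeats) {
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;
    TimingReport report{0.0, 0.0, 0.0};

    auto t0 = clock::now();
    auto state = make_state();  // setup is timed on its own
    report.setup_ms = ms(clock::now() - t0).count();

    t0 = clock::now();
    run_once(state);            // first call is timed on its own
    report.first_call_ms = ms(clock::now() - t0).count();

    if (repeats > 0) {
        t0 = clock::now();
        for (std::size_t i = 0; i < repeats; ++i) run_once(state);
        report.steady_mean_ms =
            ms(clock::now() - t0).count() / static_cast<double>(repeats);
    }
    return report;
}
```

For the snippet above, make_state would presumably be a lambda returning get_frontal_face_detector(), and run_once a lambda that calls the detector on img. If steady_mean_ms still differs by a large factor between the two phones, model loading is not the explanation.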

waterball commented 7 years ago

Definitely not. This problem has bothered me for a week. I also tried OpenCV face detection, which has similar performance on Android and iOS but is less accurate.

davisking commented 7 years ago

Well, maybe NEON isn't really enabled on iOS. You should put an #error statement or something similar in the NEON code (in dlib/simd) and see if it triggers, to be sure NEON is really being used.
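A less intrusive variant of the #error trick is to have the app report what the compiler was targeting. The sketch below relies on the compiler-predefined macros __ARM_NEON and __ARM_NEON__ (GCC and Clang). It only shows whether NEON was available to the compiler for this translation unit. Whether dlib's own switch (assumed here to be the DLIB_HAVE_NEON macro) was set when dlib itself was built is a separate question. built_with_neon and neon_status are hypothetical helper names.

```cpp
// Reports whether the compiler was targeting NEON when this file was built.
// __ARM_NEON / __ARM_NEON__ are predefined by GCC and Clang on ARM targets
// with NEON enabled; on other targets neither is defined.
constexpr bool built_with_neon() {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return true;
#else
    return false;
#endif
}

// Human-readable form, suitable for logging once at start-up.
constexpr const char* neon_status() {
    return built_with_neon() ? "NEON: enabled at compile time"
                             : "NEON: not enabled at compile time";
}
```

Logging neon_status() from both the Android and the iOS build gives a quick side-by-side check without having to break the build.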

waterball commented 7 years ago

For both iOS and Android, I've stepped into the SIMD code in the debugger. NEON is definitely enabled. The thing is that face detection behaves differently on Android and iOS, while landmark detection performs OK. The only cause I can think of is the compilation. However, I checked the build logs of both platforms and they look fine to me. So I'm lost now.
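If the compilation is the suspect, one thing worth ruling out is that the two builds use different optimization settings. A minimal sketch, assuming a GCC or Clang toolchain (which predefine __OPTIMIZE__ when optimization is on): compile the same function into both apps and compare the strings. build_config_summary is a hypothetical helper name, and nothing in this thread confirms that this is the actual cause.

```cpp
#include <string>

// Summarises compile-time settings that commonly differ between two builds.
// __OPTIMIZE__ is predefined by GCC/Clang when optimizing (-O1 and above);
// NDEBUG is conventionally defined for release builds.
std::string build_config_summary() {
    std::string s;
#if defined(__OPTIMIZE__)
    s += "optimized=yes";
#else
    s += "optimized=no";
#endif
#if defined(NDEBUG)
    s += " ndebug=yes";
#else
    s += " ndebug=no";
#endif
#if defined(__aarch64__)
    s += " arch=arm64";
#elif defined(__arm__)
    s += " arch=arm32";
#else
    s += " arch=other";
#endif
    return s;
}
```

The function has to be compiled with the same flags as the dlib sources for the result to say anything about how dlib itself was built.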

ZipperDeng commented 7 years ago

I've hit the same problem. Is it down to ARMv8-A architecture optimizations? Maybe we could profile some of the computation, but I am not familiar with dlib's source.

e-fominov commented 7 years ago

Dlib can work on Android and iOS, but most development is focused on larger machines.

We can try to help you if you give us additional information.

Please make a minimal test program, including time measurement, that we can use to try to reproduce this. You can find a sample program and sample measurements here: #557

xsacha commented 7 years ago

Haven't really tried on iOS but the only slow part on Android for me has been model generation (300ms per face on new phones).

waterball2016 commented 7 years ago

OK, I will write two demos for Android and iOS later. I've also profiled the dlib code on x86; it shows only a slight performance improvement. I will profile dlib on ARM later.

waterball2016 commented 7 years ago

I've written some sample code in Qt, since it supports both Android and iOS. The code is here: demo.

It detects faces in a big image and measures the time taken. On iOS it takes 345ms and on Android 12367ms, which is quite an amazing result.

xsacha commented 7 years ago

Too many variables. Can you try this on an armv8 Android phone? The slowest phone I have for a single model was about 2000ms on a Cortex-A53. Was actually faster on a Cortex-A15.

I haven't seen anything near that slow on Android but I will test your code later.

waterball2016 commented 7 years ago

I've tried this on an ARMv8 Android phone. No difference.

xsacha commented 7 years ago

Is it possible on iOS that the Accelerate framework could be used to run the BLAS functions? It has a cblas_ prefix but should otherwise be capable.

jgoenetxea commented 6 years ago

I am also testing the face detector and face landmark detector on different platforms (using dlib v19.7), and there is a significant difference between iOS and Android. I tried to include OpenBLAS in the compilation (for Android), but it made no significant difference (I am not using the Accelerate framework on iOS).

The processors are a Qualcomm Snapdragon 808 and an Apple A9 (PC information is not relevant).

The face detector takes an average of 0.225ms to detect the faces on Android, while on iOS it takes only 0.043ms. The landmark detector takes an average of 0.014ms to detect the landmarks on Android, while on iOS it takes only 0.0042ms. I have also seen that the first iteration of the landmark detector takes longer than the rest, but I could not find out what it is initializing.

I have two questions here: 1- Is the SIMD instruction set used for those computations? 2- Is BLAS (even the internal BLAS implementation) used in any of those features?

davisking commented 6 years ago

BLAS isn't really used for these things and only the face detector makes substantive use of SIMD.

annerajb commented 6 years ago

I am also seeing horrible performance on iOS. I can post an Instruments profiler sample which will show the calls and the time spent. This is on an iPhone 6; it takes around 25 seconds to run the frontal face detector on one frame.

@jgoenetxea can you write briefly what you did in iOS to achieve this?

jgoenetxea commented 6 years ago

Nothing special. I am not using any trick. I have compiled the library using Xcode.

Check if you are loading the detection model on each iteration, or using very big images.
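The first check can be made structural by constructing the model once and reusing it for every frame. A minimal sketch of that pattern with a function-local static follows. ExpensiveModel, shared_model, and process_frame are hypothetical stand-ins. In a dlib application the cached object would presumably be the frontal_face_detector returned by get_frontal_face_detector().

```cpp
#include <cstddef>

// Hypothetical stand-in for an object that is expensive to construct,
// such as a detector that loads a model when it is created.
struct ExpensiveModel {
    // Counts how many times a model has been constructed.
    static std::size_t& load_count() {
        static std::size_t n = 0;
        return n;
    }
    ExpensiveModel() { ++load_count(); }
    int detect(int frame) const { return frame * 2; }  // dummy work
};

// Returns the single shared instance; it is constructed on the first call only.
ExpensiveModel& shared_model() {
    static ExpensiveModel model;
    return model;
}

// Per-frame entry point: reuses the shared model instead of rebuilding it.
int process_frame(int frame) {
    return shared_model().detect(frame);
}
```

With this layout, load_count() stays at 1 no matter how many frames are processed, which makes an accidental per-frame reload easy to spot in a log.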

dlib-issue-bot commented 6 years ago

Warning: this issue has been inactive for 181 days and will be automatically closed on 2018-09-07 if there is no further activity.

If you are waiting for a response but haven't received one it's likely your question is somehow inappropriate. E.g. you didn't follow the issue submission instructions, or your question is easily answerable by reading the FAQ, dlib's documentation, or a Google search.

dlib-issue-bot commented 6 years ago

Notice: this issue has been closed because it has been inactive for 185 days. You may reopen this issue if it has been closed in error.