ENCP / CNNdroid

Open Source Library for GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android
MIT License
539 stars 181 forks source link

Unable to reproduce runtime performances on HTC One M9 #8

Closed AlexandreBriot closed 7 years ago

AlexandreBriot commented 7 years ago

Hello,

I am trying to run your caffenet demo on a HTC One M9. I measured inference times larger than yours (between 1.5s and 2.3s per image) I suspect I am not using GPU properly.

In the catlog I see this error : 12-23 15:04:36.066 23066-23158/com.library.example.caffenet E/RenderScript: Unable to open shared library (/data/user/0/com.library.example.caffenet/code_cache/com.android.renderscript.cache/librs.innerproductinf8outf1.so): undefined symbol: init

Could it be related ? Any idea how to fix it ?

Thanks in advance for your help and advice.

Alex

hosseingolestani commented 7 years ago

Hi

What is your allocated_ram in the network definition file?

Did you install the app from the source project or from the apk file?

Regarding the error, could you please send more of your logcat?

Regards

AlexandreBriot commented 7 years ago

Hi Hossein,

Thank you for your quick answer.

In the network definition file, I tried with the _allocatedram default value of 350 and also with 512 instead. I tried with your defaulf app and also with a modified version where I read some pictures from my sd card. The error I sent you was noticed when running my customized version.

My latest test with the default app (installed from the apk file) displays these logs for a total runtime of more than 2 seconds. Does it seem OK to you ?

Thank you again Alex

12-26 10:08:34.408 11451-11497/com.library.example.caffenet D/CNNdroid: layers.conv4: Kernel Initialization Time in Constructor = 203 12-26 10:08:34.756 11451-11461/com.library.example.caffenet I/art: Background sticky concurrent mark sweep GC freed 1537(186KB) AllocSpace objects, 8(2MB) LOS objects, 4% free, 50MB/53MB, paused 644us total 103.798ms 12-26 10:08:35.302 11451-11461/com.library.example.caffenet I/art: Background partial concurrent mark sweep GC freed 1325(78KB) AllocSpace objects, 4(6MB) LOS objects, 25% free, 47MB/63MB, paused 614us total 159.185ms 12-26 10:08:35.446 11451-11497/com.library.example.caffenet D/CNNdroid: layers.conv5: Parameters Load Time in Constructor = 1033 12-26 10:08:35.599 11451-11497/com.library.example.caffenet D/CNNdroid: layers.conv5: Kernel Initialization Time in Constructor = 153 12-26 10:08:35.782 11451-11461/com.library.example.caffenet I/art: Background partial concurrent mark sweep GC freed 415(20KB) AllocSpace objects, 1(2MB) LOS objects, 6% free, 225MB/241MB, paused 403us total 143.261ms 12-26 10:08:36.065 11451-11457/com.library.example.caffenet W/art: Suspending all threads took: 95.787ms 12-26 10:08:36.084 11451-11461/com.library.example.caffenet I/art: Background partial concurrent mark sweep GC freed 65(2304B) AllocSpace objects, 0(0B) LOS objects, 4% free, 369MB/385MB, paused 355us total 169.915ms 12-26 10:09:23.691 11451-11497/com.library.example.caffenet D/CNNdroid: layers.fc6: Parameters Load Time in Constructor = 48084, Shape: 4096 12-26 10:09:24.248 11451-11497/com.library.example.caffenet D/CNNdroid: layers.fc6: Kernel Initialization Time in Constructor = 557 12-26 10:09:24.294 11451-11461/com.library.example.caffenet I/art: Background partial concurrent mark sweep GC freed 29299(3MB) AllocSpace objects, 177(183MB) LOS objects, 5% free, 270MB/286MB, paused 685us total 291.282ms 12-26 10:09:24.945 11451-11461/com.library.example.caffenet I/art: Background partial concurrent mark sweep GC freed 7400(243KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 333MB/349MB, paused 516us total 163.446ms 12-26 10:09:45.581 11451-11497/com.library.example.caffenet D/CNNdroid: layers.fc7: Parameters Load Time in Constructor = 21330, Shape: 4096 12-26 10:09:45.837 11451-11497/com.library.example.caffenet D/CNNdroid: layers.fc7: Kernel Initialization Time in Constructor = 256 12-26 10:09:45.948 11451-11461/com.library.example.caffenet I/art: Background partial concurrent mark sweep GC freed 13138(1594KB) AllocSpace objects, 78(81MB) LOS objects, 5% free, 273MB/289MB, paused 422us total 285.674ms 12-26 10:09:47.704 11451-11461/com.library.example.caffenet I/art: Background partial concurrent mark sweep GC freed 4228(216KB) AllocSpace objects, 5(100KB) LOS objects, 5% free, 289MB/305MB, paused 350us total 177.436ms 12-26 10:09:51.182 11451-11497/com.library.example.caffenet D/CNNdroid: layers.fc8: Parameters Load Time in Constructor = 5341, Shape: 1000 12-26 10:09:51.241 11451-11497/com.library.example.caffenet D/CNNdroid: layers.fc8: Kernel Initialization Time in Constructor = 58 12-26 10:09:51.243 11451-11497/com.library.example.caffenet D/#####: Hello! 12-26 10:09:51.364 11451-11461/com.library.example.caffenet I/art: Background partial concurrent mark sweep GC freed 2643(309KB) AllocSpace objects, 15(19MB) LOS objects, 5% free, 269MB/285MB, paused 422us total 167.689ms 12-26 10:09:51.548 11451-11457/com.library.example.caffenet W/art: Suspending all threads took: 58.900ms 12-26 10:09:51.564 11451-11461/com.library.example.caffenet I/art: Background partial concurrent mark sweep GC freed 718(55KB) AllocSpace objects, 1(20KB) LOS objects, 4% free, 340MB/356MB, paused 308us total 143.389ms 12-26 10:09:51.968 11451-11451/com.library.example.caffenet I/Choreographer: Skipped 41 frames! The application may be doing too much work on its main thread. 12-26 10:10:03.821 11451-11451/com.library.example.caffenet D/CNNdroid: layers.conv1: Computation Run Time = 487 12-26 10:10:03.910 11451-11451/com.library.example.caffenet D/CNNdroid: layers.pool1: Computation Run Time = 88 12-26 10:10:04.132 11451-11457/com.library.example.caffenet W/art: Suspending all threads took: 143.034ms 12-26 10:10:04.172 11451-11451/com.library.example.caffenet D/CNNdroid: layers.norm1: Computation Run Time = 262 12-26 10:10:04.197 11451-11461/com.library.example.caffenet I/art: Background partial concurrent mark sweep GC freed 34457(4MB) AllocSpace objects, 1(71MB) LOS objects, 5% free, 282MB/298MB, paused 393us total 225.071ms 12-26 10:10:04.624 11451-11451/com.library.example.caffenet D/CNNdroid: layers.conv2: Computation Run Time = 452 12-26 10:10:04.665 11451-11451/com.library.example.caffenet D/CNNdroid: layers.pool2: Computation Run Time = 41 12-26 10:10:04.729 11451-11451/com.library.example.caffenet D/CNNdroid: layers.norm2: Computation Run Time = 63 12-26 10:10:04.972 11451-11451/com.library.example.caffenet D/CNNdroid: layers.conv3: Computation Run Time = 243 12-26 10:10:05.168 11451-11451/com.library.example.caffenet D/CNNdroid: layers.conv4: Computation Run Time = 196 12-26 10:10:05.297 11451-11451/com.library.example.caffenet D/CNNdroid: layers.conv5: Computation Run Time = 128 12-26 10:10:05.316 11451-11451/com.library.example.caffenet D/CNNdroid: layers.pool5: Computation Run Time = 19 12-26 10:10:05.379 11451-11451/com.library.example.caffenet D/CNNdroid: layers.fc6: Computation Run Time = 63 12-26 10:10:05.404 11451-11451/com.library.example.caffenet D/CNNdroid: layers.fc7: Computation Run Time = 24 12-26 10:10:05.414 11451-11451/com.library.example.caffenet D/CNNdroid: layers.fc8: Computation Run Time = 10 12-26 10:10:05.415 11451-11451/com.library.example.caffenet D/CNNdroid: layers.sm: Computation Run Time = 1 12-26 10:10:05.419 11451-11451/com.library.example.caffenet I/Choreographer: Skipped 141 frames! The application may be doing too much work on its main thread. 12-26 10:11:29.973 11451-11461/com.library.example.caffenet I/art: Background sticky concurrent mark sweep GC freed 106256(9MB) AllocSpace objects, 254(7MB) LOS objects, 5% free, 281MB/297MB, paused 5.481ms total 44.888ms 12-26 10:16:49.366 11451-11461/com.library.example.caffenet I/art: Background partial concurrent mark sweep GC freed 56123(9MB) AllocSpace objects, 441(8MB) LOS objects, 5% free, 279MB/295MB, paused 348us total 169.025ms 12-26 10:24:06.785 11451-11461/com.library.example.caffenet W/art: Suspending all threads took: 35.780ms 12-26 10:25:18.863 11451-11457/com.library.example.caffenet W/art: Suspending all threads took: 96.562ms

hosseingolestani commented 7 years ago

Hi Alex,

We can imagine two reasons for the run time of 2 seconds rather than 0.7 second (which was reported in the paper):

  1. The sample app performs the computations on one image, while the numbers in the paper are for one image when the computations are performed on a batch of 16 images. By passing a batch of images through the network, we can achieve a higher performance as compared to passing a single image through the network in that for a batch of images, some computations of two images may be performed concurrently on the CPU and GPU.

  2. When the app is launched, the network parameters are loaded into RAM as the network layers are constructed. But, as you can see in the logcat, some allocated memory space is freed before the network run (which may be a part of the memory management policy of HTC One M9). So, loading the parameters again may increase the run time. (Note that the run time for the first two convolution layers is unexpectedly high.)

Regarding the error that you mentioned in your first message ("Unable to open shared library..."), we don't know exactly what it is yet, but we can assure you that the GPU is exploited.

Regards

AlexandreBriot commented 7 years ago

Hi Hossein,

Thanks again for your answer.

I did some more tests and get better results. I killed all running apps, reduced screen luminosity and set airplane mode following your paper recommendations.

With your demo app using phone camera with batchsize 1 inference, I got 930 ms runtime. With my modified code with batchsize of 16 images, I got 770 ms per image runtime.

I also noticed that when I processed several batches in a raw, it seems that the runtime increases progressively. It might be related to memory management as you explained.

I always got the RenderScript error as mentinned in my first comment but it seems it has no impact on runtime since I managed to reproduce your evaluation.

Thanks for your help Alex