VeriSilicon / TIM-VX

VeriSilicon Tensor Interface Module

How to use NNAPI for NPU? #148

Closed bkovalenkocomp closed 2 years ago

bkovalenkocomp commented 2 years ago

Hi, I've learned that it's possible to use the TensorFlow Lite NNAPI delegate on a VIM3 Android device.

I have a /system/lib/libneuralnetworks.so file in my Android 9 OS. How can I make sure the NPU is used? I benchmarked my model, and it seems the NPU is not used during TFLite 8-bit inference: the speed is roughly 10x slower than expected, and there is no difference between per-channel and per-tensor quantized models.
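For reference, this is roughly how I invoke the benchmark (a sketch assuming the stock TFLite benchmark tool flags; the paths match the logs below):

```sh
# Run the TFLite benchmark with the NNAPI delegate and per-op profiling;
# ops that actually get delegated show up as TfLiteNnapiDelegate nodes.
adb shell /data/local/tmp/build/bin/benchmark \
  --graph=/data/local/tmp/build/model/model.tflite \
  --use_nnapi=true \
  --enable_op_profiling=true
```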

Also, dmesg shows this after I run my benchmark:

[ 4907.441064] type=1400 audit(1293888104.720:419): avc: denied { read } for pid=7157 
comm="benchmar" path="/data/local/tmp/build/model/model.tflite" 
dev="mmcblk0p20" ino=261748 scontext=u:r:hal_neuralnetworks_default:s0 tcontext=u:object_r:shell_data_file:s0 
tclass=file permissive=1
bkovalenkocomp commented 2 years ago

I figured out I had to do `rmmod galcore; insmod galcore.ko`. I used the file from https://github.com/VeriSilicon/TIM-VX/blob/main/cmake/vim3_android.cmake#L3
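In full, roughly (a sketch of the host-side commands; galcore.ko is the file from the link above):

```sh
# Swap the NPU kernel driver on the device (requires root).
adb root
adb push galcore.ko /data/local/tmp/
adb shell rmmod galcore
adb shell insmod /data/local/tmp/galcore.ko
adb shell "lsmod | grep galcore"   # confirm the replacement module loaded
```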

Now I get a segmentation fault:

[ 1592.097706] type=1400 audit(1293883711.040:137): avc: denied { getattr } for pid=7113 comm="crash_dump32" path="/data/local/tmp/build/bin/benchmark" dev="mmcblk0p20" ino=261660 scontext=u:r:crash_dump:s0 tcontext=u:object_r:shell_data_file:s0 tclass=file permissive=1

[ 1592.116523] type=1400 audit(1293884789.344:138): avc: denied { read } for pid=7142 comm="benchmar" path="/data/local/tmp/build/model/model.tflite" dev="mmcblk0p20" ino=261748 scontext=u:r:hal_neuralnetworks_default:s0 tcontext=u:object_r:shell_data_file:s0 tclass=file permissive=1

[ 1596.268514] binder: release 6496:6496 transaction 318105 in, still active
[ 1596.269868] binder: send failed reply for transaction 318105 to 7142:7148

[ 1596.378806] binder: 7142:7151 transaction failed 29189/-22, size 2556-504 line 3121
[ 1596.383109] binder: 7142:7153 transaction failed 29189/-22, size 1116-216 line 3121
[ 1596.402119] type=1400 audit(1293884793.612:139): avc: denied { read } for pid=7150 comm="android.hardwar" name="u:object_r:default_prop:s0" dev="tmpfs" ino=5237 scontext=u:r:hal_neuralnetworks_default:s0 tcontext=u:object_r:default_prop:s0 tclass=file permissive=1

[ 1596.449143] type=1400 audit(1293884793.612:140): avc: denied { open } for pid=7150 comm="android.hardwar" path="/dev/__properties__/u:object_r:default_prop:s0" dev="tmpfs" ino=5237 scontext=u:r:hal_neuralnetworks_default:s0 tcontext=u:object_r:default_prop:s0 tclass=file permissive=1

[ 1596.499558] type=1400 audit(1293884793.612:141): avc: denied { getattr } for pid=7150 comm="android.hardwar" path="/dev/__properties__/u:object_r:default_prop:s0" dev="tmpfs" ino=5237 scontext=u:r:hal_neuralnetworks_default:s0 tcontext=u:object_r:default_prop:s0 tclass=file permissive=1

[ 1615.501419] logd: logdr: UID=0 GID=0 PID=7772 n tail=50 logMask=8 pid=7142 start=0ns timeout=0ns
[ 1615.502305] logd: logdr: UID=0 GID=0 PID=7772 n tail=50 logMask=1 pid=7142 start=0ns timeout=0ns
[ 1615.508703] logd: logdr: UID=0 GID=0 PID=7772 n tail=0 logMask=8 pid=7142 start=0ns timeout=0ns
[ 1615.509669] logd: logdr: UID=0 GID=0 PID=7772 n tail=0 logMask=1 pid=7142 start=0ns timeout=0ns
sunshinemyson commented 2 years ago

Please try disabling SELinux with `setenforce 0`; make sure you have root access on your Android device.
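For example (assuming the device is rooted and reachable over adb):

```sh
adb root
adb shell setenforce 0   # put SELinux into permissive mode
adb shell getenforce     # should now print "Permissive"
```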

bkovalenkocomp commented 2 years ago

Hi, thank you for the advice ;-)

I've tried `setenforce 0`, but it doesn't help. I'll keep debugging; if you have any ideas, I'd be glad to hear them.

sunshinemyson commented 2 years ago

If setenforce does not help, would you share the tombstone file?

bkovalenkocomp commented 2 years ago

NNAPI works fine with Android 11 and armv8, but I'm not getting a speedup from the 8-bit quantized model.

I guess it's related to per-tensor vs. per-channel quantization, because both variants give the same speed. Maybe there is a flag to force NNAPI to use the per-tensor quantization approach?

bkovalenkocomp commented 2 years ago

Actually no, I forgot to `rmmod galcore; insmod galcore.ko`. Could you provide a galcore.ko for Android 11 arm64-v8a, please?

sunshinemyson commented 2 years ago

I confirmed that VIM3 still does not have an Android R release, so there is no Android 11 library yet. Thanks.

bkovalenkocomp commented 2 years ago

> I confirmed that VIM3 still does not have an Android R release, so there is no Android 11 library yet.

That's sad; it seems signed quantization is available only in Android 11: https://source.android.com/devices/neural-networks#android-r

I have the following measurements for my model:

A311D:
============================== Summary by node type ==============================
                 [Node type]      [count]     [avg ms]      [avg %]     [cdf %]   [mem KB]  [times called]
                     CONV_2D          125      130.277      67.660%     67.660%      0.000        125
         TfLiteNnapiDelegate            4       28.623      14.866%     82.526%      0.000          4
           DEPTHWISE_CONV_2D           13       17.949       9.322%     91.848%      0.000         13
                         PAD           57       14.287       7.420%     99.268%      0.000         57
                    QUANTIZE            1        1.110       0.576%     99.844%      0.000          1
                  DEQUANTIZE           10        0.231       0.120%     99.964%      0.000         10
     RESIZE_NEAREST_NEIGHBOR            5        0.069       0.036%    100.000%      0.000          5

Pixel 4a:
============================== Summary by node type ==============================
                 [Node type]      [count]     [avg ms]      [avg %]     [cdf %]   [mem KB]  [times called]
         TfLiteNnapiDelegate            1       17.749     100.000%    100.000%      0.000          1

On the A311D it looks like only part of the graph is delegated: most ops (CONV_2D, DEPTHWISE_CONV_2D, PAD) still run on the CPU, while on the Pixel 4a the whole graph runs inside a single TfLiteNnapiDelegate node. On the A311D with Linux, using vx-delegate, I got 11 ms ;-/

Do I have to prepare the model differently for Android 9?

sunshinemyson commented 2 years ago

Why not just use vx-delegate for Android 9? I think you can build it with the NDK.

bkovalenkocomp commented 2 years ago

> Why not just use vx-delegate for Android 9? I think you can build it with the NDK.

I'm not familiar enough with vx-delegate. Could you explain how to get started? Or maybe provide prebuilt libs?

sunshinemyson commented 2 years ago

@bkovalenkocomp,

You first need to install the Android NDK and build the OpenVX (ovx) drivers for Android from the vendor's AOSP tree.

Then, set CMAKE_TOOLCHAIN_FILE (ndk-root/build/cmake/android.toolchain.cmake) and EXTERNAL_VIV_SDK (an SDK directory containing the OpenVX driver library and headers) appropriately. With that you can build TIM-VX as a native library for Android.

A similar config can be applied to vx-delegate too.
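A minimal configure sketch, assuming $NDK points at the NDK root and $VIV_SDK at the vendor OpenVX SDK (the ABI and platform values here are only examples):

```sh
# Cross-compile TIM-VX for Android using the NDK's CMake toolchain file.
cmake -B build -S . \
  -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-28 \
  -DEXTERNAL_VIV_SDK=$VIV_SDK
cmake --build build
```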

sunshinemyson commented 2 years ago

Closing the issue since there has been no discussion for 10 days.