ValYouW / crossplatform-tflite-object-detecion

Cross-Platform object detection using TensorFlow Lite and OpenCV in C++
MIT License
16 stars 8 forks

How to use GPU delegate option #3

Open vujadeyoon opened 2 years ago

vujadeyoon commented 2 years ago

Dear ValYouW,

I would like to run an Android application with LibTensorFlow-Lite (i.e. the C++ based TF-Lite) using the GPU delegate option.

I followed your instructions in the README.md file and in Cross Platform Object Detection with TensorFlow Lite - Part I, and I can run the given Android application (i.e. tflite-object-detection) with libtensorflowlite.so. I believe the application built against this libtensorflowlite.so runs on the CPU, not the GPU. If my guess is wrong, please correct me.

However, when I change libtensorflowlite.so to libtensorflowlite_gpu_gl.so in CMakeLists.txt in order to build the application with the GPU delegate option, the build fails in Android Studio. I have attached the build logs [1] from Android Studio.

I would like to know how to use the GPU delegate option.

Questions:

  1. Is the method mentioned above (i.e. swapping the .so file) the correct way to enable the GPU delegate option?
  2. Could you explain in more detail how to use the GPU delegate option in the given Android application?

Please note that I did my best to follow your written instructions, including the OpenCV version (i.e. 4.0.1).

If you have any questions, please feel free to contact me via this channel. I look forward to receiving your response.

Best regards, Vujadeyoon

[1] Build logs from Android Studio

Build command failed.
Error while executing process /home/vujadeyoon/Android/Sdk/cmake/3.10.2.4988404/bin/ninja with arguments {-C /home/vujadeyoon/Desktop/TF-Lite-Cpp-API-Android-Example/app/.cxx/cmake/release/armeabi-v7a native-lib}
ninja: Entering directory `/home/vujadeyoon/Desktop/TF-Lite-Cpp-API-Android-Example/app/.cxx/cmake/release/armeabi-v7a'
[1/2] Building CXX object CMakeFiles/native-lib.dir/native-lib.cpp.o
[2/2] Linking CXX shared library /home/vujadeyoon/Desktop/TF-Lite-Cpp-API-Android-Example/app/build/intermediates/cmake/release/obj/armeabi-v7a/libnative-lib.so
FAILED: /home/vujadeyoon/Desktop/TF-Lite-Cpp-API-Android-Example/app/build/intermediates/cmake/release/obj/armeabi-v7a/libnative-lib.so 
: && /home/vujadeyoon/Android/Sdk/ndk/21.4.7075529/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ --target=armv7-none-linux-androideabi30 --gcc-toolchain=/home/vujadeyoon/Android/Sdk/ndk/21.4.7075529/toolchains/llvm/prebuilt/linux-x86_64 --sysroot=/home/vujadeyoon/Android/Sdk/ndk/21.4.7075529/toolchains/llvm/prebuilt/linux-x86_64/sysroot -fPIC -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -march=armv7-a -mthumb -Wformat -Werror=format-security  -frtti -fexceptions -Oz -DNDEBUG  -Wl,--exclude-libs,libgcc.a -Wl,--exclude-libs,libgcc_real.a -Wl,--exclude-libs,libatomic.a -static-libstdc++ -Wl,--build-id -Wl,--fatal-warnings -Wl,--exclude-libs,libunwind.a -Wl,--no-undefined -Qunused-arguments -shared -Wl,-soname,libnative-lib.so -o /home/vujadeyoon/Desktop/TF-Lite-Cpp-API-Android-Example/app/build/intermediates/cmake/release/obj/armeabi-v7a/libnative-lib.so CMakeFiles/native-lib.dir/native-lib.cpp.o  /home/vujadeyoon/Android/Sdk/ndk/21.4.7075529/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/arm-linux-androideabi/30/liblog.so /home/vujadeyoon/Android/Sdk/ndk/21.4.7075529/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/arm-linux-androideabi/30/libandroid.so /home/vujadeyoon/Desktop/TF-Lite-Cpp-API-Android-Example/app/src/main/cpp/tf-lite-api/generated-libs/armeabi-v7a/libtensorflowlite_gpu_delegate.so -latomic -lm && :
/home/vujadeyoon/Desktop/TF-Lite-Cpp-API-Android-Example/app/src/main/cpp/native-lib.cpp:39: error: undefined reference to 'tflite::ops::builtin::BuiltinOpResolver::BuiltinOpResolver()'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
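The undefined reference to `tflite::ops::builtin::BuiltinOpResolver` in the log suggests the core runtime symbols are missing: the GPU delegate library is an add-on, not a replacement for libtensorflowlite.so. A minimal CMake sketch of linking both libraries together (the directory layout below is an assumption based on the paths visible in the log, not the repo's actual CMakeLists.txt):

```cmake
# Hypothetical CMakeLists.txt fragment: the GPU delegate library is linked
# IN ADDITION to the core TFLite runtime, not instead of it.
set(TFLITE_LIBS_DIR ${CMAKE_SOURCE_DIR}/tf-lite-api/generated-libs/${ANDROID_ABI})

add_library(tensorflowlite SHARED IMPORTED)
set_target_properties(tensorflowlite PROPERTIES
    IMPORTED_LOCATION ${TFLITE_LIBS_DIR}/libtensorflowlite.so)

add_library(tensorflowlite_gpu_delegate SHARED IMPORTED)
set_target_properties(tensorflowlite_gpu_delegate PROPERTIES
    IMPORTED_LOCATION ${TFLITE_LIBS_DIR}/libtensorflowlite_gpu_delegate.so)

target_link_libraries(native-lib
    tensorflowlite            # core runtime (BuiltinOpResolver lives here)
    tensorflowlite_gpu_delegate
    android
    log)
```

With this setup, swapping one .so for the other in the link line (as described above) drops the core runtime and produces exactly this kind of undefined-reference error.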
ValYouW commented 2 years ago

It is possible to run with the GPU delegate, but it's not trivial from what I remember. You can try XNNPACK instead; it should be quite easy, and there's a video on my YouTube channel.
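For reference, a minimal sketch of how the XNNPACK delegate is typically attached through the TFLite C API (the function names come from TensorFlow's `xnnpack_delegate.h` and `c_api.h`; the model path and thread count below are placeholder assumptions, not values from this repo):

```c
// Sketch: attaching the XNNPACK delegate via the TFLite C API.
// Assumes the headers and libtensorflowlite_c.so from tflite-dist are available.
#include "tensorflow/lite/c/c_api.h"
#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"

static TfLiteInterpreter* CreateInterpreterWithXnnpack(const char* model_path) {
  TfLiteModel* model = TfLiteModelCreateFromFile(model_path);
  if (!model) return NULL;

  TfLiteXNNPackDelegateOptions xnn_opts = TfLiteXNNPackDelegateOptionsDefault();
  xnn_opts.num_threads = 4;  // assumption: tune per device

  TfLiteDelegate* xnn_delegate = TfLiteXNNPackDelegateCreate(&xnn_opts);

  TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
  TfLiteInterpreterOptionsAddDelegate(options, xnn_delegate);

  TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);
  TfLiteInterpreterAllocateTensors(interpreter);

  // Note: model, options, and xnn_delegate must stay alive while the
  // interpreter is in use, and be released after it is deleted
  // (TfLiteXNNPackDelegateDelete, TfLiteInterpreterOptionsDelete,
  // TfLiteModelDelete).
  return interpreter;
}
```

Since XNNPACK is a CPU backend, this keeps everything on the CPU; it just replaces the default kernels with the optimized XNNPACK ones.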

vujadeyoon commented 2 years ago

Dear ValYouW,

Thank you for your reply.

I checked your YouTube videos, including the video Object detection using Tensorflow Lite C API on Android mentioned above.

I also checked another of your GitHub repositories, tflite-crossplatform, which corresponds to the above YouTube videos.

I would like to confirm a few things, because I think you are familiar with LibTensorFlow-Lite on Android.

  1. Both shared libraries, libtensorflowlite.so and libtensorflowlite_c.so, which are provided in your GitHub repository's releases (i.e. tflite-dist), are required when applying XNNPACK. Is that correct?
  2. XNNPACK is a highly optimized library of floating-point neural network inference operators for ARM, WebAssembly, and x86 platforms, and it is the default TensorFlow Lite CPU inference engine for floating-point models [1]. Thus, when I apply XNNPACK with libtensorflowlite.so or libtensorflowlite_c.so, as shown in your YouTube videos mentioned above, a deep-learning model converted to TFLite runs on the CPU, not the GPU. Is that correct?
  3. Do you think the inference speed of XNNPACK on the CPU is fast enough compared to the GPU delegate option on the GPU on Android arm64?

I really appreciate your GitHub repositories and YouTube videos.

Best regards, Vujadeyoon

References

  1. Profiling XNNPACK with TFLite
ValYouW commented 2 years ago

This is the test I did for XNNPACK on Windows: https://youtu.be/vWbtLIwMrRE. Using the XNNPACK delegate increased performance significantly. I didn't do any comparison against a GPU delegate (as far as I understand, XNNPACK runs on the CPU).

tensorflowlite.so is built from the C++ code, while tensorflowlite_c.so is the C API; I am currently using only the C API in my projects.

One thing to note about the GPU: it was quite some time ago that I did object detection using the GPU delegate, and from what I remember you'll only see a performance gain if the image is already on the GPU; if your image is on the CPU, then running with the GPU delegate is not that helpful.
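For completeness, this is roughly how the GPU delegate (V2) is attached through the C++ API, assuming both libtensorflowlite.so and libtensorflowlite_gpu_delegate.so are linked. This is a sketch, not this repo's actual code; `TfLiteGpuDelegateV2Create` and `ModifyGraphWithDelegate` are the standard TFLite entry points:

```cpp
// Sketch: enabling the TFLite GPU delegate (V2) in C++.
// Requires linking libtensorflowlite.so AND libtensorflowlite_gpu_delegate.so.
#include <memory>

#include "tensorflow/lite/delegates/gpu/delegate.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

// The FlatBufferModel must outlive the returned interpreter.
std::unique_ptr<tflite::Interpreter> BuildGpuInterpreter(
    const tflite::FlatBufferModel& model) {
  // BuiltinOpResolver is the symbol the linker complained about above;
  // it comes from the core runtime, not the delegate library.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(model, resolver)(&interpreter);
  if (!interpreter) return nullptr;

  TfLiteGpuDelegateOptionsV2 gpu_opts = TfLiteGpuDelegateOptionsV2Default();
  TfLiteDelegate* gpu_delegate = TfLiteGpuDelegateV2Create(&gpu_opts);

  // If the delegate can't take the graph, release it and fall back to CPU.
  // Otherwise the delegate must outlive the interpreter and be released
  // with TfLiteGpuDelegateV2Delete after the interpreter is destroyed.
  if (interpreter->ModifyGraphWithDelegate(gpu_delegate) != kTfLiteOk) {
    TfLiteGpuDelegateV2Delete(gpu_delegate);
  }
  interpreter->AllocateTensors();
  return interpreter;
}
```

As noted above, the delegate only pays off when the input tensors are already GPU-resident; with CPU-side images, the upload/download overhead can cancel out the kernel speedup.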