hyunsik-yoon opened 3 years ago
$ pip install --upgrade --pre --find-links https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html torch==1.8.0.dev20201106+cpu torchvision==0.9.0.dev20201107+cpu
$ mkdir ~/mobilenetv2-nnapi/
$ python3 pytorch_nnapi_mobilenet.py # code in the above web site
...
$ ll ~/mobilenetv2-nnapi/
total 44052
drwxrwxr-x 2 eric eric 4096 Dec 2 11:52 ./
drwxr-xr-x 78 eric eric 4096 Dec 2 11:52 ../
-rw-rw-r-- 1 eric eric 4206124 Dec 2 11:52 mobilenetv2-quant_core-cpu.pt
-rw-rw-r-- 1 eric eric 4195568 Dec 2 11:52 mobilenetv2-quant_core-nnapi.pt
-rw-rw-r-- 1 eric eric 3754412 Dec 2 11:52 mobilenetv2-quant_full-cpu.pt
-rw-rw-r-- 1 eric eric 3740848 Dec 2 11:52 mobilenetv2-quant_full-nnapi.pt
-rw-rw-r-- 1 eric eric 14593598 Dec 2 11:52 mobilenetv2-quant_none-cpu.pt
-rw-rw-r-- 1 eric eric 14601862 Dec 2 11:52 mobilenetv2-quant_none-nnapi.pt
$ file ~/mobilenetv2-nnapi/mobilenetv2-quant_core-nnapi.pt
/home/eric/mobilenetv2-nnapi/mobilenetv2-quant_core-nnapi.pt: Zip archive data
$ ll mobilenetv2-quant_core-nnapi/
total 36
drwxrwxr-x 4 eric eric 4096 Dec 2 11:55 ./
drwxrwxr-x 5 eric eric 4096 Dec 2 11:55 ../
drwxrwxr-x 3 eric eric 4096 Dec 2 11:55 code/
-rw-rw-r-- 1 eric eric 4 Dec 31 1979 constants.pkl
drwxrwxr-x 2 eric eric 4096 Dec 2 11:55 data/
-rw-rw-r-- 1 eric eric 8849 Dec 31 1979 data.pkl
-rw-rw-r-- 1 eric eric 2 Dec 31 1979 version
$ find mobilenetv2-quant_core-nnapi/code/torch/
mobilenetv2-quant_core-nnapi/code/torch/
mobilenetv2-quant_core-nnapi/code/torch/_torch_mangle_2174.py.debug_pkl
mobilenetv2-quant_core-nnapi/code/torch/_torch_mangle_2174.py
mobilenetv2-quant_core-nnapi/code/torch/torch
mobilenetv2-quant_core-nnapi/code/torch/torch/classes
mobilenetv2-quant_core-nnapi/code/torch/torch/classes/_nnapi.py
mobilenetv2-quant_core-nnapi/code/torch/torch/classes/_nnapi.py.debug_pkl
mobilenetv2-quant_core-nnapi/code/torch/torch/backends
mobilenetv2-quant_core-nnapi/code/torch/torch/backends/_nnapi
mobilenetv2-quant_core-nnapi/code/torch/torch/backends/_nnapi/prepare.py.debug_pkl
mobilenetv2-quant_core-nnapi/code/torch/torch/backends/_nnapi/prepare.py
mobilenetv2-quant_core-nnapi/code/torch/torch/nn
mobilenetv2-quant_core-nnapi/code/torch/torch/nn/modules
mobilenetv2-quant_core-nnapi/code/torch/torch/nn/modules/container
mobilenetv2-quant_core-nnapi/code/torch/torch/nn/modules/container/_torch_mangle_2173.py.debug_pkl
mobilenetv2-quant_core-nnapi/code/torch/torch/nn/modules/container/_torch_mangle_2173.py
mobilenetv2-quant_core-nnapi/code/torch/torch/nn/quantized
mobilenetv2-quant_core-nnapi/code/torch/torch/nn/quantized/modules.py
mobilenetv2-quant_core-nnapi/code/torch/torch/nn/quantized/modules.py.debug_pkl
It seems that the .py files are loaders or wrappers, and the MobileNet model itself is serialized inside the pickle files.
Before testing the NNAPI model generated above, I tried PyTorch Mobile first to get used to its mobile execution environment, using the HelloWorld example. To build it, ANDROID_HOME should be set (see here).
$ ANDROID_HOME=/home/eric/dev/one_build_android_sdk ./gradlew installDebug
After running the app installed on a Galaxy S10, the following error occurred:
com.facebook.jni.CppException: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at
../caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with
version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at
../caffe2/serialize/inline_container.cc:132)
Looking into build.gradle, it sounds like we need PyTorch version 1.4, so I created a venv for PyTorch 1.4.
$ pip install torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
But the problem still remains.
Let's go back to NNAPI again and see whether this issue also happens there.
# in pytorch cloned dir
$ git submodule update --init --recursive # see https://github.com/pytorch/pytorch/issues/45398
$ rm -rf build_android ; ANDROID_HOME=/home/eric/dev/one_build_android_sdk ANDROID_NDK=/home/eric/dev/one_build_android_ndk/ndk BUILD_PYTORCH_MOBILE=1 ANDROID_ABI=arm64-v8a ./scripts/build_android.sh -DBUILD_BINARY=ON
...
$ ll build_android/bin/speed_benchmark_torch
-rwxrwxr-x 1 eric eric 60254512 Dec 2 18:40 build_android/bin/speed_benchmark_torch*
$ adb push build_android/bin/speed_benchmark_torch /data/local/tmp/pytorch
build_android/bin/speed_benchmark_torch: 1 file pushed, 0 skipped. 95.8 MB/s (60254512 bytes in 0.600s)
$ adb push ~/mobilenetv2-nnapi/mobilenetv2-quant_full-nnapi.pt /data/local/tmp/pytorch
/home/eric/mobilenetv2-nnapi/mobilenetv2-quant_full-nn...ushed, 0 skipped. 425.9 MB/s (3740848 bytes in 0.008s)
On Android device:
y2s:/data/local/tmp/pytorch # ./speed_benchmark_torch --pthreadpool_size=1 --model=mobilenetv2-quant_full-nnapi.pt --use_bundled_input=0 --warmup=5 --iter=200
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Microseconds per iter: 25635.7. Iters per second: 39.0081
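As a sanity check on the benchmark line above, iterations per second is just the reciprocal of microseconds per iteration:

```python
us_per_iter = 25635.7  # reported "Microseconds per iter"
iters_per_sec = 1_000_000 / us_per_iter
print(round(iters_per_sec, 4))  # -> 39.0081, matching "Iters per second"
```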
This runs with the Android default NNAPI. Now, let's figure out how we can switch the default Android NN runtime to ONERT.
I tried to use ONERT's libneuralnetworks.so by setting LD_LIBRARY_PATH=../one/Product/lib, but there was an error. It seems like OperationFactory::OperationFactory() generated the error.

134|y2s:/data/local/tmp/pytorch # ONERT_LOG_ENABLE=1 LD_LIBRARY_PATH=../nmt/Product/lib ./speed_benchmark_torch --pthreadpool_size=1 --model=mobilenetv2-quant_full-nnapi.pt --use_bundle>
[EXCEPTION] Conv2D: unsupported input operand count
[NNAPI::Model] addOperation: Fail to add operation
terminating with uncaught exception of type std::runtime_error: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__/torch/backends/_nnapi/prepare.py", line 28, in __setstate__
self.training = False
self.nnapi_module = nnapi_module
_0 = (self.nnapi_module).init()
~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
return None
class NnapiModule(Module):
File "code/__torch__/torch/backends/_nnapi/prepare.py", line 105, in init
comp = __torch__.torch.classes._nnapi.Compilation.__new__(__torch__.torch.classes._nnapi.Compilation)
_21 = (comp).__init__()
_22 = (comp).init(self.ser_model, self.weights, )
~~~~~~~~~~ <--- HERE
self.comp = comp
return None
Traceback of TorchScript, original code (most recent call last):
File "/home/eric/venv/pytorch-nightly/lib/python3.6/site-packages/torch/backends/_nnapi/prepare.py", line 36, in init
self.weights = [w.contiguous() for w in self.weights]
comp = torch.classes._nnapi.Compilation()
comp.init(self.ser_model, self.weights)
~~~~~~~~~ <--- HERE
self.comp = comp
RuntimeError: [enforce fail at nnapi_model_loader.cpp:233] result == ANEURALNETWORKS_NO_ERROR.
Aborted (core dumped)
@hyunsik-yoon I am curious to compare PyTorch NNAPI's performance (Microseconds per iter: 25635.7) with TensorFlow Lite's.
So I did an experiment with the assumption that mobilenetv2-quant_full-nnapi.pt does almost the same thing as mobilenet_v2_1.0_224_quant.tflite. Here is the tflite result on my GS20P:
$ ./benchmark_model --graph=mobilenet_v2_1.0_224_quant.tflite
Inference (avg): 10603.4
$ ./benchmark_model --use_gpu=1 --graph=mobilenet_v2_1.0_224_quant.tflite
Inference (avg): 8461.58
TensorFlow Lite (default) is about 2.4x faster.
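The 2.4x figure follows directly from the two averages, both in microseconds per inference (PyTorch NNAPI from speed_benchmark_torch, TFLite CPU from benchmark_model):

```python
pytorch_nnapi_us = 25635.7  # speed_benchmark_torch, microseconds per iter
tflite_cpu_us = 10603.4     # benchmark_model default, inference avg
print(round(pytorch_nnapi_us / tflite_cpu_us, 1))  # -> 2.4
```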
@glistening One thing I'd like to mention is that the whole sequence of this issue follows the instructions on https://pytorch.org/tutorials/prototype/nnapi_mobilenetv2.html, where the last step is to run the PyTorch NNAPI benchmark on Android. The purpose is to run any program that runs a PyTorch model on NNAPI of ONERT, not the benchmark itself.
BTW, the result you mentioned is interesting. 25635.7 was the performance of the pre-loaded NNAPI (not ONERT) on a GS20P (Exynos) device and seemed somewhat slow compared to TFLite. :-O
@hyunsik-yoon
One thing I'd like to mention is that the whole sequence of this issue follows the instructions on https://pytorch.org/tutorials/prototype/nnapi_mobilenetv2.html, where the last step is to run the PyTorch NNAPI benchmark on Android. The purpose is to run any program that runs a PyTorch model on NNAPI of ONERT, not the benchmark itself.
I am sorry for putting a result beyond the scope of this issue. I just wanted to confirm my assumption, with a quick run, that the PyTorch NNAPI backend would be slower than TensorFlow Lite. I don't want to open another issue since I will not put more time into this.
BTW, the result you mentioned is interesting. 25635.7 was the performance of the pre-loaded NNAPI (not ONERT) on a GS20P (Exynos) device and seemed somewhat slow compared to TFLite. :-O
I thought PyTorch would provide its own NNAPI implementation based on the PyTorch execution engine (or backend). However, it seems it uses the NNAPI implementation on the Android machine. Then I bet it would clearly be slower than tflite.
@glistening
I thought PyTorch would provide its own NNAPI implementation based on the PyTorch execution engine (or backend). However, it seems it uses the NNAPI implementation on the Android machine.
Correct.
Last month, the PyTorch team announced that PyTorch models can now run on Android NNAPI. It seems that this feature is still a prototype and not an official release.
https://pytorch.org/blog/prototype-features-now-available-apis-for-hardware-accelerated-mobile-and-arm64-builds/#nnapi-support-with-google-android
I will try to run a PyTorch model on NNAPI of ONERT and share the experience.