google-ai-edge / LiteRT

LiteRT is the new name for TensorFlow Lite (TFLite). While the name is new, it's still the same trusted, high-performance runtime for on-device AI, now with an expanded vision.
https://ai.google.dev/edge/litert
Apache License 2.0
170 stars · 14 forks

TFLite: Android benchmark test gets different results with the GPU delegate #75

Open gaikwadrahul8 opened 6 days ago

gaikwadrahul8 commented 6 days ago

Hi. When I use the benchmark script and the benchmark apk to test my model's performance, I get the same performance on the CPU with the XNNPACK delegate, but different performance on the GPU with the OpenCL delegate.

And the results:

| Model (time, ms) | script CPU | script GPU | apk CPU | apk GPU |
| --- | --- | --- | --- | --- |
| mobilenetv2 | 5.7 | 7.4 | 5.6 | 4.3 |
| mobilenetv3_small | 1.8 | 4.9 | 1.7 | 2.1 |
| mobilenetv3_large | 5.3 | 6.0 | 5.0 | 3.8 |

The benchmark apk is almost twice as fast as the benchmark script on GPU.
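As a quick sanity check on that claim (not part of the original report; the values are copied from the table above), the per-model GPU speedups can be computed with a throwaway awk one-liner:

```shell
# GPU latencies (ms) copied from the table above: model, script GPU, apk GPU
printf '%s\n' \
  'mobilenetv2 7.4 4.3' \
  'mobilenetv3_small 4.9 2.1' \
  'mobilenetv3_large 6.0 3.8' \
  | awk '{ printf "%s: apk is %.2fx faster on GPU\n", $1, $2 / $3 }'
# mobilenetv2: apk is 1.72x faster on GPU
# mobilenetv3_small: apk is 2.33x faster on GPU
# mobilenetv3_large: apk is 1.58x faster on GPU
```

So the gap ranges from roughly 1.6x to 2.3x depending on the model.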

Q: What is the difference between the benchmark script and the apk?

Running the benchmark script

Log of benchmark script:

╰─$ adb install -r -d -g android_aarch64_benchmark_model.apk
╰─$ adb shell /data/local/tmp/android_aarch64_benchmark_model   --graph=/data/local/tmp/mobilenetv3_large.tflite \
  --num_threads=4  --num_runs=50 --use_gpu=true
INFO: STARTING!
INFO: Log parameter values verbosely: [0]
INFO: Min num runs: [50]
INFO: Num threads: [4]
INFO: Graph: [/data/local/tmp/mobilenetv3_large.tflite]
INFO: #threads used for CPU inference: [4]
INFO: Use gpu: [1]
INFO: Loaded model /data/local/tmp/mobilenetv3_large.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
INFO: GPU delegate created.
VERBOSE: Replacing 126 out of 126 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 1 partitions for the whole graph.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
INFO: Explicitly applied GPU delegate, and the model graph will be completely executed by the delegate.
INFO: The input model file size (MB): 21.9417
INFO: Initialized session in 1342.81ms.
INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
INFO: count=72 first=11653 curr=6233 min=5959 max=11653 avg=6835.62 std=903

INFO: Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
INFO: count=148 first=6394 curr=6988 min=6096 max=8656 avg=6650.52 std=446

INFO: Inference timings in us: Init: 1342809, First inference: 11653, Warmup (avg): 6835.62, Inference (avg): 6650.52
INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
INFO: Memory footprint delta from the start of the tool (MB): init=108.359 overall=108.359
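When comparing several such runs, the average inference time can be pulled out of the tool's output with a small grep/sed pipeline. This is not part of the original report; the field layout is assumed from the `Inference timings in us` line above:

```shell
# Extract the final average inference time (us) from benchmark output
# piped in on stdin. Assumes the "Inference timings in us:" line format
# shown in the log above.
extract_avg() {
  grep 'Inference timings in us' \
    | sed -E 's/.*Inference \(avg\): ([0-9.]+).*/\1/'
}

# Example, using the timing line from the script run above:
printf '%s\n' 'INFO: Inference timings in us: Init: 1342809, First inference: 11653, Warmup (avg): 6835.62, Inference (avg): 6650.52' \
  | extract_avg
# 6650.52
```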

Running the benchmark apk

╰─$ adb shell am start -S \
  -n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
  --es args '"--graph=/data/local/tmp/mobilenetv3_large.tflite \
              --num_threads=4 --num_runs=50 --use_gpu=true"'

Log from logcat:

03-04 16:23:46.821  6210  6210 I tflite  : Initialized OpenCL-based API.
03-04 16:23:46.859  6210  6210 I tflite  : Created 1 GPU delegate kernels.
03-04 16:23:46.859  6210  6210 I tflite  : Explicitly applied GPU delegate, and the model graph will be completely executed by the delegate.
03-04 16:23:46.859  6210  6210 I tflite  : The input model file size (MB): 21.9417
03-04 16:23:46.859  6210  6210 I tflite  : Initialized session in 856.303ms.
03-04 16:23:46.860  6210  6210 I tflite  : Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
03-04 16:23:47.164  8268  9284 I DisplayFrameSetting: homeToAppEnd pkg=org.tensorflow.lite.benchmark
03-04 16:23:47.363  6210  6210 I tflite  : count=124 first=4601 curr=3901 min=3882 max=4601 avg=4018.02 std=126
03-04 16:23:47.363  6210  6210 I tflite  : Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
03-04 16:23:48.153  1895  3405 I MiuiNetworkPolicy: bandwidth: 0 KB/s, Max bandwidth: 200 KB/s
03-04 16:23:48.365  6210  6210 I tflite  : count=234 first=3909 curr=4552 min=3856 max=5840 avg=4207.86 std=334
03-04 16:23:48.366  6210  6210 I tflite  : Inference timings in us: Init: 856303, First inference: 4601, Warmup (avg): 4018.02, Inference (avg): 4207.86
03-04 16:23:48.366  6210  6210 I tflite  : Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
03-04 16:23:48.366  6210  6210 I tflite  : Memory footprint delta from the start of the tool (MB): init=103.145 overall=103.145
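Not from the original report: the benchmark's lines are tagged `tflite` in logcat (as in the capture above), so they can be watched live with `adb logcat -s tflite`, or isolated from saved logcat text with a plain grep. A sketch, using two lines copied from the capture:

```shell
# Isolate the benchmark's own lines ("tflite" tag, as in the capture
# above) from logcat text piped in; the system lines (e.g.
# MiuiNetworkPolicy) are dropped.
printf '%s\n' \
  '03-04 16:23:48.153  1895  3405 I MiuiNetworkPolicy: bandwidth: 0 KB/s, Max bandwidth: 200 KB/s' \
  '03-04 16:23:48.366  6210  6210 I tflite  : Inference timings in us: Init: 856303, First inference: 4601, Warmup (avg): 4018.02, Inference (avg): 4207.86' \
  | grep 'I tflite'
```

This prints only the second line, the benchmark's timing summary.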

System information

gaikwadrahul8 commented 5 days ago

This issue, originally reported by @zihaomu, has been moved to this dedicated LiteRT repository to improve issue tracking and prioritization. To ensure continuity, we have created this new issue on your behalf.

We appreciate your understanding and look forward to your continued involvement.