This issue, originally reported by @jakubdolejs, has been moved to this dedicated repository for LiteRT to enhance issue tracking and prioritization. To ensure continuity, we have created this new issue on your behalf.
We appreciate your understanding and look forward to your continued involvement.
Issue type
Performance
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
binary
TensorFlow version
2.15.0
Custom code
Yes
OS platform and distribution
No response
Mobile device
Google Pixel 4a running Android 13
Python version
No response
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
I'm running inference on a YOLOv8-based TFLite model on Android using the Interpreter API. I noticed that the first 30 or so calls to the `Interpreter.run()` function take much longer than the subsequent calls. The difference is quite marked, starting at about 3500ms per run and ending at about 500ms. I thought perhaps it was something about the input data, so I tried running the same call with the same input 100 times in a loop. The behaviour was the same: the first handful of inference runs take around 3 seconds, slowly speeding up to about 500–700ms by the 100th iteration.
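For reference, this is a minimal sketch of the kind of timing loop described above (the model file name, input shape, output type, and buffer handling are assumptions, not the original code):

```kotlin
import android.content.Context
import android.util.Log
import org.tensorflow.lite.Interpreter
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Sketch of the benchmark loop: load the model, then time Interpreter.run()
// over 100 iterations with the same input.
fun benchmarkInterpreter(context: Context) {
    // Assumption: model is bundled in assets as "yolov8.tflite".
    val modelBuffer: ByteBuffer = context.assets.open("yolov8.tflite").use { stream ->
        val bytes = stream.readBytes()
        ByteBuffer.allocateDirect(bytes.size).order(ByteOrder.nativeOrder()).apply {
            put(bytes)
            rewind()
        }
    }

    Interpreter(modelBuffer).use { interpreter ->
        // Assumption: YOLOv8 float32 input of 1x640x640x3; output shape is read from the model
        // and assumed to be float32 (4 bytes per element).
        val input = ByteBuffer.allocateDirect(1 * 640 * 640 * 3 * 4).order(ByteOrder.nativeOrder())
        val outputShape = interpreter.getOutputTensor(0).shape()
        val output = ByteBuffer.allocateDirect(outputShape.reduce(Int::times) * 4)
            .order(ByteOrder.nativeOrder())

        repeat(100) { i ->
            input.rewind()
            output.rewind()
            val start = System.nanoTime()
            interpreter.run(input, output)
            val elapsedMs = (System.nanoTime() - start) / 1_000_000
            Log.d("TfLiteBench", "run $i: ${elapsedMs}ms")
        }
    }
}
```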
I wanted to find out whether a specific combination of interpreter options was causing this behaviour, so I wrote a test matrix initialising interpreters with different options. There doesn't seem to be any difference: whichever combination runs first takes a suspicious amount of time for the first handful of inference runs. Sometimes the time never decreases and all the inference runs for a given configuration take a very long time (~3 seconds).
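The sketch below shows the kind of options matrix referred to above; the exact combinations from the original test are not reproduced here, and the thread count and XNNPACK/NNAPI toggles are assumptions for illustration:

```kotlin
import org.tensorflow.lite.Interpreter
import java.nio.ByteBuffer

// Build several interpreters from the same model buffer, each with a different
// Interpreter.Options configuration, so their per-run latency can be compared.
fun buildInterpreterMatrix(modelBuffer: ByteBuffer): List<Pair<String, Interpreter>> {
    val configs = listOf(
        "default" to Interpreter.Options(),
        "4 threads" to Interpreter.Options().setNumThreads(4),
        "XNNPACK off" to Interpreter.Options().setUseXNNPACK(false),
        "NNAPI" to Interpreter.Options().setUseNNAPI(true),
    )
    return configs.map { (label, options) -> label to Interpreter(modelBuffer, options) }
}
```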
I'm including the code that uses the bundled runtime. The Play Services runtime produced times in line with the bundled runtime.
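For context, the two runtimes compared above differ only in which dependency the app pulls in; a Gradle (Kotlin DSL) sketch is below, with version numbers that are illustrative rather than taken from the original report:

```kotlin
dependencies {
    // Bundled TFLite/LiteRT runtime (packaged inside the APK).
    implementation("org.tensorflow:tensorflow-lite:2.15.0")

    // Or the Google Play Services runtime (provided by Play Services on device).
    implementation("com.google.android.gms:play-services-tflite-java:16.2.0")
    implementation("com.google.android.gms:play-services-tflite-support:16.2.0")
}
```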
The device (Google Pixel 4a) is used only for development. There are no other apps installed aside from the test app and whatever was pre-installed on the phone. The device wasn't connected to the internet while running the test.
iOS comparison
In comparison, version 2.14.0 of TFLite for Swift (the latest available on CocoaPods) using the Core ML delegate runs inference on the same model and input in about 70ms on an iPhone 12.
Standalone code to reproduce the issue
Relevant log output