google-ar / arcore-android-sdk

ARCore SDK for Android Studio
https://developers.google.com/ar

ARCore + TFLite inference on GPU stalling both #1663

Open RomanStadlhuber opened 1 month ago

RomanStadlhuber commented 1 month ago

SPECIFIC ISSUE ENCOUNTERED

I am running ARCore for motion tracking using the Android SDK, essentially at the state of the hello_ar_kotlin sample.

When running inference on (separate!) image data, TFLite can access the GPU on its own, but both systems break down when run together, with ARCore dropping to 2-3 FPS and TFLite to 1 FPS.

VERSIONS USED

versionName=1.44.241490493
signatures=PackageSignatures{c31e10a version:3, signatures:[fd17c870], past signatures:[53bcc66f flags: 17, fd17c870 flags: 17]}
google/panther/panther:14/AP1A.240305.019.A1/11445699:user/release-keys

STEPS TO REPRODUCE THE ISSUE

Dependencies

// for ARCore
implementation("com.google.ar:core:1.42.0")
implementation("de.javagl:obj:0.2.1")
// for TFLite
implementation("com.google.android.gms:play-services-tflite-impl:16.1.0")
implementation("com.google.android.gms:play-services-tflite-java:16.1.0")
implementation("com.google.android.gms:play-services-tflite-support:16.1.0")
implementation("com.google.android.gms:play-services-tflite-gpu:16.2.0")
implementation("com.google.android.gms:play-services-tflite-acceleration-service:16.0.0-beta01")
  1. Clone hello_ar_kotlin.
  2. Add TFLite (Interpreter API), e.g. TensorFlow Lite Inference.
  3. Run ARCore and TFLite inference simultaneously.
  4. Performance of both will drop to unusable levels.
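For step 3, the way inference is interleaved with the camera loop matters. A common pattern (my sketch, not code from the sample or the SDK) is to hand the worker thread only the *latest* frame and drop stale ones, so a slow inference pass can never queue up work behind ARCore's render loop. The `LatestFrameGate` name and structure below are hypothetical:

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Holds only the most recent frame; older frames are overwritten instead of
// queued, so a slow inference pass never builds a backlog behind the camera loop.
class LatestFrameGate<T : Any> {
    private val latest = AtomicReference<T?>(null)

    // Called from the camera/render loop: overwrite whatever is pending.
    fun offer(frame: T) = latest.set(frame)

    // Called from the inference worker: take the pending frame, if any.
    fun poll(): T? = latest.getAndSet(null)
}

fun main() {
    val gate = LatestFrameGate<Int>()
    // The camera loop produces frames faster than inference consumes them.
    for (frame in 1..10) gate.offer(frame)
    // The worker only ever sees the newest frame.
    println(gate.poll())  // 10
    println(gate.poll())  // null, nothing pending
}
```

Even with this pattern in place, the contention described in this issue occurs, so the stalls do not appear to be a simple queueing problem on the app side.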

WORKAROUNDS (IF ANY)

None found yet.

ADDITIONAL COMMENTS

I cannot use MLKit with ARCore as

GPU memory usage of TFLite alone averages about 771 MB; with ARCore running as well it averages about 810 MB, so not much more.

Resource profiling data was obtained with the Android GPU Inspector.

15kingben commented 1 month ago

Can you share CPU profiling data for running both simultaneously?

RomanStadlhuber commented 1 month ago

Hello @15kingben, you can find the profiling data here. Please let me know if you need any additional data.

15kingben commented 1 month ago

Is the performance of the inference model alone significantly higher? ARCore uses TFLite for certain features; the Semantics API, for example, runs on an image downscaled to 192x256 to save performance. Even if inference is run on the GPU, the CPU may be used for certain operations.
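The downscaling mentioned above can be sketched as plain arithmetic: compute the largest size that fits inside the model's input while preserving aspect ratio, then resize the camera frame to that size before inference. The 256x192 target below mirrors the resolution mentioned for the Semantics API; for another model the target is whatever its input tensor expects. This is my own illustrative helper, not an SDK API:

```kotlin
import kotlin.math.roundToInt

// Compute the downscaled size that fits inside the model input (dstW x dstH)
// while preserving the source frame's aspect ratio.
fun fitInside(srcW: Int, srcH: Int, dstW: Int, dstH: Int): Pair<Int, Int> {
    val scale = minOf(dstW.toDouble() / srcW, dstH.toDouble() / srcH)
    return Pair((srcW * scale).roundToInt(), (srcH * scale).roundToInt())
}

fun main() {
    // A 1920x1080 camera frame shrunk to fit a 256x192 model input.
    println(fitInside(1920, 1080, 256, 192))  // (256, 144)
}
```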

RomanStadlhuber commented 1 month ago

Dear @15kingben, thank you for investigating. In relative terms (i.e. compared to its peak performance) the model is significantly slower. You are right that parts of the model do run on the CPU, and I am aware of this; I can take care of the model architecture on my side.

What's more troubling, and ultimately the reason I created this issue, is that ARCore performance seems to drop significantly as well, even stalling, as the title says.

At times the pose output of ARCore drops to 2-3 Hz, and I have also seen it stop for several seconds, likely leading to an internal reset of the filter. Since ARCore is a closed system, I can't tell where the problem lies.

There also doesn't seem to be an option for deliberately splitting resources between the two (ARCore and TFLite).

So I guess the gist of this issue is: can anything be done at the API level to mitigate this drop in ARCore's performance? At every invocation I am essentially risking the stability of the motion tracking process/output.
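Absent an API-level control for splitting resources, the only client-side lever I see is rationing how often inference runs at all, leaving the rest of the GPU budget to the tracker. A minimal sketch of that idea (the `InferenceThrottle` class and its interval are my own hypothetical tuning knob, not anything exposed by ARCore or TFLite):

```kotlin
// Allow at most one inference pass per minIntervalMs; callers skip the
// pass entirely when shouldRun returns false.
class InferenceThrottle(private val minIntervalMs: Long) {
    private var lastRunMs: Long? = null

    fun shouldRun(nowMs: Long): Boolean {
        val last = lastRunMs
        if (last != null && nowMs - last < minIntervalMs) return false
        lastRunMs = nowMs
        return true
    }
}

fun main() {
    val throttle = InferenceThrottle(minIntervalMs = 500)
    println(throttle.shouldRun(0))    // true: first call always runs
    println(throttle.shouldRun(200))  // false: only 200 ms elapsed
    println(throttle.shouldRun(600))  // true: 600 ms since last run
}
```

This only reduces how often the contention occurs; it does not address why a single inference pass stalls the tracker.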

Any help in that regard would be greatly appreciated. Many thanks again for your support.

RomanStadlhuber commented 1 week ago

@15kingben, has any progress been made, or are you aware of a workaround I can use to overcome this issue?

15kingben commented 1 week ago

Hi Roman, I'm sorry I have not had time to investigate this issue recently. I was not able to recreate it by running our Semantics model simultaneously with ARCore on the GPU delegate in the hello_ar_kotlin example, although I was simply feeding dummy data into the model. Can you share the full code of your reproducible example, specifically how the images are sourced for the TFLite model's input?