Closed alexdmiller closed 1 year ago
Some more information:
I notice that MediaPipe applies patches to its TensorFlow Lite dependency. I tried commenting out the patches to see whether one of them was causing the slow inference. My model does not appear to depend on any of the custom ops, so I was able to comment out the code that depends on that patch and still run my model. I was not able to remove the `org_tensorflow_objc_cxx17.diff` patch.
In the end, my `WORKSPACE` file looks like this:
```python
_TENSORFLOW_GIT_COMMIT = "52a2905cbc21034766c08041933053178c5d10e3"
_TENSORFLOW_SHA256 = "06d4691bcdb700f3275fa0971a1585221c2b9f3dffe867963be565a6643d7f56"

http_archive(
    name = "org_tensorflow",
    patch_args = [
        "-p1",
    ],
    patches = [
        # "@//third_party:org_tensorflow_compatibility_fixes.diff",
        "@//third_party:org_tensorflow_objc_cxx17.diff",
        # Diff is generated with a script, don't update it manually.
        # "@//third_party:org_tensorflow_custom_ops.diff",
    ],
    sha256 = _TENSORFLOW_SHA256,
    strip_prefix = "tensorflow-%s" % _TENSORFLOW_GIT_COMMIT,
    urls = [
        "https://github.com/tensorflow/tensorflow/archive/%s.tar.gz" % _TENSORFLOW_GIT_COMMIT,
    ],
)

load("@org_tensorflow//tensorflow:workspace3.bzl", "tf_workspace3")

tf_workspace3()

load("@org_tensorflow//tensorflow:workspace2.bzl", "tf_workspace2")

tf_workspace2()
```
However, even after removing these patches, inference is still slow on iOS.
Update:
I tried going back to an earlier version of MediaPipe (commit `38be2ec58f2a1687f4ffca287094c7bbd7791f58`). However, I'm seeing the exact same issue on iOS.
My suspicion is that TensorFlow Lite is being built or configured for iOS in a way that differs from the standalone TensorFlow Lite library, but I'm not sure how to investigate further. Any ideas for further investigation would be appreciated!
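One idea I've considered, to rule out a different execution path being selected on iOS, is pinning the delegate explicitly in the graph config. This is only a sketch: it assumes the graph uses `InferenceCalculator`, and the stream names and model path below are hypothetical placeholders, not my actual graph:

```
node {
  calculator: "InferenceCalculator"
  input_stream: "TENSORS:input_tensors"    # hypothetical stream name
  output_stream: "TENSORS:output_tensors"  # hypothetical stream name
  options {
    [mediapipe.InferenceCalculatorOptions.ext] {
      model_path: "my-model.tflite"             # placeholder path
      delegate { xnnpack { num_threads: 1 } }   # force the XNNPACK CPU path
    }
  }
}
```

If the slowdown disappears with the delegate pinned, that would point at whatever delegate the iOS build selects by default.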
Another update:
I used the TFLite performance benchmarking app to run our model. On a single thread, the model completes in an average of 226 ms. This is much faster than the ~10 seconds we're seeing with MediaPipe. So the conclusion is again that MediaPipe is somehow running the model much more slowly for some reason.
The app spits out a bunch of profiling information. I'm happy to share more, but here are some summary tables that might be helpful:
```
============================== Top by Computation Time ==============================
[node type]                   [start]  [first]  [avg ms]  [%]      [cdf%]   [mem KB] [times called] [Name]
TFLite_Detection_PostProcess  185.381  41.993   27.689    12.997%  12.997%  0.000    1  [StatefulPartitionedCall:3, StatefulPartitionedCall:2, StatefulPartitionedCall:1, StatefulPartitionedCall:0]:154
CONV_2D                       16.746   15.996   15.849    7.439%   20.437%  0.000    1  [tfl.conv_2d2]:3
CONV_2D                       60.020   7.506    7.505     3.523%   23.959%  0.000    1  [tfl.conv_2d6]:10
CONV_2D                       0.000    7.430    7.477     3.510%   27.469%  0.000    1  [tfl.conv_2d]:0
CONV_2D                       42.750   7.505    7.442     3.493%   30.962%  0.000    1  [tfl.conv_2d4]:6
DEPTHWISE_CONV_2D             32.595   6.681    6.717     3.153%   34.115%  0.000    1  [tfl.depthwise_conv_2d1]:4
DEPTHWISE_CONV_2D             7.478    6.526    6.508     3.055%   37.170%  0.000    1  [tfl.depthwise_conv_2d]:1
CONV_2D                       54.824   4.924    4.842     2.273%   39.442%  0.000    1  [tfl.conv_2d5]:8
CONV_2D                       139.052  4.737    4.739     2.224%   41.667%  0.000    1  [tfl.conv_2d34]:63
DEPTHWISE_CONV_2D             50.193   4.704    4.630     2.173%   43.840%  0.000    1  [tfl.depthwise_conv_2d2]:7

Number of nodes executed: 155
============================== Summary by node type ==============================
[Node type]                   [count]  [avg ms]  [avg %]   [cdf %]   [mem KB] [times called]
CONV_2D                       72       142.635   66.975%   66.975%   0.000    72
DEPTHWISE_CONV_2D             51       40.246    18.898%   85.873%   0.000    51
TFLite_Detection_PostProcess  1        27.689    13.001%   98.874%   0.000    1
ADD                           12       1.189     0.558%    99.432%   0.000    12
LOGISTIC                      1        0.732     0.344%    99.776%   0.000    1
PACK                          4        0.217     0.102%    99.878%   0.000    4
RESHAPE                       12       0.190     0.089%    99.967%   0.000    12
CONCATENATION                 2        0.070     0.033%    100.000%  0.000    2
```
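As a quick sanity check on that summary table, the per-type averages can be totalled to confirm the overall single-thread time and CONV_2D's share (a minimal sketch; the numbers are copied from the table above):

```python
# Per-node-type average times in ms, copied from the benchmark summary above.
times = {
    "CONV_2D": 142.635,
    "DEPTHWISE_CONV_2D": 40.246,
    "TFLite_Detection_PostProcess": 27.689,
    "ADD": 1.189,
    "LOGISTIC": 0.732,
    "PACK": 0.217,
    "RESHAPE": 0.190,
    "CONCATENATION": 0.070,
}

total = sum(times.values())  # total per-inference time on one thread
share = {op: 100.0 * ms / total for op, ms in times.items()}

print(f"total: {total:.3f} ms")                   # ~213 ms, consistent with the ~226 ms average
print(f"CONV_2D share: {share['CONV_2D']:.3f}%")  # matches the table's 66.975%
```

Convolution dominates the runtime, so any regression in how CONV_2D is dispatched will dominate end-to-end latency.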
Here is my configuration for the profiling app. As you can see, it runs on the CPU with a single thread and with XNNPACK disabled:
```json
{
  "benchmark_name": "benchmark",
  "num_threads": "1",
  "num_runs": "20",
  "warmup_runs": "1",
  "graph": "my-model.tflite",
  "input_layer": "input",
  "input_layer_shape": "1,640,640,3",
  "run_delay": "-1",
  "enable_op_profiling": "true",
  "use_xnnpack": "false"
}
```
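For reference, the equivalent invocation of the command-line benchmark tool (a desktop build; the model path is a placeholder) would be something like:

```shell
bazel run -c opt //tensorflow/lite/tools/benchmark:benchmark_model -- \
  --graph=my-model.tflite \
  --num_threads=1 --num_runs=20 --warmup_runs=1 \
  --use_xnnpack=false --enable_op_profiling=true
```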
With some effort, I have profiled the model on iOS within MediaPipe. I found the following average times:
| node | average (ms) |
|---|---|
| CONV_2D | 365.1569231 |
| DEPTHWISE_CONV_2D | 8.390066667 |
| ADD | 0.7461666667 |
| PACK | 0.323 |
| RESHAPE | 0.095 |
Compared with the raw TFLite profiling from my previous comment, you can see that CONV_2D has a significantly higher average in MediaPipe (average of 143 ms using the TFLite interpreter vs. average of 365 ms using MediaPipe). The average doesn't tell the whole story, though. Looking at the individual timings, a single CONV_2D op took over 10 seconds:

| ms | node |
|---|---|
| 10160.012 | CONV_2D |
This is clearly the culprit. I'm not familiar enough with tflite to know how to investigate further. Any suggestions are welcome.
@hadon @NikolayChirkov I'm at my wit's end for things to investigate at this point. I'm happy to provide any other information that might be useful. Or, if you are too busy to help diagnose the problem, suggestions for paths to investigate would be very helpful! Thank you!
@hadon @NikolayChirkov Hi, just wanted to ping this thread again to see if either of you had ideas for further investigation. Thanks!
```shell
bazel build --copt=-fembed-bitcode --apple_bitcode=embedded --config=ios_arm64
```
Is the above command used for the iOS build? Also, can you try once more on the new branch, Releases v0.8.9?
Thanks @PrinceP for the suggestion. I had not been building with those flags. However, the performance issue persists when I use the flags you suggested. Here is what I used to build:
```shell
bazel build --copt=-fembed-bitcode --apple_bitcode=embedded --config=ios_arm64 //mediapipe/my_company:MyProject
```
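One variable I also want to rule out: Bazel's default compilation mode is fastbuild, which leaves kernels unoptimized unless the config already sets `-c opt` (I'm not sure whether MediaPipe's `ios_arm64` config implies it). An explicitly optimized build of the same target would look like:

```shell
bazel build -c opt --config=ios_arm64 \
  --copt=-fembed-bitcode --apple_bitcode=embedded \
  //mediapipe/my_company:MyProject
```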
And here is my BUILD file, in case that is a useful reference:
```python
load("@build_bazel_rules_apple//apple:ios.bzl", "ios_framework")

ios_framework(
    name = "RDTVision",
    hdrs = [
        "RDTInterpreter.h",
    ],
    bundle_id = "org.my_company.rdtvision",
    families = [
        "iphone",
        "ipad",
    ],
    infoplists = ["Info.plist"],
    minimum_os_version = "10.0",
    deps = [
        ":RDTVisionLibrary",
        "@ios_opencv//:OpencvFramework",
    ],
)

objc_library(
    name = "RDTVisionLibrary",
    srcs = [
        "RDTInterpreter.mm",
    ],
    hdrs = [
        "RDTInterpreter.h",
    ],
    data = [
        "//mediapipe/graphs/my_graph:my_graph",
        "//mediapipe/my_company/assets/models/my_model:my_model.tflite",
    ],
    sdk_frameworks = [
        "AVFoundation",
        "CoreGraphics",
        "CoreMedia",
        "UIKit",
    ],
    deps = [
        "//mediapipe/calculators/image:image_cropping_calculator",
        "//mediapipe/calculators/tensor:image_to_tensor_calculator",
        "//mediapipe/calculators/tensor:inference_calculator",
        "//mediapipe/calculators/tensor:tensors_to_detections_calculator",
        "//mediapipe/calculators/tflite:ssd_anchors_calculator",
        "//mediapipe/calculators/util:annotation_overlay_calculator",
        "//mediapipe/calculators/util:detection_label_id_to_text_calculator",
        "//mediapipe/calculators/util:detection_projection_calculator",
        "//mediapipe/calculators/util:detections_to_rects_calculator",
        "//mediapipe/calculators/util:detections_to_render_data_calculator",
        "//mediapipe/calculators/util:non_max_suppression_calculator",
        "//mediapipe/calculators/util:to_image_calculator",
        "//mediapipe/framework/formats:landmark_cc_proto",
        "//mediapipe/graphs/hand_tracking:mobile_calculators",
        "//mediapipe/graphs/mesa_graph:mesa_calculators",
        "//mediapipe/objc:mediapipe_framework_ios",
        "//mediapipe/objc:mediapipe_input_sources_ios",
        "//mediapipe/objc:mediapipe_layer_renderer",
        "@ios_opencv//:OpencvFramework",
    ],
)
```
I'll now try the Releases v0.8.9 branch you mentioned. Thanks!
@PrinceP I just got everything building with v0.8.9, and unfortunately I'm seeing the same perf issue.
Is it possible for you to share the model file? The same behaviour is not present for any other solutions on iOS.
@PrinceP We cannot share the actual model externally, but we have prepared a dummy version of our model that exhibits the same performance issue. You can find it here: https://drive.google.com/file/d/1ANACfwsjvD19IZkRimNkVsJGDnFj28hW/view
Thanks for taking a look!
Hi @PrinceP @hadon @NikolayChirkov, happy new year!
We still haven't been able to figure out this performance issue and are blocked on this issue. Any help in diagnosing the performance problem would be very appreciated, thanks! Let me know if you have problems with the model I posted above.
Hi @PrinceP @hadon @NikolayChirkov. Sorry to ping again. Any ideas on your end would be appreciated. I'm happy to try any ideas you have for diagnosing the issue. Thanks!
Hi @alexdmiller, could you please try out this Estimator file developed in GSoC? I am curious whether it will show the same behaviour.
Have you tried any other iOS version or device?
Hello @alexdmiller, we are upgrading the MediaPipe Legacy Solutions to new MediaPipe solutions. However, the libraries, documentation, and source code for all the MediaPipe Legacy Solutions will continue to be available in our GitHub repository and through library distribution services such as Maven and NPM.
You can continue to use those legacy solutions in your applications if you choose. However, we would request you to check out the new MediaPipe solutions, which can help you more easily build and customize ML solutions for your applications. These new solutions will provide a superset of the capabilities available in the legacy solutions.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.
This issue has been marked stale because it has had no recent activity in the past 7 days. It will be closed if no further activity occurs. Thank you.
This issue was closed due to lack of activity after being marked stale for the past 7 days.
Please make sure that this is a bug, and refer to the troubleshooting and FAQ documentation before raising any issues.
System information (please provide as much relevant information as possible): master
Describe the current behavior:
I have a `.tflite` model that I'm trying to run within a MediaPipe graph. When I run the graph on Android, inference runs quickly. When I run the model directly using TensorFlow Lite 2.7 on iOS, inference runs quickly. However, when I run the graph on iOS, inference runs very slowly (~11 seconds).

Describe the expected behavior:
Inference should be quick on iOS.
Standalone code to reproduce the issue:
Here is the contents of my graph definition:
If inspecting the tflite model itself would be useful, I can check with my team that it would be okay to upload that. In the meantime, I used the tflite model visualizer script to inspect the ops:
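For reference, the visualizer I used is the script shipped in the TensorFlow repo; a typical invocation (paths here are placeholders) looks like:

```shell
python tensorflow/lite/tools/visualize.py my-model.tflite visualization.html
```

It writes an HTML page listing the model's subgraphs, tensors, and ops.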
Does anything stand out here as an op that would be slow?
Other info / Complete Logs: Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.