google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

.tflite model is slower when run in mediapipe graph #3160

Closed: orsveri closed this issue 1 year ago

orsveri commented 2 years ago

Why does the MediaPipe inference calculator have a much longer inference time than simply running the .tflite model?

Describe the expected behavior:

I am profiling the solution. I use the TFLite benchmark tool to measure the .tflite models' performance, and the MediaPipe profiler to measure the performance of specific nodes in the MediaPipe graph. I built the solution with XNNPACK support enabled.

I expect that the average inference time of a .tflite model and of the corresponding inference calculator node would be approximately the same.
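To be concrete about the baseline, by "simply running the .tflite model" I mean a plain timing loop around Invoke(), roughly like the C++ sketch below. This is only an illustration of the measurement, not the benchmark tool's actual code; the model path and thread count mirror my benchmark settings, and it assumes a TFLite build with tflite_with_xnnpack=true so that InterpreterBuilder applies the XNNPACK delegate automatically.

    // Minimal standalone timing loop for a .tflite model (illustrative sketch).
    #include <chrono>
    #include <cstdio>
    #include <memory>

    #include "tensorflow/lite/interpreter.h"
    #include "tensorflow/lite/kernels/register.h"
    #include "tensorflow/lite/model.h"

    int main() {
      // Load the model; with tflite_with_xnnpack=true the XNNPACK delegate
      // is applied by InterpreterBuilder without any extra code.
      auto model = tflite::FlatBufferModel::BuildFromFile(
          "mediapipe/modules/palm_detection/palm_detection_full.tflite");
      tflite::ops::builtin::BuiltinOpResolver resolver;
      tflite::InterpreterBuilder builder(*model, resolver);
      builder.SetNumThreads(3);  // same thread count as the benchmark runs
      std::unique_ptr<tflite::Interpreter> interpreter;
      builder(&interpreter);
      interpreter->AllocateTensors();

      constexpr int kRuns = 2000;  // matching --num_runs in the commands below
      const auto start = std::chrono::steady_clock::now();
      for (int i = 0; i < kRuns; ++i) {
        interpreter->Invoke();  // input tensors stay zero-initialized here
      }
      const auto end = std::chrono::steady_clock::now();
      const double avg_us =
          std::chrono::duration<double, std::micro>(end - start).count() / kRuns;
      std::printf("Average inference time: %.1f us\n", avg_us);
      return 0;
    }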

Standalone code you may have used to try to get what you need:

  1. Build the solution with the following command: bazel build -c opt --define MEDIAPIPE_DISABLE_GPU=1 --define tflite_with_xnnpack=true mediapipe/examples/desktop/hand_tracking:hand_tracking_cpu
  2. At the beginning of the file /mediapipe/graphs/hand_tracking/hand_tracking_desktop_live.pbtxt, insert the following code:

    profiler_config {
      trace_enabled: true
      enable_profiler: true
      trace_log_interval_count: 200
      trace_log_path: "<target_log_directory>"
    }
  3. Run the solution (the input_video_path and output_video_path parameters are optional): GLOG_logtostderr=1 bazel-bin/mediapipe/examples/desktop/hand_tracking/hand_tracking_cpu --calculator_graph_config_file=mediapipe/graphs/hand_tracking/hand_tracking_desktop_live.pbtxt --input_video_path=<input_video_file_path> --output_video_path=<output_video_file_path>
  4. Get the .binarypb file and upload it to the visualizer, sort by average time, and find the slowest nodes: they will be the inference calculators for the palm_detection and hand_landmark models.
  5. Download the TFLite benchmark tool from here. Measure the performance of the models with the following flags (a full invocation sketch follows this list):
    • --graph=/mediapipe/modules/palm_detection/palm_detection_full.tflite --num_threads=3 --use_xnnpack=true --num_runs=2000
    • --graph=/mediapipe/modules/hand_landmark/hand_landmark_full.tflite --num_threads=3 --use_xnnpack=true --num_runs=2000

(Look for a line in the output that looks like: Inference timings in us: Init: 80256, First inference: 62178, Warmup (avg): 60112.9, Inference (avg): 60287.1)

  6. Compare the models' inference times.
  7. ???
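For reference, the full benchmark invocation has the following shape. This assumes the tool is built from the TensorFlow source tree (the bazel target lives in the TensorFlow repo, not this one); a prebuilt binary downloaded from the benchmark page accepts the same flags:

    bazel run -c opt //tensorflow/lite/tools/benchmark:benchmark_model -- \
      --graph=/mediapipe/modules/palm_detection/palm_detection_full.tflite \
      --num_threads=3 --use_xnnpack=true --num_runs=2000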

Other info / Complete Logs:

My average inference time results:

Model            tflite benchmark    mediapipe profiler
palm_detection   75 ms               133 ms
hand_landmark    60 ms               110 ms

My mediapipe profiler log file: link to download

My tflite benchmark tool results:

- palm_detection_full.tflite model

Running benchmark for at least 2000 iterations and at least 1 seconds but terminate if exceeding 150 seconds. count=1986 first=73574 curr=67474 min=65677 max=237263 avg=75151.7 std=16578

Inference timings in us: Init: 112979, First inference: 92201, Warmup (avg): 79946, Inference (avg): 75151.7
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=15.543 overall=42.1562

- hand_landmark_full.tflite model

STARTING!
Log parameter values verbosely: [0]
Min num runs: [2000]
Num threads: [3]
Graph: [/home/bg/tmp/mediapipe/mediapipe/modules/hand_landmark/hand_landmark_full.tflite]

threads used for CPU inference: [3]

Use xnnpack: [1]
Loaded model /home/bg/tmp/mediapipe/mediapipe/modules/hand_landmark/hand_landmark_full.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
XNNPACK delegate created.
Explicitly applied XNNPACK delegate, and the model graph will be completely executed by the delegate.
The input model file size (MB): 5.47869
Initialized session in 80.256ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=9 first=62178 curr=58597 min=58450 max=62720 avg=60112.9 std=1474

Running benchmark for at least 2000 iterations and at least 1 seconds but terminate if exceeding 150 seconds. count=2000 first=192849 curr=58375 min=50373 max=253885 avg=60287.1 std=15568

Inference timings in us: Init: 80256, First inference: 62178, Warmup (avg): 60112.9, Inference (avg): 60287.1
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=30.5781 overall=37.4414

sureshdagooglecom commented 2 years ago

@orsveri, are you using the same data for both models?

orsveri commented 2 years ago

@sureshdagooglecom, yes. I am using a short video file. I uploaded it here, if you want to try it.

SunXuan90 commented 2 years ago

I have the same issue on Android. I ported MoveNet Thunder, and the inference time is doubled compared to the official demo app.

kuaashish commented 1 year ago

Hello @orsveri, We are upgrading the MediaPipe Legacy Solutions to the new MediaPipe Solutions. However, the libraries, documentation, and source code for all the MediaPipe Legacy Solutions will continue to be available in our GitHub repository and through library distribution services, such as Maven and NPM.

You can continue to use those legacy solutions in your applications if you choose. That said, we would request that you check out the new MediaPipe Solutions, which can help you more easily build and customize ML solutions for your applications. These new solutions will provide a superset of the capabilities available in the legacy solutions.

github-actions[bot] commented 1 year ago

This issue has been marked stale because it has had no recent activity for the past 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 1 year ago

This issue was closed due to lack of activity after being marked stale for the past 7 days.

google-ml-butler[bot] commented 1 year ago

Are you satisfied with the resolution of your issue?