ERROR: Node number 712 (TfLiteGpuDelegate) failed to prepare while running the face_mesh example with GPU enabled

AndresArtavia02 commented 1 year ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

None

OS Platform and Distribution

Jetpack 4.6.4/Ubuntu 18.04

MediaPipe Tasks SDK version

commit 9474394768ea77ef58280f7114f0fd0e934a62ef(master)

Task name (e.g. Image classification, Gesture recognition etc.)

face_mesh

Programming Language and version (e.g. C++, Python, Java)

c++

Describe the actual behavior

The example fails with the described error

Describe the expected behaviour

The example should run using the GPU.

Standalone code/steps you may have used to try to get what you need

Hi, I'm working on a Jetson TX2 NX, directly with a display and keyboard attached.
I'm using Bazel 6.1.1.
I built OpenCV using the included script, but changed the CMake flags to add CUDA support. It is detected fine (3.4.10-dev with CUDA).
To compile face_mesh:
bazel build --define no_aws_support=true --copt -DEGL_NO_X11 --copt -DMESA_EGL_NO_X11_HEADERS   mediapipe/examples/desktop/face_mesh:face_mesh_gpu
To compile face_detection to get the tflite model:
bazel build --define no_aws_support=true --copt -DEGL_NO_X11 --copt -DMESA_EGL_NO_X11_HEADERS   mediapipe/examples/desktop/face_detection:face_detection_gpu
Then to run it:
./mediapipe/examples/desktop/face_mesh/face_mesh_gpu --calculator_graph_config_file /tmp/face_mesh_desktop_live_gpu.pbtxt --input_video_path /tmp/sample.mp4 --output_video_path /tmp/test.mp4

I already tried setting use_advanced_gpu_api: false; it throws the same error.
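Since the delegate setting is spread across several graph files, it can help to list every place it appears before rebuilding. The following is a minimal sketch, not part of MediaPipe: the function name `find_delegate_settings` is hypothetical, and it assumes the setting is written on a single line as `use_advanced_gpu_api: true` or `use_advanced_gpu_api: false`, as in the graph files changed in this issue.

```python
import re
from pathlib import Path

# Matches the delegate option as it is written in the .pbtxt graphs.
SETTING_RE = re.compile(r"use_advanced_gpu_api\s*:\s*(true|false)")


def find_delegate_settings(root):
    """Return (path, line_number, value) for each use_advanced_gpu_api entry.

    Hypothetical helper: walks every *.pbtxt file under `root` and ignores
    text after a '#' so that commented-out settings are not reported.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.pbtxt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            code = line.split("#", 1)[0]  # drop pbtxt comments
            match = SETTING_RE.search(code)
            if match:
                hits.append((str(path), lineno, match.group(1)))
    return hits
```

Calling `find_delegate_settings("mediapipe/modules")` from the repository root and printing the result shows which graphs still request the advanced GPU API. A graph that does not mention the option at all will not appear in the output.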

Other info / Complete Logs

Here is what I changed
'''
diff --git a/WORKSPACE b/WORKSPACE
index 3a539569..f07e8aa3 100644
--- a/WORKSPACE
+++ b/WORKSPACE
@@ -340,7 +340,7 @@ http_archive(
 new_local_repository(
     name = "linux_opencv",
     build_file = "@//third_party:opencv_linux.BUILD",
-    path = "/usr",
+    path = "/usr/local",
 )

 new_local_repository(
diff --git a/mediapipe/modules/face_detection/face_detection_full_range_gpu.pbtxt b/mediapipe/modules/face_detection/face_detection_full_range_gpu.pbtxt
index 52b6e361..28fce1cd 100644
--- a/mediapipe/modules/face_detection/face_detection_full_range_gpu.pbtxt
+++ b/mediapipe/modules/face_detection/face_detection_full_range_gpu.pbtxt
@@ -19,7 +19,7 @@ node {
   node_options: {
     [type.googleapis.com/mediapipe.FaceDetectionOptions] {
       gpu_origin: TOP_LEFT
-      delegate: { gpu { use_advanced_gpu_api: true } }
+      delegate: { gpu { use_advanced_gpu_api: false } }
     }
   }
   option_value: "OPTIONS:options"
diff --git a/mediapipe/modules/face_detection/face_detection_short_range_by_roi_gpu.pbtxt b/mediapipe/modules/face_detection/face_detection_short_range_by_roi_gpu.pbtxt
index 6f9e9e98..dc727efc 100644
--- a/mediapipe/modules/face_detection/face_detection_short_range_by_roi_gpu.pbtxt
+++ b/mediapipe/modules/face_detection/face_detection_short_range_by_roi_gpu.pbtxt
@@ -24,7 +24,7 @@ node {
   node_options: {
     [type.googleapis.com/mediapipe.FaceDetectionOptions] {
       gpu_origin: TOP_LEFT
-      delegate: { gpu { use_advanced_gpu_api: true } }
+      delegate: { gpu { use_advanced_gpu_api: false } }
     }
   }
   option_value: "OPTIONS:options"
diff --git a/mediapipe/modules/face_detection/face_detection_short_range_gpu.pbtxt b/mediapipe/modules/face_detection/face_detection_short_range_gpu.pbtxt
index ededa135..17932098 100644
--- a/mediapipe/modules/face_detection/face_detection_short_range_gpu.pbtxt
+++ b/mediapipe/modules/face_detection/face_detection_short_range_gpu.pbtxt
@@ -19,7 +19,7 @@ node {
   node_options: {
     [type.googleapis.com/mediapipe.FaceDetectionOptions] {
       gpu_origin: TOP_LEFT
-      delegate: { gpu { use_advanced_gpu_api: true } }
+      delegate: { gpu { use_advanced_gpu_api: false } }
     }
   }
   option_value: "OPTIONS:options"
diff --git a/mediapipe/modules/pose_detection/pose_detection_gpu.pbtxt b/mediapipe/modules/pose_detection/pose_detection_gpu.pbtxt
index b95a1176..2bc3ce8a 100644
--- a/mediapipe/modules/pose_detection/pose_detection_gpu.pbtxt
+++ b/mediapipe/modules/pose_detection/pose_detection_gpu.pbtxt
@@ -70,7 +70,7 @@ node {
     [mediapipe.InferenceCalculatorOptions.ext] {
       model_path: "mediapipe/modules/pose_detection/pose_detection.tflite"
       #
-      delegate: { gpu { use_advanced_gpu_api: true } }
+      delegate: { gpu { use_advanced_gpu_api: false } }
     }
   }
 }
diff --git a/setup_opencv.sh b/setup_opencv.sh
index b055b3a8..e037dea1 100644
--- a/setup_opencv.sh
+++ b/setup_opencv.sh
@@ -73,8 +73,10 @@ if [ -z "$1" ]
           -DBUILD_opencv_structured_light=OFF -DBUILD_opencv_surface_matching=OFF \
           -DBUILD_opencv_world=OFF -DBUILD_opencv_xobjdetect=OFF -DBUILD_opencv_xphoto=OFF \
           -DCV_ENABLE_INTRINSICS=ON -DWITH_EIGEN=ON -DWITH_PTHREADS=ON -DWITH_PTHREADS_PF=ON \
-          -DWITH_JPEG=ON -DWITH_PNG=ON -DWITH_TIFF=ON
-    make -j 16
+          -DWITH_JPEG=ON -DWITH_PNG=ON -DWITH_TIFF=ON \
+          -DCUDA_ARCH_BIN=6.2 -DCUDA_ARCH_PTX= -DCUDA_FAST_MATH=ON -DCUDNN_VERSION='8.0' \
+          -DEIGEN_INCLUDE_PATH=/usr/include/eigen3 -DENABLE_NEON=ON -DOPENCV_DNN_CUDA=ON -DWITH_CUDA=ON -DWITH_CUDNN=ON 
+    make -j6
     sudo make install
     rm -rf /tmp/build_opencv
     echo "OpenCV has been built. You can find the header files and libraries in /usr/local/include/opencv2/ and /usr/local/lib"
diff --git a/third_party/opencv_linux.BUILD b/third_party/opencv_linux.BUILD
index 84458554..6ca91a0c 100644
--- a/third_party/opencv_linux.BUILD
+++ b/third_party/opencv_linux.BUILD
@@ -28,6 +28,7 @@ cc_library(
         #"include/opencv4/",
     ],
     linkopts = [
+        "-L/usr/local/lib",
         "-l:libopencv_core.so",
         "-l:libopencv_calib3d.so",
         "-l:libopencv_features2d.so",

'''
The full error:
'''
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1698959878.296767    4267 demo_run_graph_main_gpu.cc:54] Get calculator graph config contents: # MediaPipe graph that performs face mesh with TensorFlow Lite on GPU.

# Input image. (GpuBuffer)
input_stream: "input_video"

# Output image with rendered results. (GpuBuffer)
output_stream: "output_video"
# Collection of detected/processed faces, each represented as a list of
# landmarks. (std::vector<NormalizedLandmarkList>)
output_stream: "multi_face_landmarks"

# Throttles the images flowing downstream for flow control. It passes through
# the very first incoming image unaltered, and waits for downstream nodes
# (calculators and subgraphs) in the graph to finish their tasks before it
# passes through another image. All images that come in while waiting are
# dropped, limiting the number of in-flight images in most part of the graph to
# 1. This prevents the downstream nodes from queuing up incoming images and data
# excessively, which leads to increased latency and memory usage, unwanted in
# real-time mobile applications. It also eliminates unnecessarily computation,
# e.g., the output produced by a node may get dropped downstream if the
# subsequent nodes are still busy processing previous inputs.
node {
  calculator: "FlowLimiterCalculator"
  input_stream: "input_video"
  input_stream: "FINISHED:output_video"
  input_stream_info: {
    tag_index: "FINISHED"
    back_edge: true
  }
  output_stream: "throttled_input_video"
}

# Defines side packets for further use in the graph.
node {
  calculator: "ConstantSidePacketCalculator"
  output_side_packet: "PACKET:0:num_faces"
  output_side_packet: "PACKET:1:with_attention"
  node_options: {
    [type.googleapis.com/mediapipe.ConstantSidePacketCalculatorOptions]: {
      packet { int_value: 1 }
      packet { bool_value: true }
    }
  }
}

# Subgraph that detects faces and corresponding landmarks.
node {
  calculator: "FaceLandmarkFrontGpu"
  input_stream: "IMAGE:throttled_input_video"
  input_side_packet: "NUM_FACES:num_faces"
  input_side_packet: "WITH_ATTENTION:with_attention"
  output_stream: "LANDMARKS:multi_face_landmarks"
  output_stream: "ROIS_FROM_LANDMARKS:face_rects_from_landmarks"
  output_stream: "DETECTIONS:face_detections"
  output_stream: "ROIS_FROM_DETECTIONS:face_rects_from_detections"
}

# Subgraph that renders face-landmark annotation onto the input image.
node {
  calculator: "FaceRendererGpu"
  input_stream: "IMAGE:throttled_input_video"
  input_stream: "LANDMARKS:multi_face_landmarks"
  input_stream: "NORM_RECTS:face_rects_from_landmarks"
  input_stream: "DETECTIONS:face_detections"
  output_stream: "IMAGE:output_video"
}

I0000 00:00:1698959878.307784    4267 demo_run_graph_main_gpu.cc:60] Initialize the calculator graph.
I0000 00:00:1698959878.402318    4267 demo_run_graph_main_gpu.cc:64] Initialize the GPU.
I0000 00:00:1698959878.453762    4267 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1698959878.510927    4276 gl_context.cc:344] GL version: 3.2 (OpenGL ES 3.2 NVIDIA 32.7.4), renderer: NVIDIA Tegra X2 (nvgpu)/integrated
I0000 00:00:1698959878.512084    4267 demo_run_graph_main_gpu.cc:70] Initialize the camera or load the video.
[AVBSFContext @ 0x55ae3909b0] Invalid NAL unit 0, skipping.
I0000 00:00:1698959878.597643    4267 demo_run_graph_main_gpu.cc:91] Start running the calculator graph.
I0000 00:00:1698959878.608283    4267 demo_run_graph_main_gpu.cc:96] Start grabbing and processing frames.
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
VERBOSE: Replacing 164 out of 164 node(s) with delegate (TfLiteGpuDelegate) node, yielding 1 partitions for the whole graph.
VERBOSE: Replacing 712 out of 712 node(s) with delegate (TfLiteGpuDelegate) node, yielding 1 partitions for the whole graph.
ERROR: TfLiteGpuDelegate Prepare: Program is not properly linked: Compute info
------------
0(83) : error C1068: array index out of bounds
0(91) : error C1068: array index out of bounds
0(92) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(96) : error C1068: array index out of bounds
0(97) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(96) : error C1068: array index out of bounds
0(97) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(87) : error C1068: array index out of bounds
0(88) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(96) : error C1068: array index out of bounds
0(97) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(96) : error C1068: array index out of bounds
0(97) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(91) : error C1068: array index out of bounds
0(92) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(91) : error C1068: array index out of bounds
0(92) : error C1068: array index out of bounds
0(100) : error C1068: array index out of bounds
0(100) : error C1068: array index out of bounds
0(100) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(87) : error C1068: array index out of bounds
0(88) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(91) : error C1068: array index out of bounds
0(92) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(87) : error C1068: array index out of bounds
0(88) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(87) : error C1068: array index out of bounds
0(88) : error C1068: array index out of bounds
0(100) : error C1068: array index out of bounds
0(100) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(87) : error C1068: array index out of bounds
0(88) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(91) : error C1068: array index out of bounds
0(92) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(91) : error C1068: array index out of bounds
0(92) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(87) : error C1068: array index out of bounds
0(88) : error C1068: array index out of bounds
0(100) : error C1068: array index out of bounds
0(100) : error C1068: array index out of bounds
0(100) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(87) : error C1068: array index out of bounds
0(88) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(87) : error C1068: array index out of bounds
0(88) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(96) : error C1068: array index out of bounds
0(97) : error C1068: array index out of bounds
0(100) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
0(87) : error C1068: array index out of bounds
0(88) : error C1068: array index out of bounds

ERROR: Node number 712 (TfLiteGpuDelegate) failed to prepare.
ERROR: Restored original execution plan after delegate application failure.
I0000 00:00:1698959879.810623    4267 demo_run_graph_main_gpu.cc:188] Shutting down.
E0000 00:00:1698959879.902863    4267 demo_run_graph_main_gpu.cc:199] Failed to run the graph: CalculatorGraph::Run() failed: 
Calculator::Open() for node "facelandmarkfrontgpu__facelandmarkgpu__inferencecalculator__facelandmarkfrontgpu__facelandmarkgpu__InferenceCalculator" failed: ; RET_CHECK failure (mediapipe/calculators/tensor/inference_calculator_gl.cc:195) (interpreter_->ModifyGraphWithDelegate(delegate_.get()))==(kTfLiteOk)
'''
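The shader compiler output above repeats the same few diagnostics many times, so a tally by shader line makes the log easier to read. This is a rough sketch under stated assumptions: the helper name `summarize_shader_errors` is hypothetical, the regular expression only targets lines shaped like `0(83) : error C1068: array index out of bounds` as they appear in this log, and the leading number is treated as an opaque source index.

```python
import re
from collections import Counter

# Matches diagnostics shaped like the ones in this log, e.g.
# "0(83) : error C1068: array index out of bounds".
DIAG_RE = re.compile(r"^(\d+)\((\d+)\) : error (C\d+): (.+)$")


def summarize_shader_errors(log_text):
    """Count diagnostics per (shader line, error code, message).

    Hypothetical helper: lines that do not match the pattern, such as the
    surrounding TFLite INFO/ERROR messages, are skipped.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = DIAG_RE.match(line.strip())
        if match:
            _source_index, shader_line, code, message = match.groups()
            counts[(int(shader_line), code, message)] += 1
    return counts


# Small excerpt copied from the log in this issue.
sample = """\
ERROR: TfLiteGpuDelegate Prepare: Program is not properly linked: Compute info
0(83) : error C1068: array index out of bounds
0(91) : error C1068: array index out of bounds
0(92) : error C1068: array index out of bounds
0(83) : error C1068: array index out of bounds
ERROR: Node number 712 (TfLiteGpuDelegate) failed to prepare.
"""

for (shader_line, code, message), n in sorted(summarize_shader_errors(sample).items()):
    print(f"shader line {shader_line}: {code} {message} (x{n})")
```

Running it on the full log reduces the output to the handful of distinct shader lines that the compiler rejects, which may be more convenient to attach when reporting the problem.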

kuaashish commented 1 year ago

Hi @AndresArtavia02,

We have not yet officially added support for the Jetson Nano. Currently, the only supported edge device is the 64-bit Raspberry Pi, as indicated here. However, there is a community plugin available in Python at this GitHub repository, which is up to date and may help in your case. Unfortunately, beyond this community plugin, we are limited in our ability to provide extensive Jetson-related support at this time.

Nevertheless, Jetson support is included in our roadmap, and our team is actively working on its implementation. Regrettably, we are unable to give a precise timeline for its availability.

Thank you

AndresArtavia02 commented 1 year ago

@kuaashish thanks for the reply. Yes, that repo works, but I can only use face mesh without iris detection, by setting refine_landmarks=False. Is there a way to get the original TensorFlow model, so I can try to migrate it manually to Jetson? I tried tflite2onnx, but it seems the model has some unsupported ops: raise NotImplementedError("Unsupported TFLite OP: {} {}!".format(opcode, name)) NotImplementedError: Unsupported TFLite OP: 32 CUSTOM!

adnan6336 commented 1 year ago

Hello people, I have been trying for two years to get this (face landmarks with refine_landmarks=True) working on the GPU on a Jetson Nano, but I have found no working method. It works on the CPU but not on the GPU. I have also raised issues, but they were closed without a solution. I know there is some unsupported custom ops problem that is beyond my skills, so I am really looking forward to this issue being solved. Thanks to all the contributors.

https://github.com/google/mediapipe/issues/2678

AndresArtavia02 commented 1 year ago

@kuaashish Hi, after digging more into the TFLite, TensorFlow, and MediaPipe code, I don't see a way to use CUDA if TFLite is used. Is there a way to run face mesh in MediaPipe with TensorFlow instead of TFLite?
