google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

Path Handling Issue with Gesture Recognizer Model in Mediapipe #5400

Closed justsonghua closed 5 months ago

justsonghua commented 6 months ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

Yes

OS Platform and Distribution

Windows 11 23H2 (Build 22631.3527)

MediaPipe Tasks SDK version

No response

Task name (e.g. Image classification, Gesture recognition etc.)

Gesture recognition

Programming Language and version (e.g. C++, Python, Java)

Python

Describe the actual behavior

MediaPipe tries to load gesture_recognizer.task from the conda virtual environment's folder instead of from the specified path.

Describe the expected behaviour

MediaPipe should correctly load and use the gesture recognizer model from the specified path, regardless of special characters in the directory name.

Standalone code/steps you may have used to try to get what you need

import cv2
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import time

import os
from pathlib import Path

# Set model directory and change working directory
model_dir = Path(r"D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos")
os.chdir(model_dir)

# Set model path
print("Current working directory:", os.getcwd())
model_path = Path("gesture_recognizer.task")

# Get absolute path and check if the file exists
absolute_model_path = os.path.abspath(model_path)

if not os.path.exists(absolute_model_path):
    print("Model file does not exist:", absolute_model_path)
else:
    print("Model file found:", absolute_model_path)

# Initialize hand detection
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
hands = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=2,
    min_detection_confidence=0.75,
    min_tracking_confidence=0.5
)

# Define gesture recognition callback
def gesture_result_callback(result, image, timestamp):
    if result is not None and result.gestures:
        print('Gesture recognized:', result.gestures)
        cv2.putText(image, f'Gesture: {result.gestures}', (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)

# Print the absolute_model_path
print("Using model path:", absolute_model_path)

# Initialize gesture recognizer
base_options = python.BaseOptions(model_asset_path=absolute_model_path)
options = vision.GestureRecognizerOptions(base_options=base_options, running_mode=vision.RunningMode.LIVE_STREAM, result_callback=gesture_result_callback)
recognizer = vision.GestureRecognizer.create_from_options(options)

# Initialize webcam
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("Ignoring empty frame")
        break

    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame_rgb = cv2.flip(frame_rgb, 1)
    results = hands.process(frame_rgb)

    recognizer.recognize_async(frame_rgb, int(time.time() * 1000))

    frame = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2BGR)

    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)

    cv2.imshow("MediaPipe Hands and Gesture Recognition", frame)

    if cv2.waitKey(5) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()

Other info / Complete Logs

Current working directory: D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos
Model file found: D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task
Using model path: D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task
Traceback (most recent call last):
  File "D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\demo_002.py", line 61, in <module>
    recognizer = vision.GestureRecognizer.create_from_options(options)
  File "C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages\mediapipe\tasks\python\vision\gesture_recognizer.py", line 340, in create_from_options
    return cls(
  File "C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages\mediapipe\tasks\python\vision\core\base_vision_task_api.py", line 70, in __init__
    self._runner = _TaskRunner.create(graph_config, packet_callback)
RuntimeError: Unable to open file at C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages/D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task, errno=22
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

I suspect the issue might be due to the % character in my file path, but I don't understand why the problem persists even after I set an absolute path. The path stays unchanged right up until it's passed into python.BaseOptions(); after that, it suddenly switches to the conda virtual environment's folder.
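One untested idea for a workaround: since the existence check above shows Python itself can open the file, the model could be read into memory and handed to MediaPipe through BaseOptions' model_asset_buffer field, so the library never has to resolve the path on its own. A minimal sketch, assuming the rest of the script stays the same:

# Workaround sketch (untested): pass the model as bytes instead of a path.
with open(absolute_model_path, "rb") as f:
    model_data = f.read()

base_options = python.BaseOptions(model_asset_buffer=model_data)
options = vision.GestureRecognizerOptions(
    base_options=base_options,
    running_mode=vision.RunningMode.LIVE_STREAM,
    result_callback=gesture_result_callback,
)
recognizer = vision.GestureRecognizer.create_from_options(options)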

justsonghua commented 6 months ago

I tried creating a new path and moving the file into that folder. As you can see, I removed the % from the file path:

Current working directory: D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos
Model file found: D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task
Using model path: D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task

But I'm still getting the same error message:

Traceback (most recent call last):
  File "D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\demo_002.py", line 61, in <module>
    recognizer = vision.GestureRecognizer.create_from_options(options)
  File "C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages\mediapipe\tasks\python\vision\gesture_recognizer.py", line 340, in create_from_options
    return cls(
  File "C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages\mediapipe\tasks\python\vision\core\base_vision_task_api.py", line 70, in __init__
    self._runner = _TaskRunner.create(graph_config, packet_callback)
RuntimeError: Unable to open file at C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages/D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task, errno=22
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

So the issue doesn't seem to be related to the % in the original file path after all, but I still don't know why it throws this error at runtime.
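One difference I can spot against the script that eventually worked (posted in a later comment) is that the working version passes the model path as a plain string with forward slashes. Purely a guess, but a POSIX-style path string might be worth trying:

# Guess (untested): give BaseOptions a forward-slash path string, as in the later working script.
from pathlib import Path

model_path = Path(r"D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task")
base_options = python.BaseOptions(model_asset_path=model_path.resolve().as_posix())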

kuaashish commented 6 months ago

Hi @justsonghua,

Based on the provided code, it appears you are using our legacy Hands solution. That solution has been upgraded and is now part of the new Gesture Recognition Task API, and support for the legacy Hands solution has ended. Please try the new Task API; the updated Python example is available here. For a general overview, visit our overview page.
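For reference, a minimal sketch of the kind of usage the Task API covers, shown here in IMAGE mode (the model and image file names are placeholders):

# Minimal Task API sketch (IMAGE mode); the file names below are placeholders.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

options = vision.GestureRecognizerOptions(
    base_options=python.BaseOptions(model_asset_path="gesture_recognizer.task"),
    running_mode=vision.RunningMode.IMAGE,
)
recognizer = vision.GestureRecognizer.create_from_options(options)

image = mp.Image.create_from_file("hand.jpg")
result = recognizer.recognize(image)
if result.gestures:
    print(result.gestures[0][0].category_name)
recognizer.close()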

Apart from this, we cannot do much about this issue. If you encounter any problems with the new Task API, please report them here for further assistance.

Thank you!!

justsonghua commented 5 months ago
# Created by Songhua at 14.May.2024

import cv2
import mediapipe as mp
from mediapipe.framework.formats import landmark_pb2
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import time
import numpy as np

# Initialize Mediapipe modules
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles

# Initialize gesture recognizer
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerResult = mp.tasks.vision.GestureRecognizerResult
VisionRunningMode = mp.tasks.vision.RunningMode

# Initialize variables
current_frame = None
gesture_text = "None"
current_result = None

# Function to update gesture text

# 1 Hand Only
def update_gesture_text(result: GestureRecognizerResult, output_image: mp.Image, timestamp_ms: int):
    global gesture_text, current_result
    if result is not None and result.gestures:
        gesture_text = result.gestures[0][0].category_name
    else:
        gesture_text = "None"
    current_result = result

# Function to display results on the frame
def display_result(frame):
    global gesture_text
    cv2.putText(frame, gesture_text, (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 1, cv2.LINE_AA)

# Function to draw bounding box on the frame
def draw_bounding_box(frame, result: GestureRecognizerResult):
    if result is not None and result.hand_landmarks:
        for hand_landmarks in result.hand_landmarks:
            x_coords = [landmark.x * frame.shape[1] for landmark in hand_landmarks]
            y_coords = [landmark.y * frame.shape[0] for landmark in hand_landmarks]
            x_min, x_max = int(min(x_coords)), int(max(x_coords))
            y_min, y_max = int(min(y_coords)), int(max(y_coords))
            cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)

# Configuration for gesture recognizer
model_path = 'D:/DokiDoki/M.Sc._EAAS/HiWi.Job/Projects/wode.demos/gesture_recognizer.task'
base_options = python.BaseOptions(model_asset_path=model_path)
options = vision.GestureRecognizerOptions(
    base_options=base_options,
    running_mode=VisionRunningMode.LIVE_STREAM,
    result_callback=update_gesture_text
)
recognizer = vision.GestureRecognizer.create_from_options(options)

# Initialize webcam

# for index in range(3):
#     cap = cv2.VideoCapture(index)
#     if cap.isOpened():
#         print(f"Camera index {index} is available")
#         cap.release()

camera_index = 2  # Initialize webcam index
cap = cv2.VideoCapture(camera_index)

timestamp = 0

while cap.isOpened():
    # Capture frame-by-frame
    ret, frame = cap.read()

    if not ret:
        print("Ignoring empty frame")
        break

    timestamp += 1

    # Flip the frame horizontally for a mirrored view
    frame = cv2.flip(frame, 1)

    # Convert the frame to mp.Image format
    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame)

    # Send live image data to perform gesture recognition
    recognizer.recognize_async(mp_image, timestamp)

    # Display the frame with recognition result
    display_result(frame)

    # Draw bounding box on the frame
    draw_bounding_box(frame, current_result)

    cv2.imshow("MediaPipe Model", frame)

    # Exit on ESC key
    if cv2.waitKey(5) & 0xFF == 27:
        break

# Release the webcam resource
cap.release()
cv2.destroyAllWindows()

So, with the new API, it works now.

But there's a new problem: it only recognizes one hand. Can this model (gesture_recognizer.task) recognize only a single hand?

justsonghua commented 5 months ago

https://github.com/google-ai-edge/mediapipe-samples/blob/main/examples/gesture_recognizer/raspberry_pi/recognize.py

I found this demo, and now my code can recognize both hands.
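Presumably the relevant change amounts to raising num_hands in the recognizer options and reading every hand's result in the callback. A rough sketch of that idea (my own guess, reusing the names from the script above, not code copied from the linked sample):

# Sketch (guess, not from the linked sample): allow two hands and report each hand's top gesture.
def update_gesture_text(result: GestureRecognizerResult, output_image: mp.Image, timestamp_ms: int):
    global gesture_text, current_result
    if result is not None and result.gestures:
        # result.gestures holds one list of candidate gestures per detected hand.
        gesture_text = ", ".join(hand[0].category_name for hand in result.gestures)
    else:
        gesture_text = "None"
    current_result = result

options = vision.GestureRecognizerOptions(
    base_options=python.BaseOptions(model_asset_path=model_path),
    running_mode=VisionRunningMode.LIVE_STREAM,
    num_hands=2,  # default is 1, which explains the single-hand behaviour above
    result_callback=update_gesture_text,
)
recognizer = vision.GestureRecognizer.create_from_options(options)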