Closed · justsonghua closed this issue 5 months ago
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
Yes
OS Platform and Distribution
Windows 11 23H2 22631.3527
MediaPipe Tasks SDK version
No response
Task name (e.g. Image classification, Gesture recognition etc.)
Gesture recognition
Programming Language and version (e.g. C++, Python, Java)
Python
Describe the actual behavior
MediaPipe tries to load the gesture_recognizer.task from the conda virtual environment folder instead of the specified path.
Describe the expected behaviour
MediaPipe should correctly load and use the gesture recognizer model from the specified path, regardless of special characters in the directory name.
Standalone code/steps you may have used to try to get what you need
import cv2
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import time
import os
from pathlib import Path

# Set model directory and change working directory
model_dir = Path(r"D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos")
os.chdir(model_dir)

# Set model path
print("Current working directory:", os.getcwd())
model_path = Path("gesture_recognizer.task")

# Get absolute path and check if the file exists
absolute_model_path = os.path.abspath(model_path)
if not os.path.exists(absolute_model_path):
    print("Model file does not exist:", absolute_model_path)
else:
    print("Model file found:", absolute_model_path)

# Initialize hand detection
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
hands = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=2,
    min_detection_confidence=0.75,
    min_tracking_confidence=0.5
)

# Define gesture recognition callback
def gesture_result_callback(result, image, timestamp):
    if result is not None and result.gestures:
        print('Gesture recognized:', result.gestures)
        cv2.putText(image, f'Gesture: {result.gestures}', (50, 50),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)

# Print the absolute_model_path
print("Using model path:", absolute_model_path)

# Initialize gesture recognizer
base_options = python.BaseOptions(model_asset_path=absolute_model_path)
options = vision.GestureRecognizerOptions(base_options=base_options,
                                          running_mode=vision.RunningMode.LIVE_STREAM,
                                          result_callback=gesture_result_callback)
recognizer = vision.GestureRecognizer.create_from_options(options)

# Initialize webcam
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("Ignoring empty frame")
        break
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame_rgb = cv2.flip(frame_rgb, 1)
    results = hands.process(frame_rgb)
    recognizer.recognize_async(frame_rgb, int(time.time() * 1000))
    frame = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2BGR)
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
    cv2.imshow("MediaPipe Hands and Gesture Recognition", frame)
    if cv2.waitKey(5) & 0xFF == 27:
        break
cap.release()
cv2.destroyAllWindows()
Other info / Complete Logs
Current working directory: D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos
Model file found: D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task
Using model path: D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task
Traceback (most recent call last):
  File "D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\demo_002.py", line 61, in <module>
    recognizer = vision.GestureRecognizer.create_from_options(options)
  File "C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages\mediapipe\tasks\python\vision\gesture_recognizer.py", line 340, in create_from_options
    return cls(
  File "C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages\mediapipe\tasks\python\vision\core\base_vision_task_api.py", line 70, in __init__
    self._runner = _TaskRunner.create(graph_config, packet_callback)
RuntimeError: Unable to open file at C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages/D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task, errno=22
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
I suspect the issue might be due to the % character in my file path, but I don't understand why the problem still exists even after I set the absolute path. The path remains unchanged until it's passed into python.BaseOptions(), but after that, it suddenly switches to the conda virtual environment's folder.
I tried creating a new path and moving the file into this folder; as you can see, I removed the % from the file path:
Current working directory: D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos
Model file found: D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task
Using model path: D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task
But I'm still getting the same error message:
Traceback (most recent call last):
  File "D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\demo_002.py", line 61, in <module>
    recognizer = vision.GestureRecognizer.create_from_options(options)
  File "C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages\mediapipe\tasks\python\vision\gesture_recognizer.py", line 340, in create_from_options
    return cls(
  File "C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages\mediapipe\tasks\python\vision\core\base_vision_task_api.py", line 70, in __init__
    self._runner = _TaskRunner.create(graph_config, packet_callback)
RuntimeError: Unable to open file at C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages/D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task, errno=22
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
It seems that this issue isn't related to the % in the original file path, but I still don't know why it throws an error at runtime.
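One workaround that may sidestep the path handling entirely is to read the .task file yourself and pass the raw bytes to MediaPipe via model_asset_buffer instead of model_asset_path. This is only a minimal sketch; it assumes the installed MediaPipe Tasks version exposes model_asset_buffer on BaseOptions and reuses the gesture_result_callback defined above.

# Load the model bundle into memory so MediaPipe never has to resolve the path itself.
with open(r"D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task", "rb") as f:
    model_data = f.read()

base_options = python.BaseOptions(model_asset_buffer=model_data)
options = vision.GestureRecognizerOptions(
    base_options=base_options,
    running_mode=vision.RunningMode.LIVE_STREAM,
    result_callback=gesture_result_callback
)
recognizer = vision.GestureRecognizer.create_from_options(options)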
Hi @justsonghua,
Based on the provided code, it appears you are using our legacy Hands solution. That solution has been upgraded and is now part of the new Gesture Recognition Task API, and support for the legacy Hands solution has ended. Please try the new Task API; the updated Python example is available here, and a general overview is on our overview page.
Apart from this, we cannot do much about this issue. If you encounter any issues with the new Task API, please report them here for further assistance.
Thank you!!
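For reference, the basic shape of the new Gesture Recognizer Task API looks roughly like this in the simplest (IMAGE) running mode; this is a sketch only, and the model path and test image ("hand.jpg") are placeholders for your own files.

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Create the recognizer from a local .task bundle in IMAGE mode.
base_options = python.BaseOptions(model_asset_path="gesture_recognizer.task")
options = vision.GestureRecognizerOptions(base_options=base_options)
recognizer = vision.GestureRecognizer.create_from_options(options)

# Run recognition once on a still image and print the top gesture, if any.
mp_image = mp.Image.create_from_file("hand.jpg")
result = recognizer.recognize(mp_image)
if result.gestures:
    print(result.gestures[0][0].category_name)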
# Created by Songhua at 14.May.2024
import cv2
import mediapipe as mp
from mediapipe.framework.formats import landmark_pb2
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import time
import numpy as np
# Initialize Mediapipe modules
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
# Initialize gesture recognizer
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerResult = mp.tasks.vision.GestureRecognizerResult
VisionRunningMode = mp.tasks.vision.RunningMode
# Initialize variables
current_frame = None
gesture_text = "None"
current_result = None
# Function to update gesture text
# 1 Hand Only
def update_gesture_text(result: GestureRecognizerResult, output_image: mp.Image, timestamp_ms: int):
    global gesture_text, current_result
    if result is not None and result.gestures:
        gesture_text = result.gestures[0][0].category_name
    else:
        gesture_text = "None"
    current_result = result
# Function to display results on the frame
def display_result(frame):
    global gesture_text
    cv2.putText(frame, gesture_text, (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 1, cv2.LINE_AA)
# Function to draw bounding box on the frame
def draw_bounding_box(frame, result: GestureRecognizerResult):
    if result is not None and result.hand_landmarks:
        for hand_landmarks in result.hand_landmarks:
            x_coords = [landmark.x * frame.shape[1] for landmark in hand_landmarks]
            y_coords = [landmark.y * frame.shape[0] for landmark in hand_landmarks]
            x_min, x_max = int(min(x_coords)), int(max(x_coords))
            y_min, y_max = int(min(y_coords)), int(max(y_coords))
            cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
# Configuration for gesture recognizer
model_path = 'D:/DokiDoki/M.Sc._EAAS/HiWi.Job/Projects/wode.demos/gesture_recognizer.task'
base_options = python.BaseOptions(model_asset_path=model_path)
options = vision.GestureRecognizerOptions(
    base_options=base_options,
    running_mode=VisionRunningMode.LIVE_STREAM,
    result_callback=update_gesture_text
)
recognizer = vision.GestureRecognizer.create_from_options(options)
# Initialize webcam
# for index in range(3):
#     cap = cv2.VideoCapture(index)
#     if cap.isOpened():
#         print(f"Camera index {index} is available")
#     cap.release()
camera_index = 2 # Initialize webcam index
cap = cv2.VideoCapture(camera_index)
timestamp = 0
while cap.isOpened():
    # Capture frame-by-frame
    ret, frame = cap.read()
    if not ret:
        print("Ignoring empty frame")
        break
    timestamp += 1
    # Flip the frame horizontally for a mirrored view
    frame = cv2.flip(frame, 1)
    # Convert the BGR frame from OpenCV to RGB before wrapping it in mp.Image
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame_rgb)
    # Send live image data to perform gesture recognition
    recognizer.recognize_async(mp_image, timestamp)
    # Display the frame with recognition result
    display_result(frame)
    # Draw bounding box on the frame
    draw_bounding_box(frame, current_result)
    cv2.imshow("MediaPipe Model", frame)
    # Exit on ESC key
    if cv2.waitKey(5) & 0xFF == 27:
        break

# Release the webcam resource
cap.release()
cv2.destroyAllWindows()
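One note on the recognize_async call above: in LIVE_STREAM mode the timestamp argument is interpreted as milliseconds and should increase monotonically, so deriving it from elapsed wall-clock time is usually safer than a bare frame counter. A small sketch of that change (start_time is a new name introduced here for illustration):

start_time = time.monotonic()
...
# Inside the capture loop: use elapsed milliseconds as the frame timestamp.
timestamp_ms = int((time.monotonic() - start_time) * 1000)
recognizer.recognize_async(mp_image, timestamp_ms)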
So, with the new API, it works now.
But there is a new problem: it can only recognize one hand. Can this simple model (gesture_recognizer.task) only recognize one hand?
I found this demo, and now my code can recognize both hands.
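For reference, the piece that enables two-hand recognition is the num_hands option on GestureRecognizerOptions; below is a minimal sketch of the relevant changes, assuming the rest of the script above stays the same.

# Callback that collects the top gesture for every detected hand.
def update_gesture_text(result: GestureRecognizerResult, output_image: mp.Image, timestamp_ms: int):
    global gesture_text, current_result
    if result is not None and result.gestures:
        gesture_text = ", ".join(hand[0].category_name for hand in result.gestures)
    else:
        gesture_text = "None"
    current_result = result

options = vision.GestureRecognizerOptions(
    base_options=base_options,
    running_mode=VisionRunningMode.LIVE_STREAM,
    num_hands=2,  # default is 1; allow up to two hands per frame
    result_callback=update_gesture_text
)
recognizer = vision.GestureRecognizer.create_from_options(options)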