google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0
27.76k stars 5.18k forks

MediaPipe Python Livestream Hand Gesture Detection #5231

Closed rohandhiman03 closed 7 months ago

rohandhiman03 commented 8 months ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

None

OS Platform and Distribution

Python

MediaPipe Tasks SDK version

No response

Task name (e.g. Image classification, Gesture recognition etc.)

Gesture Recognition

Programming Language and version (e.g. C++, Python, Java)

Python

Describe the actual behavior

Hangs when trying to get the category name

Describe the expected behaviour

I want to get the category name and display it on the frame

Standalone code/steps you may have used to try to get what you need

import cv2
import mediapipe as mp
from mediapipe.framework.formats import landmark_pb2
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import time
import numpy as np

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles

GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerResult = mp.tasks.vision.GestureRecognizerResult
VisionRunningMode = mp.tasks.vision.RunningMode

def print_result(result: GestureRecognizerResult, output_image: mp.Image, timestamp_ms: int):
    # Check that the result is not None before printing
    if result is not None:
        print('gesture recognition result: {}'.format(result.gestures))
    else:
        # If no gesture is recognized, print a default message
        # (output_image is an mp.Image, not a NumPy array, so it cannot
        # be passed to cv2.putText directly)
        print('No gesture recognized')

base_options = python.BaseOptions(model_asset_path='C:/Users/rohan/OneDrive/Desktop/HCI Project/gesture_recognizer.task')
options = vision.GestureRecognizerOptions(base_options=base_options,
                                          running_mode=VisionRunningMode.LIVE_STREAM,
                                          result_callback=print_result)
# Initialize the webcam
cap = cv2.VideoCapture(0)

timestamp = 0

with GestureRecognizer.create_from_options(options) as recognizer:
    while cap.isOpened(): 
        # Capture frame-by-frame
        ret, frame = cap.read()

        if not ret:
            print("Ignoring empty frame")
            break

        timestamp += 1
        # OpenCV delivers frames in BGR order; convert to RGB before
        # wrapping in an mp.Image, since the SRGB format expects RGB
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_frame)

        # Send live image data to perform gesture recognition
        # (the timestamp must increase monotonically across calls)
        recognizer.recognize_async(mp_image, timestamp)

        cv2.imshow("MediaPipe Model", frame)

        if cv2.waitKey(5) & 0xFF == 27:
            break

# Release the webcam and destroy all OpenCV windows
cap.release()
cv2.destroyAllWindows()
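A side note on the timestamp argument (an observation, not part of the reported code): in LIVE_STREAM mode, `recognize_async` expects a monotonically increasing timestamp, conventionally in milliseconds. The bare frame counter above satisfies monotonicity, but a millisecond clock is closer to the documented usage. A minimal sketch, using a hypothetical helper `monotonic_ms`:

```python
import time

def monotonic_ms(start=time.monotonic()):
    # Milliseconds elapsed since module load; non-decreasing across
    # calls, which is what recognize_async needs in LIVE_STREAM mode.
    # (start is bound once at definition time, on purpose.)
    return int((time.monotonic() - start) * 1000)
```

In the capture loop this would replace `timestamp += 1` with `timestamp = monotonic_ms()` before the `recognize_async` call.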

Other info / Complete Logs

Above is the working code. When I try
cv2.putText(output_image, result.gestures.Category.category_name , (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2, cv2.LINE_AA)

or any other way of accessing the result, it hangs. It hangs even when I just print the category name.
kuaashish commented 8 months ago

Hi @rohandhiman03,

When I tried the code on my macOS, it worked fine without any problems. We might need to test it on a Windows system to understand the issue better.

Thank you!!

github-actions[bot] commented 7 months ago

This issue has been marked stale because it has had no recent activity for the past 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 7 months ago

This issue was closed due to lack of activity after being marked stale for the past 7 days.

google-ml-butler[bot] commented 7 months ago

Are you satisfied with the resolution of your issue?