google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://mediapipe.dev
Apache License 2.0

Can you output the confidence of key points? Please let me know if you can. Thank you. #4947

Open lucker26 opened 10 months ago

lucker26 commented 10 months ago

This template is for miscellaneous issues not covered by the other issue categories

For questions on how to work with MediaPipe, or support for problems that are not verified bugs in MediaPipe, please go to StackOverflow and Slack communities.

If you are reporting a vulnerability, please use the dedicated reporting process.

kuaashish commented 10 months ago

Hi @lucker26,

Kindly provide more details regarding your inquiry. Additionally, we suggest filling out the template to ensure we have the necessary information for a more effective response to your support needs.

Thank you

lucker26 commented 10 months ago

The following is my custom code. I found that I can only output the coordinates of key points. If I want to output the confidence along with the coordinates of each key point, how should I modify my code, or do I need to call other classes?

import cv2
import mediapipe as mp
import csv
import h5py
from datetime import datetime

mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=False,
                       max_num_hands=4,
                       min_detection_confidence=0.5,
                       min_tracking_confidence=0.5)

mpDraw = mp.solutions.drawing_utils

def process_frame(img, csv_writer):
    img = cv2.flip(img, 1)
    img_RGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(img_RGB)
    if results.multi_hand_landmarks:
        for hand_idx in range(len(results.multi_hand_landmarks)):
            hand_21 = results.multi_hand_landmarks[hand_idx]
            mpDraw.draw_landmarks(img, hand_21, mp_hands.HAND_CONNECTIONS,
                                  mp.solutions.drawing_styles.get_default_hand_landmarks_style(),
                                  mp.solutions.drawing_styles.get_default_hand_connections_style())
            if csv_writer is not None:
                row = [datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")]
                for lm in hand_21.landmark:
                    confidence = lm.presence
                    row += [lm.x * img.shape[1], lm.y * img.shape[0]]
                    print(confidence)
                csv_writer.writerow(row)

    return img

csv_file = open('hand_poses.csv', mode='w', newline='')
csv_writer = csv.writer(csv_file)
headers = ['timestamp']

for i in range(21):
    headers += [f'x{i}', f'y{i}']
csv_writer.writerow(headers)

cv2.VideoCapture(-1).release()
cap = cv2.VideoCapture(0)
cap.open(0)

while cap.isOpened():
    success, frame = cap.read()
    if not success:
        print('Error')
        break
    frame = process_frame(frame, csv_writer)
    cv2.imshow('my_window', frame)
    if cv2.waitKey(1) in [ord('q'), 27]:
        break

csv_file.close()
cap.release()
cv2.destroyAllWindows()

clogwog commented 10 months ago

For the Hands solution, there is a concept of "presence" and "visibility" for landmarks. According to the MediaPipe Hands documentation:

"Presence" indicates the likelihood of the landmark being present in the image, while "visibility" indicates the likelihood of the landmark being visible (not occluded) in the image. These can act as confidence scores for individual landmarks.

Not sure if it will work though, as this issue is also still open: https://github.com/google/mediapipe/issues/3159
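Before relying on these fields, it may help to check whether they are populated at all. The landmark messages are protobuf objects, and an optional float field that was never set reads back as its default of 0.0, so a plain `lm.presence` cannot distinguish "not provided" from "zero confidence". The following is a minimal sketch of that check, not MediaPipe's own method: `optional_score` and `FakeLandmark` are hypothetical names introduced here, and `FakeLandmark` only imitates the protobuf behaviour so the example runs without a camera or MediaPipe installed.

```python
from typing import Optional


def optional_score(landmark, field: str) -> Optional[float]:
    """Return the field value, or None if the protobuf field was never set."""
    # HasField is the standard protobuf check for optional fields; without it,
    # an unset float field is indistinguishable from a genuine 0.0.
    return getattr(landmark, field) if landmark.HasField(field) else None


class FakeLandmark:
    """Hypothetical stand-in for a MediaPipe landmark, for illustration only."""

    def __init__(self, **set_fields: float):
        self._set = set_fields

    def HasField(self, name: str) -> bool:
        return name in self._set

    def __getattr__(self, name: str) -> float:
        # Mimic protobuf: fields that were never set read as 0.0.
        return self.__dict__.get("_set", {}).get(name, 0.0)


unset = FakeLandmark(x=0.5, y=0.5)
populated = FakeLandmark(x=0.5, y=0.5, visibility=0.9)
print(optional_score(unset, "visibility"))      # None
print(optional_score(populated, "visibility"))  # 0.9
```

With real results, the same helper could be called on each `lm` in `hand_landmarks.landmark`. If it returns None for every landmark, the graph is not filling in the field, and writing it to the CSV would only add columns of zeros.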

Anyway, you can modify your code to include these confidence values for each landmark as follows:

1. Add headers for the presence and visibility confidence scores for each landmark to your CSV.
2. Extract the presence and visibility values for each landmark and include them in the row you write to the CSV.

Here's how you can modify your process_frame function and the headers for the CSV:

import cv2
import mediapipe as mp
import csv
from datetime import datetime

mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=False,
                       max_num_hands=4,
                       min_detection_confidence=0.5,
                       min_tracking_confidence=0.5)

mpDraw = mp.solutions.drawing_utils

def process_frame(img, csv_writer):
    img = cv2.flip(img, 1)
    img_RGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(img_RGB)

    if results.multi_hand_landmarks:
        for hand_idx, hand_landmarks in enumerate(results.multi_hand_landmarks):
            mpDraw.draw_landmarks(img, hand_landmarks, mp_hands.HAND_CONNECTIONS,
                                  mp.solutions.drawing_styles.get_default_hand_landmarks_style(),
                                  mp.solutions.drawing_styles.get_default_hand_connections_style())
            if csv_writer is not None:
                row = [datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")]
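                for lm in hand_landmarks.landmark:
                    # Include 'presence' and 'visibility' for each landmark.
                    # z uses roughly the same scale as x, so scale it by the image width
                    # (img.shape[2] is the channel count, not a spatial dimension).
                    row.extend([lm.x * img.shape[1], lm.y * img.shape[0], lm.z * img.shape[1], lm.presence, lm.visibility])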
                csv_writer.writerow(row)

    return img

# Open CSV file for writing
csv_file = open('hand_poses.csv', mode='w', newline='')
csv_writer = csv.writer(csv_file)

# Write headers for CSV file
headers = ['timestamp']
for i in range(21):
    headers.extend([f'x_{i}', f'y_{i}', f'z_{i}', f'presence_{i}', f'visibility_{i}'])
csv_writer.writerow(headers)

# Initialize video capture
cap = cv2.VideoCapture(0)

while cap.isOpened():
    success, frame = cap.read()
    if not success:
        print('Error')
        break
    frame = process_frame(frame, csv_writer)
    cv2.imshow('my_window', frame)
    if cv2.waitKey(1) in [ord('q'), 27]:  # if 'q' or ESC is pressed, exit
        break

# Cleanup
csv_file.close()
cap.release()
cv2.destroyAllWindows()

lucker26 commented 10 months ago

Thank you for your suggestion. However, as I understand it, presence should be the probability that a hand exists, not the probability that each key point exists. Also, the actual output of both presence and visibility is 0. Why?
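For a per-hand score, one thing that may be worth trying is `results.multi_handedness`, which the legacy `mp.solutions.hands` API returns alongside `multi_hand_landmarks`. Note that this score is the estimated probability of the Left/Right classification, not a hand-presence probability, so treat it as a rough proxy at best. The sketch below assumes the entries line up by index with `multi_hand_landmarks`. `hand_scores` and `fake_results` are hypothetical names used only so the example runs without a camera.

```python
from types import SimpleNamespace


def hand_scores(results):
    """Return a (label, score) tuple per detected hand, or [] if none."""
    # multi_handedness is None when no hand was detected in the frame.
    handedness = getattr(results, "multi_handedness", None) or []
    return [(h.classification[0].label, h.classification[0].score)
            for h in handedness]


# Hypothetical stand-in mimicking the shape of the hands.process(...) output.
fake_results = SimpleNamespace(multi_handedness=[
    SimpleNamespace(classification=[SimpleNamespace(label="Left", score=0.97)]),
])
print(hand_scores(fake_results))  # [('Left', 0.97)]
```

In `process_frame`, the returned score could be appended to the row for the matching hand index, with an extra `handedness_score` column added to the CSV headers.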