google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

Visibility Scores Remain at 0 for Clearly Visible Joints in Tennis Application #5159

Closed: Nit-Rathore closed this issue 9 months ago

Nit-Rathore commented 9 months ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

Yes

OS Platform and Distribution

Ubuntu 22.04

Mobile device if the issue happens on mobile device

No response

Browser and version if the issue happens on browser

No response

Programming Language and version

Python: 3.8.17

MediaPipe version

0.10.9

Bazel version

No response

Solution

Pose

Android Studio, NDK, SDK versions (if issue is related to building in Android environment)

No response

Xcode & Tulsi version (if issue is related to building for iOS)

No response

Describe the actual behavior

Visibility scores for clearly visible joints remain at 0, and presence scores are high across all joints, making it challenging to filter based on keypoint reliability for tennis movement analysis.

Describe the expected behaviour

Visibility scores should reflect the actual visibility of joints in the video frames, with non-zero values for clearly visible joints. Presence scores should provide a meaningful measure to filter out less reliable keypoints.
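
For example, the kind of reliability filtering I want to apply (a minimal sketch; the 0.5 threshold is an arbitrary value for illustration):

```python
VISIBILITY_THRESHOLD = 0.5  # arbitrary cutoff, for illustration only

def reliable_landmarks(landmarks):
    """Keep only the landmarks whose visibility score clears the threshold."""
    return [
        (idx, lm)
        for idx, lm in enumerate(landmarks)
        if lm.visibility > VISIBILITY_THRESHOLD
    ]
```

With visibility stuck at 0, this filter discards every joint, including ones that are clearly in view.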

Standalone code/steps you may have used to try to get what you need

1. Set up a MediaPipe 0.10.9 environment on Ubuntu 22.04.
2. Run the pose estimation on a sample tennis video where players are in clear view and actively moving.
3. Observe the output keypoints, focusing on the visibility and presence scores (see the sketch below).
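
A minimal sketch of steps 2 and 3, using the legacy solution the same way my application does (`tennis_clip.mp4` is a placeholder path):

```python
import cv2
import mediapipe as mp

# Legacy Pose solution, configured as in my application (MediaPipe 0.10.9).
pose = mp.solutions.pose.Pose(
    min_detection_confidence=0.2,
    min_tracking_confidence=0.2,
    model_complexity=2,
)

cap = cv2.VideoCapture("tennis_clip.mp4")  # placeholder path
while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break
    results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks:
        for idx, lm in enumerate(results.pose_landmarks.landmark):
            # Prints 0.000 even for joints that are clearly in view.
            print(idx, f"visibility={lm.visibility:.3f}")
cap.release()
pose.close()
```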

Other info / Complete Logs

In my application, which focuses on analyzing tennis players' movements, I've observed that the visibility scores for keypoints/joints provided by Pose consistently remain at 0, even when the joints are clearly visible in the video frames. This issue is particularly prevalent in dynamic scenes common in tennis, where players are frequently moving and changing orientation.

Additionally, the presence scores for all joints are reported with high confidence, which complicates filtering out unreliable keypoints based on my application's needs. This behavior affects the reliability of pose estimation and subsequent analysis in my tennis-focused application.

kuaashish commented 9 months ago

Hi @Nit-Rathore,

Could you please provide a detailed outline of the steps you are following, making reference to the documentation? Alternatively, if possible, could you share the complete standalone code to help us reproduce and better understand the issue?

Thank you!!

Nit-Rathore commented 9 months ago

```python
import os

import cv2
import mediapipe as mp

from cvpr.human.utils.keypoint_config import COLUMN_NAMES, _get_columns
from cvpr.human.utils.csv_utils import CSV
from cvpr.human.utils.jitter_detect import jitter_detect
from cvpr.human.keypoint_signature_mapping import save_video
from cvpr.common.interpolate.interpolate_df import interpolate_df


def load_model(model_complexity: int = 2, cpu: bool = True):
    """Return model dictionary."""
    if cpu:
        mp_pose = mp.solutions.pose
        return {
            "model_obj": mp_pose.Pose(
                min_detection_confidence=0.2,
                min_tracking_confidence=0.2,
                model_complexity=model_complexity,
            ),
            "device": "cpu",
        }
    else:
        return {"model_obj": None, "device": "gpu"}


def detect_humanpose(
    model,
    sport: str,
    video_path: str,
    csv_path: str = None,
    return_dataframe: bool = True,
    save_video_path: str = None,
    save_filtered_video: bool = False,
    hand: str = "right",
    get_3d: bool = False,  # referenced below; missing from the original excerpt
):
    """Get dataframe containing human pose keypoints.

    Parameters
    ----------
    model : dict
        Model dictionary returned by load_model.
    sport : str
        The type of sport.
    video_path : str
        The path to the video file.
    csv_path : str, optional
        Path to directory to save the resultant CSV, by default None.
    return_dataframe : bool, optional
        False to not return DataFrame, by default True.
    save_video_path : str, optional
        Path to directory to save the resultant overlay videos, by default None.
    save_filtered_video : bool, optional
        False to not save the filtered video, by default False.
    hand : str, optional
        Specify the hand, either 'right' or 'left', by default 'right'.
    """
    try:
        assert hand in ("right", "left"), "Enter either 'right' or 'left'"
        sport_prompt = "Enter either 'tennis', 'cricket', 'yoga', or 'basketball'"
        assert sport in ("tennis", "cricket", "yoga", "basketball"), sport_prompt
        if csv_path is not None and not os.path.exists(csv_path):
            os.makedirs(csv_path)
        if save_video_path is not None and not os.path.exists(save_video_path):
            os.makedirs(save_video_path)

        csv_full_path = None
        if csv_path is not None:
            csv_name = (
                os.path.splitext(os.path.basename(os.path.normpath(video_path)))[0]
                + "_p.csv"
            )
            csv_full_path = os.path.join(csv_path, csv_name)

        csv = CSV(filename=csv_full_path, columns=_get_columns(get_3d))

        assert os.path.exists(video_path), "Error: Input video does not exist."
        model_obj, device = model["model_obj"], model["device"]

        cap = cv2.VideoCapture(video_path)
        fps = cap.get(cv2.CAP_PROP_FPS)
        height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
        width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)

        if device == "cpu":
            frame_number = 0

            while cap.isOpened():
                success, image = cap.read()
                if not success:
                    break

                frame_number += 1
                timestamp = frame_number / fps

                image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                image.flags.writeable = False
                results = model_obj.process(image)

                img_shape = image.shape
                # add_to_csv is defined elsewhere in the project (not shown here)
                add_to_csv(csv, frame_number, timestamp, results, img_shape, get_3d)

                if cv2.waitKey(5) & 0xFF == 27:
                    break
        cap.release()
    except Exception:
        # Error handling from the original code is omitted in this excerpt.
        raise
```

Nit-Rathore commented 9 months ago

Description of steps:

1. The code loads a pose estimation model via the MediaPipe library.
2. detect_humanpose detects human pose keypoints in a video, with options to specify the sport type, video path, and output paths for CSV and video files.
3. It validates parameters such as hand and sport.
4. Directories for saving CSV and video files are created if specified.
5. A CSV object is initialized to store the keypoints.
6. The video file is opened, and properties such as frame rate, height, and width are extracted.
7. Frames are processed according to the device (CPU or GPU); only the CPU implementation is shown here.
8. Additional frame-processing functions are called or defined elsewhere in the script.

The guide I'm following:

https://developers.google.com/mediapipe/solutions/vision/pose_landmarker/python

kuaashish commented 9 months ago

Hi @Nit-Rathore,

After reviewing the provided standalone code, it appears that you are currently using the outdated pose solution, which is no longer maintained, and we have ceased support for it. This functionality has been integrated into the new Pose Landmarker Task API, detailed here.

We encourage you to explore the features of our updated Pose Landmarker Task API and suggest replacing the legacy Pose with the new Pose Landmarker. The new solution offers improved performance and additional functionality compared to the legacy pose solution. You can find the guide for the new Pose Landmarker here, along with specific instructions for implementation in the Python platform provided here. Additionally, a corresponding example Colab notebook is available for reference here.

Please check whether the same issue persists in the upgraded Task API and report what you observe. We value your feedback. Unfortunately, beyond this, there is little we can do to address the specific issue you are facing.

Thank you!!

Nit-Rathore commented 9 months ago

Hey, I'm already using the latest version, but I'll check again. The same issue was mentioned a couple of years back:

https://github.com/google/mediapipe/issues/4411

I'm facing the same issue.

ayushgdev commented 9 months ago

Hello @Nit-Rathore. From the code excerpt, the pose solution being used is indeed the obsolete one. This is evident in the function:

```python
def load_model(model_complexity: int = 2, cpu: bool = True):
    """Return model dictionary."""
    if cpu:
        mp_pose = mp.solutions.pose  # <= this usage from mediapipe.solutions is obsolete
        return {
            "model_obj": mp_pose.Pose(
                min_detection_confidence=0.2,
                min_tracking_confidence=0.2,
                model_complexity=model_complexity,
            ),
            ...
            ...
```

The new APIs use the PoseLandmarker class, which is created as a context manager with the `with` keyword, among many other major changes. A very small example snippet (the model and image paths below are placeholders):

```python
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
PoseLandmarker = mp.tasks.vision.PoseLandmarker
PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode

model_path = "pose_landmarker.task"  # placeholder: path to a downloaded model file

options = PoseLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=model_path),
    running_mode=VisionRunningMode.IMAGE)

# Placeholder input: load a single image from disk.
mp_image = mp.Image.create_from_file("image.jpg")

with PoseLandmarker.create_from_options(options) as landmarker:
    # The landmarker is initialized. Use it here.
    # Perform pose landmarking on the provided single image.
    pose_landmarker_result = landmarker.detect(mp_image)
```
Please migrate to our new MediaPipe Tasks API. You can find the documentation here.

github-actions[bot] commented 9 months ago

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 9 months ago

Are you satisfied with the resolution of your issue? Yes No