google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

Pose landmarker result in Linux Python has more NaNs than in Windows Python #5439

Open waterself opened 4 months ago

waterself commented 4 months ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

Yes

OS Platform and Distribution

Linux (Ubuntu 20.04) and Windows 11, both using Anaconda, Python 3.9.14

Mobile device if the issue happens on mobile device

No response

Browser and version if the issue happens on browser

No response

Programming Language and version

Python

MediaPipe version

Windows: 0.10.10, Linux: 0.10.10

Bazel version

No response

Solution

Pose Landmarker

Android Studio, NDK, SDK versions (if issue is related to building in Android environment)

No response

Xcode & Tulsi version (if issue is related to building for iOS)

No response

Describe the actual behavior

Inference results differ between Windows and Linux; the Linux run produces far more NaN values than the Windows run on the same video.

Describe the expected behaviour

Task inference should produce the same (or nearly the same) results and NaN count on both platforms.
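
To make the comparison concrete, here is a minimal sketch of how the gap could be quantified, assuming the keypoints_frames array produced by the script below is saved on each machine with np.save (linux_keypoints.npy and windows_keypoints.npy are hypothetical file names):

import numpy as np

# Hypothetical files: the keypoints_frames array saved from each machine's run.
linux_kps = np.load('linux_keypoints.npy')
windows_kps = np.load('windows_keypoints.npy')

# Total NaN values and number of frames containing at least one NaN, per platform.
for name, kps in [('linux', linux_kps), ('windows', windows_kps)]:
    nan_total = int(np.sum(np.isnan(kps)))
    nan_frames = int(np.sum(np.any(np.isnan(kps), axis=tuple(range(1, kps.ndim)))))
    print(name, 'total NaNs:', nan_total, 'frames with NaNs:', nan_frames)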

Standalone code/steps you may have used to try to get what you need


import cv2
import mediapipe as mp
import numpy as np
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import imageio

# Read all frames from the video, convert BGR -> RGB, and return them together with the fps.
def extract_frames_fps(video_path):
    frames = []
    video_capture = cv2.VideoCapture(video_path)
    fps = video_capture.get(cv2.CAP_PROP_FPS)

    while video_capture.isOpened():
        ret, frame = video_capture.read()
        if not ret:
            break

        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

    video_capture.release()

    return np.array(frames), int(fps) 

BaseOptions = mp.tasks.BaseOptions
PoseLandmarker = mp.tasks.vision.PoseLandmarker
PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode

model_path = 'model_path/pose_landmarker_heavy.task'

# Create a pose landmarker instance with the video mode:
options = PoseLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=model_path),
    running_mode=VisionRunningMode.VIDEO,
    min_pose_presence_confidence=0.3,
    min_tracking_confidence=0.7,
    min_pose_detection_confidence=0.3
    )

video_path = '/path/to/my/video.mp4'

video, fps = extract_frames_fps(video_path)

pose_landmarker_results = []
keypoints_frames = []
detect_result_list = []
annotated_frames = []

#create detector
with vision.PoseLandmarker.create_from_options(options) as detector:
    for i in range(video.shape[0]):
        frame = video[i,:,:,:]

        mp_frame = mp.Image(image_format=mp.ImageFormat.SRGB,
                  data=frame)

        detector_result = detector.detect_for_video(mp_frame, timestamp_ms=i*fps )

        # draw_landmarks_on_image is the helper from the official pose landmarker example (definition omitted here)
        annotated_frames.append(draw_landmarks_on_image(mp_frame.numpy_view(), detector_result))

        result_list = detector_result.pose_world_landmarks
        detect_result_list.append(result_list)
        keypoints_arr = []
        # If no pose was detected in this frame, fill with NaNs so frame indices stay aligned
        if len(result_list) == 0:
            keypoints_arr = np.full((1,33,5), fill_value=np.nan, dtype=np.float32)
        for idx in range(len(result_list)):
            pose_landmark = result_list[idx]
            kps = []
            for pl in pose_landmark:
                plkps = [pl.x, pl.y, pl.z, pl.visibility, pl.presence]
                kps.append(np.array(plkps))

            keypoints_arr.append(np.array(kps))
        keypoints_frames.append(np.array(keypoints_arr))
keypoints_frames = np.array(keypoints_frames)

# x coordinate of landmark index 13 for the first detected pose across the first 50 frames
print(keypoints_frames[:50, 0, 13, 0])
# total number of NaN values across all frames
print(np.sum(np.isnan(keypoints_frames)))
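
As a side note on the repro itself (possibly unrelated to the NaN gap): detect_for_video expects a monotonically increasing timestamp in milliseconds, and the usual way to derive it from the frame index is frame_index * 1000 / fps rather than i * fps. A small sketch, with frame_timestamp_ms being a helper name introduced here for illustration:

def frame_timestamp_ms(frame_index, fps):
    # Millisecond timestamp of a frame in a constant-frame-rate video.
    return int(frame_index * 1000.0 / fps)

# Inside the loop above, the call would then read:
# detector_result = detector.detect_for_video(mp_frame, timestamp_ms=frame_timestamp_ms(i, fps))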

Other info / Complete Logs

On Linux Python
=======================================================================

[ 0.01166975 -0.01325617 -0.03641914 -0.05470761 -0.07166258 -0.08079658
 -0.08318242 -0.08279768 -0.08881    -0.08541096 -0.08233733 -0.05183787
 -0.04390267 -0.02972412 -0.01699912 -0.00787161  0.00075246 -0.00227913
 -0.00521829 -0.00644223 -0.01679307 -0.03138978 -0.02844721 -0.05424543
 -0.07106167 -0.06548407 -0.06702822 -0.06672549         nan -0.19420004
 -0.22057012 -0.2284535  -0.19767897 -0.20616581 -0.21181265 -0.21647298
 -0.21128778 -0.03720265 -0.07819481         nan  0.00410479 -0.01395571
 -0.02828768 -0.03346587 -0.04220413 -0.04611417 -0.0616963  -0.08908865
 -0.00920953 -0.02047957]
1815
I0000 00:00:1716812751.904595  306817 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:84) egl_initializedUnable to initialize EGL
=======================================================================

On Windows 11 Python
=======================================================================
[ 0.01165697 -0.02807238 -0.04704126 -0.05284797 -0.08323435 -0.08977968
 -0.10047425 -0.08969924 -0.0984091  -0.06916987 -0.03707258 -0.02857973
 -0.03240457 -0.01915146 -0.01363734 -0.00499233  0.0069503  -0.0032917
 -0.00622438 -0.0143904  -0.03090671 -0.04554779 -0.02894793 -0.07363385
 -0.06342393 -0.06826069 -0.06399996 -0.08967623  0.10805667 -0.08458084
 -0.07507367 -0.00968433 -0.05319788 -0.07317979 -0.07963884 -0.07092766
 -0.0653493  -0.06612252 -0.05678153 -0.05748312 -0.04911917 -0.04954029
 -0.03681698 -0.04459389 -0.06993935 -0.08679859 -0.09368424 -0.09877644
         nan  0.26285872]
660
=======================================================================
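
For what it's worth, the Linux log above shows EGL failing to initialize, so that run falls back to CPU. To make sure both machines run with the same delegate, one option is to pin the CPU delegate explicitly in BaseOptions; this is a sketch of a repro change only, not a fix, and it assumes the Tasks API CPU delegate enum BaseOptions.Delegate.CPU:

# Pin inference to the CPU delegate on both machines so the comparison is like-for-like.
base_options = BaseOptions(
    model_asset_path=model_path,
    delegate=BaseOptions.Delegate.CPU,
)

options = PoseLandmarkerOptions(
    base_options=base_options,
    running_mode=VisionRunningMode.VIDEO,
    min_pose_presence_confidence=0.3,
    min_tracking_confidence=0.7,
    min_pose_detection_confidence=0.3,
)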
waterself commented 4 months ago

Sorry, I forgot to include the video initialization code in the standalone example; it has been updated with the function extract_frames_fps(video_path).

kuaashish commented 4 months ago

Hi @waterself,

Could you please confirm whether you are running both operating systems on the same machine or both are separate machines?

Thank you!!

waterself commented 4 months ago

Hello dear @kuaashish,

To be specific about my environment, defining computers A and B:

A: Laptop, Windows 11, AMD Ryzen 5 5600H
B: Desktop, Ubuntu 20.04, Intel i9-10900X (no WSL, native Linux only)

So the results above come from two different, separate machines.

Thanks for your kindness.

kuaashish commented 4 months ago

Hi @mbrenon,

Could you please look into this issue?

Thank you!!