TemugeB / bodypose3d

Real time 3D body pose estimation with Mediapipe
MIT License

Collecting keypoints from the wrong frame #6

Closed valioiv closed 2 years ago

valioiv commented 2 years ago

Hello! First of all, great job! Very useful code! I'm reviewing it and noticed something; please correct me if I'm wrong. In this snippet, it seems we compute frame1_keypoints using the dimensions of frame0 instead of frame1. Am I right? I've marked the lines with a "THIS LOOKS WEIRD" comment below:

        #check for keypoints detection
        frame0_keypoints = []
        if results0.pose_landmarks:
            for i, landmark in enumerate(results0.pose_landmarks.landmark):
                if i not in pose_keypoints: continue #only save keypoints that are indicated in pose_keypoints
                pxl_x = landmark.x * frame0.shape[1]
                pxl_y = landmark.y * frame0.shape[0]
                pxl_x = int(round(pxl_x))
                pxl_y = int(round(pxl_y))
                cv.circle(frame0,(pxl_x, pxl_y), 3, (0,0,255), -1) #add keypoint detection points into figure
                kpts = [pxl_x, pxl_y]
                frame0_keypoints.append(kpts)
        else:
            #if no keypoints are found, simply fill the frame data with [-1,-1] for each kpt
            frame0_keypoints = [[-1, -1]]*len(pose_keypoints)

        #this will keep keypoints of this frame in memory
        kpts_cam0.append(frame0_keypoints)

        frame1_keypoints = []
        if results1.pose_landmarks:
            for i, landmark in enumerate(results1.pose_landmarks.landmark):
                if i not in pose_keypoints: continue
                pxl_x = landmark.x * frame0.shape[1]    #THIS LOOKS WEIRD
                pxl_y = landmark.y * frame0.shape[0]    #THIS LOOKS WEIRD
                pxl_x = int(round(pxl_x))
                pxl_y = int(round(pxl_y))
                cv.circle(frame1,(pxl_x, pxl_y), 3, (0,0,255), -1)
                kpts = [pxl_x, pxl_y]
                frame1_keypoints.append(kpts)

        else:
            #if no keypoints are found, simply fill the frame data with [-1,-1] for each kpt
            frame1_keypoints = [[-1, -1]]*len(pose_keypoints)

I compared this with the handpose3d repo, where I think the keypoints are collected correctly (marked "THIS LOOKS GOOD"):


        #prepare list of hand keypoints of this frame
        #frame0 kpts
        frame0_keypoints = []
        if results0.multi_hand_landmarks:
            for hand_landmarks in results0.multi_hand_landmarks:
                for p in range(21):
                    #print(p, ':', hand_landmarks.landmark[p].x, hand_landmarks.landmark[p].y)
                    pxl_x = int(round(frame0.shape[1]*hand_landmarks.landmark[p].x))
                    pxl_y = int(round(frame0.shape[0]*hand_landmarks.landmark[p].y))
                    kpts = [pxl_x, pxl_y]
                    frame0_keypoints.append(kpts)

        #no keypoints found in frame:
        else:
            #if no keypoints are found, simply fill the frame data with [-1,-1] for each kpt
            frame0_keypoints = [[-1, -1]]*21

        kpts_cam0.append(frame0_keypoints)

        #frame1 kpts
        frame1_keypoints = []
        if results1.multi_hand_landmarks:
            for hand_landmarks in results1.multi_hand_landmarks:
                for p in range(21):
                    #print(p, ':', hand_landmarks.landmark[p].x, hand_landmarks.landmark[p].y)
                    pxl_x = int(round(frame1.shape[1]*hand_landmarks.landmark[p].x))    #THIS LOOKS GOOD
                    pxl_y = int(round(frame1.shape[0]*hand_landmarks.landmark[p].y))    #THIS LOOKS GOOD
                    kpts = [pxl_x, pxl_y]
                    frame1_keypoints.append(kpts)

        else:
            #if no keypoints are found, simply fill the frame data with [-1,-1] for each kpt
            frame1_keypoints = [[-1, -1]]*21

        #update keypoints container
        kpts_cam1.append(frame1_keypoints)

TemugeB commented 2 years ago

Hi,

Thanks for the comment. The detected keypoints are stored in landmark.x and landmark.y, but they are normalized to the [0, 1] range. To obtain the pixel position of a keypoint, we have to multiply by the size of the frame. This is where frame.shape is used.

If both frames are the same size, this does not cause any issues in the code. However, if the two frames have different sizes, the frame1 keypoints will be scaled by the wrong dimensions and end up at the wrong pixel positions.
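
To illustrate the point, here is a minimal sketch of the scaling step written as a helper that always takes the shape of the frame the landmarks came from. The `Landmark` namedtuple is only a stand-in for a MediaPipe landmark (which exposes normalized `.x` and `.y`), and the helper name `landmarks_to_pixels` and the frame sizes are assumptions made for this example, not code from the repo.

```python
from collections import namedtuple

# Hypothetical stand-in for a MediaPipe landmark with normalized .x and .y in [0, 1].
Landmark = namedtuple("Landmark", ["x", "y"])


def landmarks_to_pixels(landmarks, frame_shape, pose_keypoints):
    """Scale normalized landmarks to integer pixel coordinates of one frame.

    frame_shape follows the NumPy image convention: (height, width, ...).
    Only landmarks whose index is listed in pose_keypoints are kept.
    """
    height, width = frame_shape[0], frame_shape[1]
    keypoints = []
    for i, landmark in enumerate(landmarks):
        if i not in pose_keypoints:
            continue
        pxl_x = int(round(landmark.x * width))
        pxl_y = int(round(landmark.y * height))
        keypoints.append([pxl_x, pxl_y])
    return keypoints


# Two cameras with different resolutions (assumed values for illustration).
frame0_shape = (720, 1280, 3)
frame1_shape = (480, 640, 3)

# Landmarks detected in frame1.
landmarks1 = [Landmark(0.5, 0.5), Landmark(0.25, 0.75)]
pose_keypoints = [0, 1]

# Scaling frame1 landmarks by frame0's shape reproduces the bug.
kpts_wrong = landmarks_to_pixels(landmarks1, frame0_shape, pose_keypoints)
# Scaling by frame1's own shape gives the intended pixel positions.
kpts_right = landmarks_to_pixels(landmarks1, frame1_shape, pose_keypoints)

print(kpts_wrong)
print(kpts_right)
```

When both shapes are identical the two results coincide, which is why the original code works as long as both cameras use the same resolution.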

Good catch, I will update the code. Thanks again.

valioiv commented 2 years ago

Got your point! Thank you!