Luke-Luo1 / POPDG

[CVPR 2024] POPDG: Popular 3D Dance Generation with PopDanceSet
https://luke-luo1.github.io/POPDG/
MIT License

How to extract manual and kinetic features for AIST++ pkl data (train/test)? #6

Open Andy010902 opened 2 months ago

Andy010902 commented 2 months ago

Hi @Luke-Luo1, thanks for your fantastic work! I would like to know how to extract the manual and kinetic features for the AIST++ pkl data (train/test). I am aware of the difference between the PopDanceSet and AIST++ pipelines: the kinetic forward pass (i.e. the "full_pose" SMPL forward) differs in the dataset code. For POPDG, in ../dataset/load_popdanceset.py (lines 169-198):

        smpl = SMPLSkeleton()
        # to Tensor
        root_pos = torch.Tensor(root_pos)
        local_q = torch.Tensor(local_q)
        # to ax
        bs, sq, c = local_q.shape
        local_q = local_q.reshape((bs, sq, -1, 3))

        # After the extraction of PoPDanceSet using HyBrIK, 
        # rotation adjustments need to be made to ensure that the human body's z-axis is oriented upwards.
        root_q = local_q[:, :, :1, :]  
        root_q_quat = axis_angle_to_quaternion(root_q)
        rotation = torch.Tensor(
            [0.7071068, -0.7071068, 0, 0]
        ) 
        root_q_quat = quaternion_multiply(rotation, root_q_quat)
        root_q = quaternion_to_axis_angle(root_q_quat)
        local_q[:, :, :1, :] = root_q

        # The coordinates of the human body's root joint 
        # need to be shifted and scaled to fall within the range of [-1, 1].
        pos_rotation = RotateAxisAngle(-90, axis="X", degrees=True)
        root_pos = pos_rotation.transform_points(
            root_pos
        )
        root_pos[:, :, 2] += 1
        root_pos[:, :, 1] -= 5

        # do FK
        positions = smpl.forward(local_q, root_pos)  # batch x sequence x 24 x 3

but in EDGE (../dataset/dance_dataset.py, lines 144-169) the corresponding code is:

        smpl = SMPLSkeleton()
        # to Tensor
        root_pos = torch.Tensor(root_pos)
        local_q = torch.Tensor(local_q)
        # to ax
        bs, sq, c = local_q.shape
        local_q = local_q.reshape((bs, sq, -1, 3))

        # AISTPP dataset comes y-up - rotate to z-up to standardize against the pretrain dataset
        root_q = local_q[:, :, :1, :]  # sequence x 1 x 3
        root_q_quat = axis_angle_to_quaternion(root_q)
        rotation = torch.Tensor(
            [0.7071068, 0.7071068, 0, 0]
        )  # 90 degrees about the x axis
        root_q_quat = quaternion_multiply(rotation, root_q_quat)
        root_q = quaternion_to_axis_angle(root_q_quat)
        local_q[:, :, :1, :] = root_q

        # don't forget to rotate the root position too 😩
        pos_rotation = RotateAxisAngle(90, axis="X", degrees=True)
        root_pos = pos_rotation.transform_points(
            root_pos
        )  # basically (y, z) -> (-z, y), expressed as a rotation for readability

        # do FK
        positions = smpl.forward(local_q, root_pos)  # batch x sequence x 24 x 3
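For reference, the two quaternions differ only in the sign of their x component: in the standard Hamilton (w, x, y, z) convention, [0.7071068, 0.7071068, 0, 0] is a +90° rotation about the x-axis (y-up to z-up) and [0.7071068, -0.7071068, 0, 0] is the -90° rotation. A minimal NumPy check of that (note this uses the Hamilton column-vector convention; pytorch3d's transform classes use a row-vector layout, so signs there can look transposed):

```python
import numpy as np

def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z), Hamilton convention."""
    w, x, y, z = q
    u = np.array([x, y, z])
    # v' = v + 2w (u x v) + 2 u x (u x v)
    return v + 2.0 * w * np.cross(u, v) + 2.0 * np.cross(u, np.cross(u, v))

s = np.sqrt(0.5)  # cos(45 deg) = sin(45 deg) ~ 0.7071068
q_edge  = np.array([s,  s, 0.0, 0.0])  # EDGE / AIST++: +90 deg about x
q_popdg = np.array([s, -s, 0.0, 0.0])  # POPDG / PopDanceSet: -90 deg about x

y_up = np.array([0.0, 1.0, 0.0])
print(quat_rotate(q_edge, y_up))   # ~ [0, 0, 1]: y-axis maps to +z
print(quat_rotate(q_popdg, y_up))  # ~ [0, 0, -1]: y-axis maps to -z
```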

So I guess that corresponding changes must be made in ../eval/extract_features.py (lines 24-50):

        smpl = SMPLSkeleton()
        pos, q = data["pos"], data["q"]
        scale = data["scale"]
        pos *= scale

        pos = torch.from_numpy(pos).unsqueeze(0)

        q = torch.from_numpy(q).unsqueeze(0).view(1,-1,24,3)

        root_q = q[:, :, :1, :]  
        root_q_quat = axis_angle_to_quaternion(root_q)
        rotation = torch.Tensor(
            [0.7071068, -0.7071068, 0, 0]
        )  
        root_q_quat = quaternion_multiply(rotation, root_q_quat)
        root_q = quaternion_to_axis_angle(root_q_quat)
        q[:, :, :1, :] = root_q

        pos_rotation = RotateAxisAngle(-90, axis="X", degrees=True)
        pos = pos_rotation.transform_points(
            pos
        ) 
        pos[:, :, 2] += 1
        pos[:, :, 1] -= 5

        keypoints3d = smpl.forward(q, pos).detach().cpu().numpy().squeeze()  # b, s, 24, 3
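(For context, this keypoints3d array of shape (S, 24, 3) is what the feature extractors consume. As a toy illustration of what "kinetic" features capture, roughly in the spirit of the Onuma-style joint-energy features used for dance evaluation, not the repo's exact implementation:)

```python
import numpy as np

FPS = 30  # assumed frame rate for the finite-difference velocity

def toy_kinetic_feature(keypoints3d, fps=FPS):
    """Toy per-joint average kinetic energy from (S, 24, 3) keypoints.
    Illustration only; the repo's extractor computes a richer feature set."""
    vel = (keypoints3d[1:] - keypoints3d[:-1]) * fps  # finite-difference velocity
    energy = (vel ** 2).sum(axis=-1).mean(axis=0)     # mean |v|^2 per joint
    return energy  # shape (24,)

kp = np.random.default_rng(0).normal(size=(16, 24, 3))
print(toy_kinetic_feature(kp).shape)  # (24,)
```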

Thanks for your continued contributions! I am truly grateful and look forward to your guidance. Best wishes!

Luke-Luo1 commented 2 months ago

In both the AIST++ dataset and the PopDanceSet, the purpose of these adjustments is to ensure that the dancer faces forward, towards the audience. Whether the data comes from the camera array in AIST++ or from monocular 3D keypoint detection in PopDanceSet, the extracted 3D body data needs to be correctly aligned when placed into the SMPL model. This alignment adjusts the coordinates and orientation of the body to standardize the direction the dancer is facing.
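As a practical starting point for the AIST++ case: eval/extract_features.py expects each pkl to be a dict with "pos", "q", and "scale" keys, while the raw AIST++ motion files use smpl_trans, smpl_poses, and smpl_scaling (key names per the public AIST++ release), so a thin repacking step is needed. A hedged sketch (whether pos should additionally be divided by the scale to match the preprocessing is worth verifying against your local files, since the eval script multiplies it back with pos *= scale):

```python
import numpy as np

def repack_aistpp(raw):
    """Repack a raw AIST++ motion dict into the {"pos", "q", "scale"} layout
    read by eval/extract_features.py. The raw key names are assumptions based
    on the public AIST++ release; verify against your local pkls."""
    return {
        "pos": np.asarray(raw["smpl_trans"], dtype=np.float32),
        "q": np.asarray(raw["smpl_poses"], dtype=np.float32).reshape(-1, 72),
        "scale": np.asarray(raw["smpl_scaling"], dtype=np.float32),
    }

# Toy stand-in for one AIST++ sequence: S frames, 24 joints x 3 axis-angle dims.
S = 4
raw = {
    "smpl_trans": np.zeros((S, 3)),
    "smpl_poses": np.zeros((S, 72)),
    "smpl_scaling": np.array([1.0]),
}
data = repack_aistpp(raw)
pos, q, scale = data["pos"], data["q"], data["scale"]
pos = pos * scale  # mirrors `pos *= scale` in extract_features.py
print(pos.shape, q.shape)  # (4, 3) (4, 72)
```

After repacking, the dataset-specific orientation fix (the +90° x-rotation for AIST++, as in EDGE) should be applied before the SMPL forward pass, rather than PopDanceSet's -90° rotation and translation offsets.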

Best regards