Yukun-Huang / DreamWaltz-G

Official implementation of the paper "DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion".
https://yukun-huang.github.io/DreamWaltz-G/

what library do you recommend to extract SMPL-X motion data from video in the wild? #3

Open gt732 opened 1 day ago

gt732 commented 1 day ago

Hi,

Thank you for the hard work and for releasing the code! I was able to train an avatar and run animations on my 24 GB 3090 in ~2 hours.

Results:

https://github.com/user-attachments/assets/69d6ea06-4566-4500-9fa3-94bbf64411ff

If I wanted to extract SMPL-X motion data from my own reference video, what library would you recommend? I know there are a few, but they each extract it in their own way, and the camera parameters can sometimes get messed up. I'm hoping to be able to animate the avatar given an in-the-wild video.

Thanks!!

gt732 commented 1 day ago

This library is able to extract SMPL-X parameters from a video in the wild.

https://github.com/HongwenZhang/PyMAF

The output format is:

Dictionary with 13 keys:
  Key: 'pred_cam' ->     <class 'numpy.ndarray'> with shape (25, 3)
  Key: 'orig_cam' ->     <class 'numpy.ndarray'> with shape (25, 4)
  Key: 'orig_cam_t' ->     <class 'numpy.ndarray'> with shape (25, 3)
  Key: 'verts' ->     <class 'numpy.ndarray'> with shape (25, 6890, 3)
  Key: 'smplx_verts' ->     <class 'numpy.ndarray'> with shape (25, 10475, 3)
  Key: 'pose' ->     <class 'numpy.ndarray'> with shape (25, 72)
  Key: 'betas' ->     <class 'numpy.ndarray'> with shape (25, 10)
  Key: 'joints3d' ->     <class 'numpy.ndarray'> with shape (25, 49, 3)
  Key: 'joints2d' ->     List/Sequence with 25 elements:
        <class 'numpy.ndarray'> with shape (17, 3)
  Key: 'bboxes' ->     <class 'numpy.ndarray'> with shape (25, 4)
  Key: 'frame_ids' ->     List/Sequence with 25 elements:
        <class 'int'>
        Value: 0
  Key: 'person_ids' ->     List/Sequence with 25 elements:
        <class 'str'>
        Value: video2_mp4_f0_p0
  Key: 'smplx_params' ->     List/Sequence with 4 elements:
        Dictionary with 8 keys:
          Key: 'shape' ->             <class 'torch.Tensor'> with shape torch.Size([8, 10])
          Key: 'body_pose' ->             <class 'torch.Tensor'> with shape torch.Size([8, 24, 3, 3])
          Key: 'left_hand_pose' ->             <class 'torch.Tensor'> with shape torch.Size([8, 15, 3, 3])
          Key: 'right_hand_pose' ->             <class 'torch.Tensor'> with shape torch.Size([8, 15, 3, 3])
          Key: 'jaw_pose' ->             <class 'torch.Tensor'> with shape torch.Size([8, 1, 3, 3])
          Key: 'leye_pose' ->             <class 'torch.Tensor'> with shape torch.Size([8, 1, 3, 3])
          Key: 'reye_pose' ->             <class 'torch.Tensor'> with shape torch.Size([8, 1, 3, 3])
          Key: 'expression' ->             <class 'torch.Tensor'> with shape torch.Size([8, 50])

Can this be used with the DreamWaltz-G framework to animate the avatar?

Thanks!!

johndpope commented 1 day ago

WRT SMPL-X estimation: there's a more recent 2024 paper (haven't played with it yet) that may be better, or about the same: https://github.com/ttxskk/AiOS

gt732 commented 23 hours ago

@johndpope thanks for the updated link! These things move so fast there's a new one every few months. I can give that one a shot; the issue is getting its output converted into a format this framework understands.

If scripts were provided to take in-the-wild videos, extract SMPL-X, and feed it to DreamWaltz-G, that would be icing on the cake! It looks like we are currently limited to motions from existing datasets.

A bunch of conversions are being done here: https://github.com/Yukun-Huang/DreamWaltz-G/tree/main/data/human

I'm new to this field, so I'm relying heavily on GPT o1-preview and other LLMs to help.

Yukun-Huang commented 22 hours ago

Hi @gt732, thank you for sharing the results!

  1. I'm not quite sure about the current state-of-the-art work for SMPL-X estimation. AiOS seems to perform well and is worth a shot. Thanks @johndpope for the recommendation.
  2. The output you provided from PyMAF can be used to animate the avatar. Don't worry, the input format used by DreamWaltz-G is exactly the same as SMPL-X. You might refer to the code here to construct your own motion files like those provided in assets/motions/*.npy. However, the camera parameters are a bit tricky. If there's a reliable library for estimating SMPL-X and camera parameters, we could add a new feature that allows animating avatars from in-the-wild videos.
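
For reference, here is a rough, untested sketch of how the PyMAF rotation-matrix parameters could be turned into the axis-angle vectors that a standard SMPL-X layer consumes. The dictionary keys written out below are assumptions rather than the exact schema of assets/motions/*.npy, so please inspect one of the provided motion files first and match it.

```python
# Rough sketch (untested): convert PyMAF's rotation-matrix SMPL-X parameters
# into axis-angle vectors. The output keys are assumptions, not the exact
# schema of assets/motions/*.npy -- check a provided motion file and adapt.
import numpy as np
from scipy.spatial.transform import Rotation as R

def rotmat_to_axis_angle(rotmats):
    """(T, J, 3, 3) rotation matrices -> (T, J * 3) axis-angle vectors."""
    t, j = rotmats.shape[:2]
    aa = R.from_matrix(rotmats.reshape(-1, 3, 3)).as_rotvec()  # (T * J, 3)
    return aa.reshape(t, j * 3)

def pack_motion(smplx_params, out_path):
    """Concatenate PyMAF's per-chunk 'smplx_params' dicts into one motion dict."""
    cat = lambda k: np.concatenate(
        [p[k].detach().cpu().numpy() for p in smplx_params], axis=0)
    # PyMAF's 24-joint body pose follows the SMPL layout; the SMPL-X layer
    # expects global_orient (1 joint) + body_pose (21 joints), so the first
    # and/or last joints may need to be split off or dropped accordingly.
    body = rotmat_to_axis_angle(cat("body_pose"))  # (T, 72) for 24 joints
    motion = {
        "global_orient":   body[:, :3],
        "body_pose":       body[:, 3:],
        "left_hand_pose":  rotmat_to_axis_angle(cat("left_hand_pose")),   # (T, 45)
        "right_hand_pose": rotmat_to_axis_angle(cat("right_hand_pose")),  # (T, 45)
        "jaw_pose":        rotmat_to_axis_angle(cat("jaw_pose")),         # (T, 3)
        "betas":           cat("shape"),                                  # (T, 10)
        "expression":      cat("expression"),                             # (T, 50)
    }
    np.save(out_path, motion, allow_pickle=True)
```
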
johndpope commented 20 hours ago

Just saw this too: https://pramishp.github.io/iHuman/index.html (updated a few days ago).

gt732 commented 16 hours ago

@johndpope Nice find!

@Yukun-Huang Thanks for the reply and tips. I will try getting the PyMAF parameters working with DreamWaltz-G. Regarding the camera, I know mmhuman3d has helper functions for converting a weak-perspective camera to a PyTorch3D camera.

https://github.com/open-mmlab/mmhuman3d/blob/8c95747175f8c4bd76d5e0be7be244f8a9a4a6de/mmhuman3d/core/visualization/visualize_smpl.py#L1129

I'm going to give this a shot and see if those functions can help convert the PyMAF camera into a working virtual camera in PyTorch3D for rendering.
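
For anyone following along, here is a minimal sketch of the usual SPIN/VIBE-style weak-perspective-to-perspective conversion (I believe mmhuman3d's helper does something along these lines). The focal length and crop resolution below are assumed values, and PyTorch3D's camera axis convention (+X left, +Y up) may require flipping tx/ty, so verify against a rendered overlay.

```python
# Sketch: convert PyMAF's weak-perspective camera [s, tx, ty] into a
# camera translation for a full-perspective PyTorch3D camera.
# focal_length / img_res are assumptions; signs may need flipping for
# PyTorch3D's axis convention -- check with a rendered overlay.
import numpy as np
import torch
from pytorch3d.renderer import PerspectiveCameras

def weak_to_perspective_trans(pred_cam, focal_length=5000.0, img_res=224):
    """pred_cam: (N, 3) array of [s, tx, ty] -> (N, 3) camera translations."""
    s, tx, ty = pred_cam[:, 0], pred_cam[:, 1], pred_cam[:, 2]
    tz = 2.0 * focal_length / (img_res * s + 1e-9)  # depth recovered from the scale
    return np.stack([tx, ty, tz], axis=-1)

def build_cameras(pred_cam, focal_length=5000.0, img_res=224, device="cpu"):
    """Wrap the translations in screen-space PyTorch3D cameras for rendering."""
    trans = torch.as_tensor(
        weak_to_perspective_trans(pred_cam, focal_length, img_res),
        dtype=torch.float32, device=device)
    # Identity rotation: the global orientation is already carried by the body pose.
    rot = torch.eye(3, device=device).expand(trans.shape[0], 3, 3)
    return PerspectiveCameras(
        focal_length=focal_length,
        principal_point=((img_res / 2.0, img_res / 2.0),),
        R=rot, T=trans,
        in_ndc=False,
        image_size=((img_res, img_res),),
        device=device,
    )
```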