Yukun-Huang / DreamWaltz-G

Official implementation of the paper "DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion".
https://yukun-huang.github.io/DreamWaltz-G/

[Datasets] Where to download SMPL-X.zip? #4

Open xdobetter opened 6 days ago

xdobetter commented 6 days ago

Hello, I am very interested in your work. I have been looking for a download link for SMPL-X.zip for a long time, but I can't find one. Could you give me some advice?


datasets
├── 3DPW
│   ├── readme_and_demo.zip
│   ├── sequenceFiles.zip
│   └── SMPL-X.zip ???
Yukun-Huang commented 6 days ago

The SMPL-X params of 3DPW can be downloaded from here, but I don't think I have tested them.

xdobetter commented 6 days ago

Thank you for your assistance. I will try it. I have other questions:

  1. How do I create a 2D Human Video Reenactment dataset?
  2. Does it support exporting a mesh?
  3. How do I replace the animation content of AIST?
gt732 commented 6 days ago

@Yukun-Huang @xdobetter

This is the closest I've gotten to animating using an in-the-wild video. I used PyMAF-X to get the SMPL-X poses, but for the life of me I can't figure out how to render the camera in the correct position. PyMAF-X uses a weak-perspective camera, which needs to be converted into something DreamWaltz-G understands and fed into:

https://github.com/Yukun-Huang/DreamWaltz-G/blob/a4bd7de3bd5b0d3f930085a48f76f858cb1c0ea5/core/human/smpl_prompt.py#L151
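
For reference, the usual SPIN/PyMAF-style conversion from the weak-perspective pred_cam to a perspective camera translation looks roughly like the sketch below. The crop resolution and focal length are assumptions, and PyMAF-X's orig_cam_t may already hold this translation in the original-image frame, so treat this as a starting point rather than the exact convention DreamWaltz-G expects.

```python
import numpy as np

# Assumptions: the estimator predicts pred_cam = [s, tx, ty] in a square person
# crop of CROP_RES pixels, and a canonical focal length is used for rendering.
CROP_RES = 224          # crop resolution used by the estimator (assumption)
FOCAL_LENGTH = 5000.0   # canonical focal length in pixels (assumption)

def weak_perspective_to_translation(pred_cam):
    """Convert a weak-perspective camera [s, tx, ty] into a perspective camera
    translation [tx, ty, tz], following the common SPIN/VIBE-style convention."""
    s, tx, ty = pred_cam
    # Depth at which a perspective camera with FOCAL_LENGTH reproduces the
    # weak-perspective scale s: tz = 2 * f / (res * s).
    tz = 2.0 * FOCAL_LENGTH / (CROP_RES * s + 1e-9)
    return np.array([tx, ty, tz], dtype=np.float32)

# Example (hypothetical variable): first frame of the PyMAF-X output shown below.
# cam_t = weak_perspective_to_translation(output["pred_cam"][0])
```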

PyMAF-X params for the file I used; it is based on this video:

https://github.com/user-attachments/assets/259df1f4-28f8-4c23-9e3b-f777c8e8801b

If someone wants to give it a shot, here is the file. Once this is figured out, we can easily use in-the-wild videos and animate them with custom 3D avatars.

https://github.com/HongwenZhang/PyMAF-X

https://drive.google.com/file/d/1VaMbQlCMqciw72us5XMiPIykrGvsZ1V1/view?usp=sharing

Dictionary with 13 keys:
  Key: 'pred_cam' ->     <class 'numpy.ndarray'> with shape (25, 3)
  Key: 'orig_cam' ->     <class 'numpy.ndarray'> with shape (25, 4)
  Key: 'orig_cam_t' ->     <class 'numpy.ndarray'> with shape (25, 3)
  Key: 'verts' ->     <class 'numpy.ndarray'> with shape (25, 6890, 3)
  Key: 'smplx_verts' ->     <class 'numpy.ndarray'> with shape (25, 10475, 3)
  Key: 'pose' ->     <class 'numpy.ndarray'> with shape (25, 72)
  Key: 'betas' ->     <class 'numpy.ndarray'> with shape (25, 10)
  Key: 'joints3d' ->     <class 'numpy.ndarray'> with shape (25, 49, 3)
  Key: 'joints2d' ->     List/Sequence with 25 elements:
        <class 'numpy.ndarray'> with shape (17, 3)
  Key: 'bboxes' ->     <class 'numpy.ndarray'> with shape (25, 4)
  Key: 'frame_ids' ->     List/Sequence with 25 elements:
        <class 'int'>
        Value: 0
  Key: 'person_ids' ->     List/Sequence with 25 elements:
        <class 'str'>
        Value: video2_mp4_f0_p0
  Key: 'smplx_params' ->     List/Sequence with 4 elements:
        Dictionary with 8 keys:
          Key: 'shape' ->             <class 'torch.Tensor'> with shape torch.Size([8, 10])
          Key: 'body_pose' ->             <class 'torch.Tensor'> with shape torch.Size([8, 24, 3, 3])
          Key: 'left_hand_pose' ->             <class 'torch.Tensor'> with shape torch.Size([8, 15, 3, 3])
          Key: 'right_hand_pose' ->             <class 'torch.Tensor'> with shape torch.Size([8, 15, 3, 3])
          Key: 'jaw_pose' ->             <class 'torch.Tensor'> with shape torch.Size([8, 1, 3, 3])
          Key: 'leye_pose' ->             <class 'torch.Tensor'> with shape torch.Size([8, 1, 3, 3])
          Key: 'reye_pose' ->             <class 'torch.Tensor'> with shape torch.Size([8, 1, 3, 3])
          Key: 'expression' ->             <class 'torch.Tensor'> with shape torch.Size([8, 50])
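
For anyone trying to reproduce this, a small helper along the lines below (assuming the shared file is a plain pickle; the filename is hypothetical) prints the structure above so the key/shape layout can be checked before any conversion.

```python
import pickle
import numpy as np
import torch

def describe(obj, indent=0):
    """Recursively print the type/shape layout of a nested PyMAF-X output dict."""
    pad = "  " * indent
    if isinstance(obj, dict):
        print(f"{pad}Dictionary with {len(obj)} keys:")
        for k, v in obj.items():
            print(f"{pad}  Key: {k!r}")
            describe(v, indent + 2)
    elif isinstance(obj, (np.ndarray, torch.Tensor)):
        print(f"{pad}{type(obj).__name__} with shape {tuple(obj.shape)}")
    elif isinstance(obj, (list, tuple)):
        print(f"{pad}List/Sequence with {len(obj)} elements")
        if len(obj) > 0:
            describe(obj[0], indent + 1)
    else:
        print(f"{pad}{type(obj).__name__}: {obj!r}")

with open("pymafx_output.pkl", "rb") as f:   # hypothetical filename
    output = pickle.load(f)
describe(output)
```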

My results using the FIXED camera in DreamWaltz-G:

https://github.com/user-attachments/assets/7efb8f6d-49c0-463a-8f14-dc4de1bb7339

Yukun-Huang commented 5 days ago

> Thank you for your assistance. I will try it. I have other questions:
>
>   1. How do I create a 2D Human Video Reenactment dataset?
>   2. Does it support exporting a mesh?
>   3. How do I replace the animation content of AIST?
  1. SMPL-X and camera parameters of our 2D Human Video Reenactment dataset are estimated using a private model. You might use TRAM or other state-of-the-art 3D human motion estimation models for extracting these parameters from in-the-wild videos. Human matting and image/video inpainting models are also needed to remove moving people in videos.
  2. DreamWaltz-G uses Instant-NGP to encode the opacities of the 3D Gaussians, so it should be possible to extract a mesh using the NeRF2Mesh method (see the rough sketch after this list). But I'm sure it won't work well, because DreamWaltz-G doesn't involve any surface regularization.
  3. I haven't tried it yet. You may refer to @gt732's implementation and convert the camera parameters of AIST into a DreamWaltz-G compatible format and then feed it into: https://github.com/Yukun-Huang/DreamWaltz-G/blob/a4bd7de3bd5b0d3f930085a48f76f858cb1c0ea5/core/human/smpl_prompt.py#L151
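
Regarding point 2, the generic NeRF2Mesh-style step is to query the opacity field on a dense grid and run marching cubes. A rough sketch, not part of this repo, might look like the following; query_opacity stands in for a hypothetical callable that evaluates the trained avatar's Instant-NGP opacity field, and the bounds, resolution, and threshold are assumptions that would need tuning.

```python
import numpy as np
import torch
import trimesh
from skimage import measure

@torch.no_grad()
def extract_mesh(query_opacity, bounds=((-1, -1, -1), (1, 1, 1)), res=256, thresh=0.5):
    """Query an opacity/density field on a regular grid and run marching cubes."""
    lo, hi = np.array(bounds[0], dtype=np.float32), np.array(bounds[1], dtype=np.float32)
    xs, ys, zs = [np.linspace(lo[i], hi[i], res) for i in range(3)]
    grid = np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), axis=-1)   # (res, res, res, 3)

    pts = torch.from_numpy(grid.reshape(-1, 3)).float()
    sigma = query_opacity(pts).reshape(res, res, res).cpu().numpy()    # opacity per point

    verts, faces, _, _ = measure.marching_cubes(sigma, level=thresh)
    verts = verts / (res - 1) * (hi - lo) + lo                         # index -> world coords
    return trimesh.Trimesh(vertices=verts, faces=faces)
```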
gt732 commented 5 days ago

@Yukun-Huang @xdobetter

Finally was able to get it working. I used https://yufu-wang.github.io/tram4d/ to estimate the SMPL/camera parameters and calculated the camera intrinsics and extrinsics (thanks to Claude Sonnet 🙏). I will be working on a pull request to add the feature to DreamWaltz-G; it will be focused on a single person for now.
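
For context, the matrices involved look roughly like the sketch below, built from a per-frame focal length, image size, camera rotation R, and translation t as estimated by TRAM. The exact conventions DreamWaltz-G expects (OpenCV vs. OpenGL axes, world-to-camera vs. camera-to-world) still need to be checked against smpl_prompt.py, so treat this only as a starting point.

```python
import numpy as np

def build_intrinsics(focal, img_w, img_h):
    """Pinhole intrinsics with the principal point at the image center (assumption)."""
    return np.array([
        [focal, 0.0,   img_w / 2.0],
        [0.0,   focal, img_h / 2.0],
        [0.0,   0.0,   1.0],
    ], dtype=np.float32)

def build_extrinsics(R, t):
    """4x4 world-to-camera matrix [R | t; 0 0 0 1] from a 3x3 rotation and 3-vector."""
    E = np.eye(4, dtype=np.float32)
    E[:3, :3] = R
    E[:3, 3] = t
    return E
```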

https://github.com/user-attachments/assets/5d7b693a-ea20-4ec8-9c0e-58271beb9a6c

https://github.com/user-attachments/assets/824a7b9b-08e9-4fd0-a554-82d6ee078b29

https://github.com/user-attachments/assets/2f8892a2-1583-47bc-9582-584af72f546d

https://github.com/user-attachments/assets/24a1b1fe-71b6-481c-9e16-7205da60d220

xdobetter commented 4 days ago

Thank you to everyone for your enthusiastic assistance, which enabled me to resolve these issues :rocket: .

xdobetter commented 4 days ago

Hello everyone, I'm also curious about the following two questions: [issue 1] Is the demo on the project page implemented in this way?

bash scripts/inference_reenact.sh

[screenshot of the pure animation demo]

[issue 2] "inference-time shape editing by explicitly adjusting the 3D Gaussians" Which parameters need to be adjusted for this?


Thanks in advance, everyone.

Yukun-Huang commented 4 days ago

@xdobetter

  1. The script scripts/inference_reenact.sh is for the video reenactment demo on the project page. If you want to make the pure animation demo shown in the picture you provided, you need to change --prompt.scene demo,aist in the AIST inference script to the new motion source, for example, --prompt.scene motionx, dance/subset_0000/A_Hundred_Dances.
  2. Shape editing can be achieved by setting the parameter --prompt.observed_betas. Shape control is more complicated: it requires retraining the NeRF human template and setting the --prompt.canonical_betas parameter.