ChenFengYe / motion-latent-diffusion

[CVPR 2023] Executing your Commands via Motion Diffusion in Latent Space, a fast and high-quality motion diffusion model
https://chenxin.tech/mld/
MIT License

How to verify whether the training of VAE is good? #16

LinghaoChan opened this issue 1 year ago

LinghaoChan commented 1 year ago

How to verify whether the training of VAE is good? Have you provided any code for the visualization of VAE training?

ChenFengYe commented 1 year ago

Hi LinghaoChan,

If you use the HumanML3D dataset, the normal range for the diffusion stage (text-to-motion task) is around [0.45, 1.0], and it should be [0.2, 0.4] for the VAE. For visualization, both the VAE and diffusion stages can use the same visualization scripts. You can refer to "Details of training" in the FAQ (GitHub README) and issues #5 and #9 for more details.
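As a trivial sanity check against those ranges (a sketch only; score stands for whatever evaluation number your run reports, and the function names are illustrative):

def vae_in_range(score: float) -> bool:
    # expected VAE-stage range on HumanML3D, per the reply above
    return 0.2 <= score <= 0.4

def diffusion_in_range(score: float) -> bool:
    # expected text-to-motion diffusion-stage range
    return 0.45 <= score <= 1.0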

P.S.

1. Set up Blender - WIP

Refer to TEMOS-Rendering motions for the Blender setup, then install the following dependencies.

YOUR_BLENDER_PYTHON_PATH/python -m pip install -r prepare/requirements_render.txt
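If you are unsure what YOUR_BLENDER_PYTHON_PATH should be, you can ask Blender itself. A minimal sketch, assuming the blender binary is invocable from your shell:

import subprocess

# Blender prints the path of its bundled Python interpreter;
# that path is what YOUR_BLENDER_PYTHON_PATH should point to
result = subprocess.run(
    ["blender", "--background", "--python-expr", "import sys; print(sys.executable)"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)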

2. (Optional) Render rigged cylinders

Run the following command using Blender:

YOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --dir=YOUR_NPY_FOLDER --mode=video --joint_type=HumanML3D

3. Create SMPL meshes with:

python -m fit --dir YOUR_NPY_FOLDER --save_folder TEMP_PLY_FOLDER --cuda

This outputs:

  • mesh npy file: the generated SMPL vertices with shape (nframe, 6893, 3) (a quick shape check is sketched after this list)
  • ply files: the ply mesh file for blender or meshlab
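To verify the mesh npy output, a quick shape check (a sketch only; the file name below is a placeholder for whatever fit wrote into TEMP_PLY_FOLDER):

import numpy as np

verts = np.load("TEMP_PLY_FOLDER/sample_mesh.npy")  # placeholder file name
print(verts.shape)  # expect (nframe, 6893, 3), as noted above
assert verts.ndim == 3 and verts.shape[-1] == 3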

4. Render SMPL meshes

Run the following command to render the SMPL meshes using Blender:

YOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --dir=YOUR_NPY_FOLDER --mode=video --joint_type=HumanML3D

Optional parameters:

  • --mode=video: render an mp4 video
  • --mode=sequence: render the whole motion as a single png image (see the example below)
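For example, to render one summary image instead of a video, swap the mode flag:

YOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --dir=YOUR_NPY_FOLDER --mode=sequence --joint_type=HumanML3D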
ChenFengYe commented 1 year ago

Hi, if your VAE results are not correct, please pay attention to issue #18. We have fixed a bug in the KL loss.

https://github.com/ChenFengYe/motion-latent-diffusion/blob/719c219bd059e14f84f2571ab6975855ede0d819/mld/models/losses/mld.py#L105
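For reference, a minimal sketch of a KL term against a standard normal prior via torch.distributions (illustrative only, not the repo's actual code; mu and logvar stand for the encoder's latent outputs):

import torch

def kl_to_standard_normal(mu, logvar):
    # posterior N(mu, sigma) vs. standard normal prior N(0, 1)
    q = torch.distributions.Normal(mu, logvar.mul(0.5).exp())
    p = torch.distributions.Normal(torch.zeros_like(mu), torch.ones_like(logvar))
    return torch.distributions.kl_divergence(q, p).mean()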

LinghaoChan commented 1 year ago

fine, thx.

LinghaoChan commented 1 year ago

Hi again. I notice that LAMBDA_KL = 0.0001, which is much smaller than the other LAMBDA weights. Does it really matter when training the VAE? I trained the model with and without it, and both results seem good.

ChenFengYe commented 1 year ago

It is quite important for the second stage (the diffusion stage). The KL term regularizes the latent distribution, making the latent space meaningful. If you refer to other papers, the weight of the KL loss is usually set to a small value, like 1e-3, 1e-4, or 1e-5.
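To make the role of the small weight concrete, a sketch of the weighted VAE objective being discussed (illustrative names, not the repo's actual loss code):

import torch
import torch.nn.functional as F

def vae_objective(recon, target, mu, logvar, lambda_kl=1e-4):
    recon_loss = F.smooth_l1_loss(recon, target)  # reconstruction term
    # closed-form KL between N(mu, sigma^2) and N(0, 1)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # even with a small lambda_kl, this term keeps the latent distribution
    # close to the prior, which the second-stage diffusion model relies on
    return recon_loss + lambda_kl * kl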