ChenFengYe / motion-latent-diffusion

[CVPR 2023] Executing your Commands via Motion Diffusion in Latent Space, a fast and high-quality motion diffusion model
https://chenxin.tech/mld/
MIT License

Mask on clips with varied length #38

Closed: csvt32745 closed this issue 1 year ago

csvt32745 commented 1 year ago

Hi~ My VAE training mostly produces mean or static poses. I found that the VAE transformers take masks to process clips of varied lengths, but the loss computation doesn't apply them. Does it work correctly anyway, or could this heavily affect the results on my small dataset?
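For concreteness, this is the kind of padding mask I mean (a minimal illustration with dummy shapes, not the repo's exact code):

```python
import torch
import torch.nn as nn

# Toy batch: three clips with different numbers of valid frames.
lengths = torch.tensor([8, 5, 3])
T = int(lengths.max())

# (B, T) boolean mask, True marks padded positions to be ignored.
pad_mask = torch.arange(T)[None, :] >= lengths[:, None]

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
    num_layers=2,
)
x = torch.randn(3, T, 256)  # dummy (B, T, d_model) motion features
out = encoder(x, src_key_padding_mask=pad_mask)  # (3, 8, 256)
```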

Thanks :)

ChenFengYe commented 1 year ago

Hi, padding should be applied to both the ground truth and the predictions. The loss part does not need masks, because the network has already set the padded part to zeros. https://github.com/ChenFengYe/motion-latent-diffusion/blob/c28a06435077800fde4d76ef93eb2a4016a5120c/mld/models/architectures/mld_vae.py#L245
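A minimal sketch of the idea (illustrative names and shapes, not the repo's exact code): once both tensors are zeroed at padded positions, the padded frames contribute nothing to the loss sum, so no mask is needed inside the loss itself.

```python
import torch
import torch.nn.functional as F

def recon_loss_with_zeroed_padding(pred, gt, lengths):
    """Zero out padded frames in both tensors, then compute a plain
    (unmasked) MSE. Padded positions contribute nothing to the sum,
    so no explicit mask is needed inside the loss; only the averaging
    denominator differs from a strictly masked mean."""
    B, T, _ = pred.shape
    valid = (torch.arange(T, device=pred.device)[None, :]
             < lengths[:, None]).unsqueeze(-1).float()  # (B, T, 1)
    return F.mse_loss(pred * valid, gt * valid)

pred = torch.randn(2, 8, 263)   # e.g. 263-dim HumanML3D features
gt = torch.randn(2, 8, 263)
lengths = torch.tensor([8, 5])  # second clip has 3 padded frames
loss = recon_loss_with_zeroed_padding(pred, gt, lengths)
```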

Your VAE training should not result in static poses; static results indicate that something in your training has gone wrong. Please refer to #28.

Please also check your training data, hyper-parameters, and the mean/std files for the dataset (like below). https://github.com/ChenFengYe/motion-latent-diffusion/blob/c28a06435077800fde4d76ef93eb2a4016a5120c/README.md?plain=1#L196

If you use a new dataset, you should replace the mean/std files.
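If it helps, here is a rough sketch of recomputing mean/std for a new dataset (the directory path is an assumption about your layout, and the file names follow the HumanML3D convention; this is not the repo's exact script):

```python
import numpy as np
from pathlib import Path

# Assumed layout: one (frames, feat_dim) .npy feature file per clip.
feature_dir = Path("datasets/my_dataset/new_joint_vecs")
feats = np.concatenate(
    [np.load(f) for f in sorted(feature_dir.glob("*.npy"))], axis=0
)

mean = feats.mean(axis=0)
std = feats.std(axis=0)
std[std < 1e-8] = 1e-8  # guard against division by zero on constant dims

np.save("Mean.npy", mean)  # file names follow the HumanML3D convention
np.save("Std.npy", std)
```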

csvt32745 commented 1 year ago

Thanks for the reply :) I missed that line, sorry.

I'll check the function and the mean/std files. Btw, I compute the features and data in the same way as HumanML3D.