Closed: Terry-Joy closed this issue 1 month ago.
I think the issue may be related to the data preprocessing. Have you applied the mean and standard deviation (i.e., normalized the features), and visualized your ground-truth (GT) data to confirm that the preprocessing is correct? The modeling process itself seems fine.
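For example, a minimal normalization sanity check might look like the sketch below; the file paths are placeholders for wherever your `Mean.npy`/`Std.npy` and feature files actually live:

```python
import numpy as np

# Hypothetical paths; adjust to your HumanML3D layout.
mean = np.load("datasets/humanml3d/Mean.npy")  # per-dimension feature mean
std = np.load("datasets/humanml3d/Std.npy")    # per-dimension feature std

feats = np.load("datasets/humanml3d/new_joint_vecs/000000.npy")  # raw features

# Normalize exactly as the training dataloader does before the VAE sees the data.
feats_norm = (feats - mean) / std

# Round-trip sanity check: de-normalizing must recover the raw features.
assert np.allclose(feats_norm * std + mean, feats, atol=1e-4)
```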
Please use --nodebug for all your training.
https://github.com/ChenFengYe/motion-latent-diffusion/blob/081ce3152e5a95b14a4495fb2d32939c257c6216/README.md?plain=1#L210
I find there is no error on the HumanML3D dataset, and I assumed the mean and standard deviation were not needed when testing. However, the reconstructed features differ significantly from the original features even before your feats2joints function is applied. I wonder whether there is a problem with using the .npy files from the new_joints_feature directory of the HumanML3D training set directly as input to the VAE: I loaded these features directly as input, but the reconstructed features differ significantly from those in the original file.
In addition, I found that even without going through the VAE, passing the corresponding .npy feature file through feat2joints still yields joints that differ from the corresponding joints file. Why is this happening? I have double-checked the mean and standard deviation of the HumanML3D dataset, and the deviations are zero. Could some special preprocessing have been applied to the features before they are fed into the VAE?
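My current guess is that feats2joints itself de-normalizes its input (multiplying by the std and adding the mean) before recovering joints, so it expects already-normalized features; if that reading of the code is right, feeding the raw .npy file effectively de-normalizes twice. A sketch of the distinction, with hypothetical paths:

```python
import numpy as np

mean = np.load("Mean.npy")  # dataset-level statistics (hypothetical paths)
std = np.load("Std.npy")

raw = np.load("000000.npy")      # raw file from the feature directory
normed = (raw - mean) / std      # what the dataloader actually yields

# Passing `raw` would de-normalize features that were never normalized,
# which scrambles the recovered joints; `normed` should be the expected input:
# joints = datamodule.feats2joints(torch.from_numpy(normed).float())
```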
I want to know: when using a single data sample to construct a test demo set for evaluating the VAE, should the mean and standard deviation (std) be recalculated from just this single sample, or should they still be computed over the entire dataset?
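To make the question concrete, the two options I am weighing look like this (the file paths are placeholders):

```python
import numpy as np

sample = np.load("my_demo_clip.npy")  # single demo sample (hypothetical path)

# Option A: dataset-wide statistics, as used during training.
mean = np.load("datasets/humanml3d/Mean.npy")
std = np.load("datasets/humanml3d/Std.npy")
norm_a = (sample - mean) / std

# Option B: statistics recomputed from this one sample alone.
norm_b = (sample - sample.mean(axis=0)) / sample.std(axis=0)
```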
Now I think the issue is related to the data preprocessing. Could you please tell me where new_joints_vector is specially handled before it is passed to training_step?
Thank you very much. Indeed, it was a data preprocessing issue! I have solved it!
I want to know whether, theoretically, using the pre-trained VAE weights you provided, I should be able to almost perfectly reconstruct the motion from the feature input corresponding to HumanML3D's training data. In practice, however, the reconstructed motion is almost completely static. My process is simple: training set -> features -> vae_encoder -> z -> decoder. Is there something wrong with this process?
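Concretely, the round trip I mean is sketched below. The normalization step, the checkpoint loader, and the encode/decode signatures are my assumptions about the expected usage, not the repo's exact API:

```python
import numpy as np
import torch

mean = np.load("Mean.npy")     # dataset-wide stats (hypothetical paths)
std = np.load("Std.npy")
feats = np.load("000000.npy")  # (T, 263) HumanML3D feature file

vae = load_pretrained_vae("mld_humanml3d_vae.ckpt")  # hypothetical loader

x = torch.from_numpy((feats - mean) / std).float().unsqueeze(0)  # add batch dim
lengths = [x.shape[1]]

with torch.no_grad():
    z, dist = vae.encode(x, lengths)  # method names and returns are assumptions
    recon = vae.decode(z, lengths)

# If feats2joints de-normalizes internally, the (still normalized)
# reconstruction should go straight to it rather than being de-normalized twice:
# joints = datamodule.feats2joints(recon)
```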