EricGuo5513 / momask-codes

Official implementation of "MoMask: Generative Masked Modeling of 3D Human Motions (CVPR2024)"
https://ericguo5513.github.io/momask/
MIT License

Can not reproduce the results of the RVQ #43

Open · buptxyb666 opened this issue 6 months ago

buptxyb666 commented 6 months ago

Thanks for your great work.

As mentioned in other issues, I also find that the RVQ FID score (0.02 vs. 0.06) can't be reproduced, even when following the released opt.txt. I wonder if the results are related to the GPU platform or the PyTorch version?
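When comparing runs across machines, it can help to log the exact environment next to each evaluation. The snippet below is a small, hypothetical helper (not part of momask-codes) for recording the PyTorch/CUDA versions and GPU model alongside a run.

```python
# Hypothetical helper (not part of momask-codes): record the environment
# so FID numbers from different machines can be compared.
import torch

print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```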

Murrol commented 6 months ago

We found that it might be related to version differences in the packages used when processing the dataset. We are working on it. Please email me directly for dataset inquiries.

wang-zm18 commented 5 months ago

Thanks for the work! I have also tried to reproduce the RVQ results in Table 2 of the released paper. The reproduced results are roughly the same, except for the FID metric in the generation setting on HumanML3D (0.232 vs. 0.051). The command is `python eval_t2m_trans_res.py --res_name tres_nlayer8_ld384_ff1024_rvq6ns_cdp0.2_sw --dataset_name t2m --name t2m_nlayer8_nhead6_ld384_ff1024_cdp0.1_rvq6ns --gpu_id 1 --cond_scale 4 --time_steps 10 --ext evaluation`. I have not been able to figure out the problem so far. Thank you in advance!

HitBadTrap commented 5 months ago

> Thanks for your great work.
>
> As mentioned in other issues, I also find that the RVQ FID score (0.02 vs. 0.06) can't be reproduced, even when following the released opt.txt. I wonder if the results are related to the GPU platform or the PyTorch version?

Have you solved the problem now? @buptxyb666

Thanks.

wang-zm18 commented 4 months ago

I also tried retraining the RVQ on the HumanML3D dataset for 50 epochs and got a similar FID. I want to know how many epochs the RVQ was trained for on HumanML3D. By the way, I found that the mean (from -0.86 to 1.46) and std (from 0.01 to 0.36) of HumanML3D are small (compared to those of KIT-ML; maybe the unit is different), so the data fed into the RVQ spans a very large range (e.g., -350.52 to 576.75 within a batch). Is the normalization right? Maybe the normalization heavily influences the training process. @Murrol
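For context, the normalization in question is the standard feature-wise Z-score scheme used by HumanML3D-style pipelines. Below is a minimal sketch of it; the file names (`Mean.npy`, `Std.npy`) and function names are illustrative assumptions, not the exact layout of this repo.

```python
# Sketch of the Z-score normalization discussed above (illustrative only;
# file names and paths are assumptions, not the exact repo layout).
import numpy as np

mean = np.load("Mean.npy")  # per-dimension mean of the motion features
std = np.load("Std.npy")    # per-dimension std of the motion features

def normalize(motion):
    # Dimensions with a tiny std (e.g. ~0.01) blow up the normalized range,
    # which explains the large spread (roughly -350 to 576) observed above.
    return (motion - mean) / std

def denormalize(motion):
    return motion * std + mean
```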

Murrol commented 4 months ago

Ideally, it should save the best checkpoint if you use exactly our scripts. You can also refer to the provided pretrained model to find which epoch it was saved at. Yes, it could be related to some deviation in the data. You can send me an email for the officially preprocessed dataset from the HumanML3D project. The purpose of this normalization is to standardize the dataset distribution under a Gaussian assumption. There could be some issues with it, but we simply follow most of the benchmark settings.
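To check which epoch a released checkpoint was saved at, one can load it and print any scalar metadata it carries. The sketch below is illustrative only; the checkpoint path and the assumption that epoch/iteration counters are stored as scalar entries are guesses, not guaranteed fields of the released models.

```python
# Sketch: inspect a pretrained checkpoint for epoch/iteration metadata.
# The file name and key layout are illustrative assumptions.
import torch

ckpt = torch.load("net_best_fid.tar", map_location="cpu")
print(list(ckpt.keys()))  # list the top-level entries

# Print any scalar-looking metadata (epoch, iteration count, etc.)
for key, value in ckpt.items():
    if isinstance(value, (int, float)):
        print(key, "=", value)
```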

wang-zm18 commented 4 months ago

I have sent you the email describing my earlier confusion and my updated opinion; the conclusion is that the process is correct. Thank you! By the way, when I load the pretrained VQ model, it only shows epoch = -1, so I still don't know how many epochs the VQ model was trained for. When I trained it with the default 50 epochs, I got the same FID gap. Thank you in advance!