Luke-Luo1 / POPDG

[CVPR 2024] POPDG: Popular 3D Dance Generation with PopDanceSet
https://luke-luo1.github.io/POPDG/
MIT License

Inquiry About Training Details of the Upgraded PopDanceSet #4

Open shenxiaojian opened 1 month ago

shenxiaojian commented 1 month ago

Hi,

Thanks for your excellent work! I've been working on reproducing your paper using two A800 GPUs, but I've encountered some issues. In the PopDanceSet_Description, it's mentioned that the PopDanceSet has been upgraded. With repeat_count=1, I found that one epoch takes about 5 minutes to run, meaning 2000 epochs would take approximately 166 hours. This significantly differs from the 66 hours mentioned in the paper.
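
For reference, my estimate is just minutes per epoch times the number of epochs; the snippet below is only that back-of-the-envelope arithmetic and assumes the per-epoch time stays roughly constant:

```python
# Rough back-of-the-envelope check of the training-time estimate above
# (assumption: per-epoch time stays roughly constant across training).
minutes_per_epoch = 5
epochs = 2000
total_hours = minutes_per_epoch * epochs / 60
print(f"~{total_hours:.0f} hours")  # ~167 hours, vs. the 66 hours reported in the paper
```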

I suspect that this might be due to substantial differences between the updated dataset and the previous one.

Could you please clarify:

  1. Should I use the same hyperparameters as mentioned in the paper (i.e., a learning rate of 0.001, betas of 0.02, 0.08, and 0.01, with an epsilon of 1e-8) to reproduce the results shown in the PopDanceSet_Description table?
  2. Or have there been any adjustments to the hyperparameters after the dataset update?

Here is the table from the PopDanceSet_Description:

| Method      | PFC ↓  | PBC →  | Div_k ↑ | Div_g ↑ | Beat Align Score ↑ |
|-------------|--------|--------|---------|---------|--------------------|
| GroundTruth | 1.5824 | 8.7365 | 9.0219  | 7.2931  | 0.174              |
| FACT        | 5.8791 | 1.0200 | 6.8245  | 4.3387  | 0.206              |
| Bailando    | 3.9751 | 4.8863 | 5.1835  | 5.4342  | 0.230              |
| EDGE        | 3.8366 | 4.0348 | 6.1709  | 5.7568  | 0.224              |
| POPDG       | 1.8253 | 5.9492 | 7.1342  | 5.8314  | 0.233              |

Thank you very much for your assistance.

Luke-Luo1 commented 1 month ago

Thank you for your interest in my work and for your detailed question.

  1. Regarding the dataset:

You are correct: the upgraded PopDanceSet has grown significantly and is now comparable in size to AIST++. After data augmentation, the actual amount of training data exceeds that of AIST++, which explains the longer training time you are seeing.

  2. Hyperparameters:

I experimented with many hyperparameter settings during this work, and no substantial changes were needed after the dataset update. You can therefore continue using the same hyperparameters as in the paper (i.e., a learning rate of 0.001, betas of 0.02, 0.08, and 0.01, and an epsilon of 1e-8) to reproduce the results.
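
For anyone else reproducing this, those values follow the three-beta convention of the Adan optimizer as implemented in the `adan-pytorch` package. Assuming that optimizer is what the training script uses (an assumption on my part for this sketch, and argument names may differ slightly in other Adan implementations), the configuration would look roughly like:

```python
# Minimal sketch (not the exact training script): plugging the quoted
# hyperparameters into the Adan optimizer from the `adan-pytorch` package.
# `model` is a placeholder for the actual dance diffusion network.
import torch
from adan_pytorch import Adan

model = torch.nn.Linear(10, 10)  # placeholder

optimizer = Adan(
    model.parameters(),
    lr=1e-3,                   # learning rate 0.001
    betas=(0.02, 0.08, 0.01),  # the three betas quoted above
    eps=1e-8,
)
```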

I hope this helps clarify your concerns.