Closed: RobinWitch closed this issue 1 year ago.
Hi. That is strange, and I haven't encountered it before, but it is normal for the loss not to decrease in the first epochs. I think this may be related to the environment. Maybe you can decrease the learning rate and try again. Also, can you produce reasonable results once training is finished? (A minimal sketch of the suggestion is below.)
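For example, assuming the training script uses a standard PyTorch optimizer (the model and learning-rate values here are illustrative placeholders, not the repo's actual settings):

```python
import torch

model = torch.nn.Linear(8, 8)  # placeholder; substitute the real network
# If the loss stays flat in the first epochs, try a smaller learning rate,
# e.g. dropping from 1e-4 to 1e-5.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```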
pos_embedding means positional embedding in the Transformer. Here the positional embedding is a set of learnable parameters added to the input.
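For anyone else reading this thread, here is a minimal sketch of how a learnable positional embedding works, mirroring the `self.pos_embedding` line asked about below. The `num_pose` and `hidden_dim` defaults are placeholders, not the repo's actual values:

```python
import torch
import torch.nn as nn

class LearnablePositionalEmbedding(nn.Module):
    """Sketch of a learnable positional embedding: one trainable vector per
    sequence position, updated by backprop along with the rest of the model."""
    def __init__(self, num_pose=34, hidden_dim=256):
        super().__init__()
        # Randomly initialized; the values are learned during training.
        self.pos_embedding = nn.Parameter(torch.randn(1, num_pose, hidden_dim))

    def forward(self, x):
        # x: (batch, num_pose, hidden_dim); broadcasting adds the same
        # positional offsets to every sample in the batch.
        return x + self.pos_embedding
```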
Thank you for your explanation! I can produce reasonable results when training finishes, so I will rebuild my environment and try again. Hmm, I still have a question, though. In Transformer models I usually see the positional embedding treated as a constant (often calculated with sin/cos); why did you choose to make it learnable?
Both are reasonable, and this component was not critical in our earlier experiments.
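For comparison, the fixed sinusoidal alternative from "Attention Is All You Need" would look roughly like this. This is a sketch, not code from this repo, and it assumes an even `hidden_dim`:

```python
import math
import torch

def sinusoidal_embedding(num_pose, hidden_dim):
    # Fixed sin/cos table:
    #   PE(pos, 2i)   = sin(pos / 10000^(2i / hidden_dim))
    #   PE(pos, 2i+1) = cos(pos / 10000^(2i / hidden_dim))
    pos = torch.arange(num_pose).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, hidden_dim, 2).float()
                    * (-math.log(10000.0) / hidden_dim))
    pe = torch.zeros(num_pose, hidden_dim)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe.unsqueeze(0)  # (1, num_pose, hidden_dim), not trainable
```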
Thank you for your patient reply! It helped me a lot!
Hi, Dr. Zhu! This is nice work, and it inspires me a lot. However, I have some questions.
First, during training the loss sometimes converges to 1, especially in the first epoch, which confuses me a lot. In some runs the loss begins to decrease after 1-4 epochs, while in a few runs it stays at 1.
Second, in some epochs the loss suddenly jumps from about 0.03 to over 10000 and then rapidly decreases. This happens irregularly, often every few tens of epochs.
Last, in the code "./scripts/model/diffusion_util.py/TransfomerModel", why do you add a random tensor, self.pos_embedding = nn.Parameter(torch.randn(1, num_pose, hidden_dim))?