Closed: RobinWitch closed this issue 1 year ago.
Hi. That is strange, and I haven't encountered it before, but it is normal for the loss not to decrease in the first epochs. I think this may be related to the environment. Maybe you can decrease the learning rate and try again. Also, can you produce reasonable results once training is finished? (A minimal sketch of the suggestion is below.)
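For example, assuming the training script uses a standard PyTorch optimizer (the model and learning-rate values here are illustrative placeholders, not the repo's actual settings):

```python
import torch

model = torch.nn.Linear(8, 8)  # placeholder; substitute the real network
# If the loss stays flat in the first epochs, try a smaller learning rate,
# e.g. dropping from 1e-4 to 1e-5.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```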
pos_embedding means positional embedding in the Transformer. Here the positional embedding is a set of learnable parameters added to the input.
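For anyone else reading this thread, here is a minimal sketch of how a learnable positional embedding works, mirroring the `self.pos_embedding` line asked about below. The `num_pose` and `hidden_dim` defaults are placeholders, not the repo's actual values:

```python
import torch
import torch.nn as nn

class LearnablePositionalEmbedding(nn.Module):
    """Sketch of a learnable positional embedding: one trainable vector per
    sequence position, updated by backprop along with the rest of the model."""
    def __init__(self, num_pose=34, hidden_dim=256):
        super().__init__()
        # Randomly initialized; the values are learned during training.
        self.pos_embedding = nn.Parameter(torch.randn(1, num_pose, hidden_dim))

    def forward(self, x):
        # x: (batch, num_pose, hidden_dim); broadcasting adds the same
        # positional offsets to every sample in the batch.
        return x + self.pos_embedding
```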
Thank you for your explanation! I can produce reasonable results when training finishes, so I will rebuild my environment and try again. Hmm, I still have a question, though. In Transformer models I usually see the positional embedding treated as a constant (often calculated with sin/cos); why did you choose to make it learnable?
Both are reasonable, and this component was not critical in our earlier experiments.
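For comparison, the fixed sinusoidal alternative from "Attention Is All You Need" would look roughly like this. This is a sketch, not code from this repo, and it assumes an even `hidden_dim`:

```python
import math
import torch

def sinusoidal_embedding(num_pose, hidden_dim):
    # Fixed sin/cos table:
    #   PE(pos, 2i)   = sin(pos / 10000^(2i / hidden_dim))
    #   PE(pos, 2i+1) = cos(pos / 10000^(2i / hidden_dim))
    pos = torch.arange(num_pose).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, hidden_dim, 2).float()
                    * (-math.log(10000.0) / hidden_dim))
    pe = torch.zeros(num_pose, hidden_dim)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe.unsqueeze(0)  # (1, num_pose, hidden_dim), not trainable
```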
Thank you for your patient reply! It helped me a lot!
Hi, Dr. Zhu! This is nice work, and it inspires me a lot. However, I have some questions.
First, during training the loss sometimes converges to 1, especially in the first epoch, which confuses me a lot. In some runs the loss begins to decrease after 1-4 epochs, while in a few runs it stays at 1.
Second, in some epochs the loss suddenly jumps from about 0.03 to over 10000 and then rapidly decreases. This happens irregularly, often every few tens of epochs.
Last, in the code "./scripts/model/diffusion_util.py/TransfomerModel", why do you add a random tensor, self.pos_embedding = nn.Parameter(torch.randn(1, num_pose, hidden_dim))?