Why does the DS-Attention used in the dance decoder only consider upper-body attention score accumulation?

Luke-Luo1 / POPDG

[CVPR 2024] POPDG: Popular 3D Dance Generation with PopDanceSet

https://luke-luo1.github.io/POPDG/

MIT License


Why does the DS-Attention used in the dance decoder only consider upper-body attention score accumulation? #3

Closed by Andy010902 3 weeks ago

Andy010902 commented 1 month ago

Hey @Luke-Luo1, thank you for your great work! I noticed that the DS-Attention used in the dance decoder only enhances the attention scores of the upper-body joints. I wonder why DS-Attention is not applied to all body joints (0, 1, ..., 23). Instead, POPDG states that it uses DS-Attention only to improve the attention scores of the upper-body joints (0, 3, 6, 9, 12, 13, 14, 15, 16, 17). Best wishes! Andy

Luke-Luo1 commented 1 month ago

Thank you for your kind words and for your thoughtful question.

As mentioned in our paper, the accumulation of errors predominantly occurs in the upper body. The hands and feet, compared to the root joints, tend to have similar or even better accuracy. This might be because, in dance movements, these parts already receive significant attention from the model due to their pronounced motion.
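
For readers following the thread, the joint split under discussion can be sketched as below. This is only an illustration and not the POPDG implementation: the joint indices are the ones quoted in the question, while the function `boosted_attention`, the additive `boost` parameter, and the idea of biasing logits before a softmax are all hypothetical stand-ins for "enhancing the attention score" of selected joints.

```python
import math

NUM_JOINTS = 24  # joints 0..23, as stated in the question

# Joint indices quoted in the question as the ones DS-Attention targets.
UPPER_BODY = [0, 3, 6, 9, 12, 13, 14, 15, 16, 17]

# The remaining joints, which receive no extra emphasis in this sketch.
OTHER = [j for j in range(NUM_JOINTS) if j not in UPPER_BODY]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def boosted_attention(logits, boost=1.0):
    """Hypothetical: add a constant `boost` to the logits of the
    selected joints, then normalise across all joints."""
    assert len(logits) == NUM_JOINTS
    biased = [x + boost if j in UPPER_BODY else x
              for j, x in enumerate(logits)]
    return softmax(biased)


# With uniform logits, any positive boost shifts attention mass
# towards the selected joints.
uniform = [0.0] * NUM_JOINTS
weights = boosted_attention(uniform, boost=1.0)
upper_mass = sum(weights[j] for j in UPPER_BODY)
```

Under this assumed scheme, setting `boost=0.0` recovers ordinary attention over all 24 joints, which is one way to picture the whole-body alternative raised in the question.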