Closed by ggg66 4 months ago
Sure. As mentioned in the paper, the audio embedding and the motion features are aligned by directly concatenating them along the temporal dimension. The corresponding code is here: https://github.com/JeremyCJM/DiffSHEG/blob/3ebf3058f48cba3da9146afb7623e9ec1ab9e9a5/models/transformer.py#L307
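A minimal sketch of what concatenation along the temporal dimension looks like (the shapes and variable names here are hypothetical, chosen only for illustration, not the repo's actual ones):

```python
import torch

# Hypothetical per-frame features: batch B, T frames, C channels.
B, T, C = 2, 8, 64
audio_emb = torch.randn(B, T, C)  # audio embedding, one vector per frame
motion = torch.randn(B, T, C)     # motion features, one vector per frame

# Concatenating along dim=1 (time) stacks the two sequences in time:
# the sequence length doubles to 2*T, while the channel size C is unchanged.
aligned_t = torch.cat([audio_emb, motion], dim=1)
print(aligned_t.shape)  # torch.Size([2, 16, 64])

# By contrast, concatenating along the channel dimension would leave T
# unchanged and double C instead:
aligned_c = torch.cat([audio_emb, motion], dim=-1)
print(aligned_c.shape)  # torch.Size([2, 8, 128])
```

The channel-dimension variant is shown only to illustrate the difference between the two axes; which one a given line of the repo uses has to be read off the actual code.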
https://github.com/JeremyCJM/DiffSHEG/blob/3ebf3058f48cba3da9146afb7623e9ec1ab9e9a5/models/gaussian_diffusion.py#L1369 While computing the loss, I noticed that the dimension C becomes twice its original size at this line, but I couldn't find where that shape transformation happens. Thanks for your reply!
I saw this method in your paper. Could you please point me to where it is implemented in the code? Thank you for your reply, and great work!