Open mengW6 opened 1 month ago
Thank you for asking.
The inputs for these three conditions are encoded features (content: (bs, 6, 256), style: (bs, 1, 256), trajectory: (bs, 1, 256)) obtained from the content motion and the style motion. We use the diagram to represent the main information contained in these features. We apologize for the confusion.
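As a rough illustration of the shapes listed above, here is a minimal sketch of a shape check, reading bs as the batch size. It is not code from the repository: the helper names `expected_condition_shapes` and `check_condition_shapes` are hypothetical, and it works on plain shape tuples, so it does not depend on any particular framework.

```python
# Hypothetical helpers for illustration only; not part of the project's code.
FEATURE_DIM = 256  # last dimension quoted in the reply
TOKENS_PER_CONDITION = {"content": 6, "style": 1, "trajectory": 1}


def expected_condition_shapes(batch_size):
    """Return the (bs, tokens, 256) shape quoted for each condition."""
    return {
        name: (batch_size, tokens, FEATURE_DIM)
        for name, tokens in TOKENS_PER_CONDITION.items()
    }


def check_condition_shapes(shapes, batch_size):
    """Raise ValueError if any condition shape differs from the quoted one."""
    expected = expected_condition_shapes(batch_size)
    for name, want in expected.items():
        got = tuple(shapes[name])
        if got != want:
            raise ValueError(f"{name}: expected {want}, got {got}")


# Example: shapes as they might be read from tuple(tensor.shape).
check_condition_shapes(
    {"content": (8, 6, 256), "style": (8, 1, 256), "trajectory": (8, 1, 256)},
    batch_size=8,
)
```

A check like this could be run on the encoded features before they are passed on as conditions; how the features are actually consumed downstream is not specified here.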
Please feel free to ask if you have any other questions.
Thank you for your reply! If the content is replaced with a dance sequence and the style is changed to various dance styles such as ethnic dance, hip-hop, etc., would that meet the input requirements of the denoising module?
I haven't done similar experiments on dance motions. I think you can try some examples : )
Hello, this is very good work. I have a question that I hope you can answer. What are the input formats for the three conditions (content, style, trajectory) of the denoising module? Are they entered in text format?