ZcyMonkey / AttT2M

Code of ICCV 2023 paper: "AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism"
https://arxiv.org/abs/2309.00796
Apache License 2.0

Attention matrix shape #6

Open RohollahHS opened 4 weeks ago

RohollahHS commented 4 weeks ago

Great work, and thanks for sharing. The paper says that the shape of the adjacency matrix is (n+1)*(n+1), which should be 22,22 (21 for joints and 1 for foot contact). However, in your implementation, in the models.encdec.Encoder_spationl class, the matrix's shape is 28,28. Also, after that, some random vectors are added to the encoding (in the Abstract_Transformer class). Can you please clarify these issues, especially the first one? Thanks. (Screenshot attached.)

ZcyMonkey commented 4 weeks ago

As mentioned in our paper, the human body is divided into 5 body parts, which is the "nparts" in your screenshot. The extra dimensions in the attention mask are used to extract the features of these body parts.
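
For readers following along, here is a minimal sketch of how appending one extra token per body part enlarges a square attention mask. It is not the repository's code. The function name `build_part_mask`, the base token count of 23, the joint-to-part grouping, and the choice to let base tokens attend to each other freely are all assumptions made for illustration; the thread does not state how the 28 rows are split.

```python
def build_part_mask(n_base, parts):
    """Build a square boolean attention mask with one extra token per part.

    mask[i][j] is True when token i may attend to token j.
    n_base: number of base tokens (assumed value, see below).
    parts:  list of lists, each holding the base-token indices of one part.
    """
    size = n_base + len(parts)
    mask = [[False] * size for _ in range(size)]

    # Assumption: base tokens attend to every other base token.
    for i in range(n_base):
        for j in range(n_base):
            mask[i][j] = True

    # Each part token attends to itself and to the base tokens in its part,
    # so its output can summarize that part's features.
    for p, members in enumerate(parts):
        row = n_base + p
        mask[row][row] = True
        for j in members:
            mask[row][j] = True
    return mask


# Hypothetical numbers: 23 base tokens split into 5 contiguous groups.
# The real grouping in the repository may differ.
N_BASE = 23
PARTS = [
    list(range(0, 5)),
    list(range(5, 10)),
    list(range(10, 14)),
    list(range(14, 18)),
    list(range(18, 23)),
]

mask = build_part_mask(N_BASE, PARTS)
print(len(mask), len(mask[0]))  # 28 28
```

Under these assumed numbers, the mask grows from 23 to 28 rows and columns because of the five part tokens. The actual split should be checked against `nparts` and the mask construction in the repository.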