Closed catherineytw closed 4 years ago
Hi @catherineytw,
Sorry for the late reply, I just saw this issue. Thanks for taking an interest in our paper. Regarding your questions:
The last two dimensions of the input are the global velocity in the XY plane (check this line), which lets us represent the joint positions in local coordinates (the remaining dimensions of the input). Since we treat the skeleton and view angle as static/time-independent properties, we simply drop the global-velocity channels (time-dependent values) when encoding the body and view angle.
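To make the channel layout concrete, here is a minimal NumPy sketch of the slicing described above. The shapes (`J` joints, `T` frames) are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

# Hypothetical shapes for illustration only: J joints, T frames.
J, T = 15, 64
n_channels = 2 * J + 2          # 2D joint coordinates plus global XY velocity

# Batched input tensor: (batch, channels, time); the last two channel rows
# hold the global velocity in the XY plane.
x = np.random.randn(1, n_channels, T)

# The motion encoder sees everything, including the velocity rows.
motion_input = x

# The body and view encoders see only the static, local-coordinate part:
# the time-dependent global-velocity rows are sliced off.
static_input = x[:, :-2, :]

print(motion_input.shape)   # (1, 32, 64)
print(static_input.shape)   # (1, 30, 64)
```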
Basically the same reason as for the first question: the input to the motion encoder has two additional channels for the global velocity in the XY plane. Sorry we didn't make this clear in the paper.
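The resulting channel bookkeeping can be sketched as follows; the joint count `J` and the variable names are illustrative, not the repo's actual constants:

```python
# Hypothetical channel arithmetic mirroring the explanation above.
J = 15                          # illustrative joint count

mot_en_channels = 2 * J + 2     # local joint coords + 2 global-velocity channels
body_en_channels = 2 * J        # velocity rows dropped before body encoding
view_en_channels = 2 * J        # velocity rows dropped before view encoding

# The motion encoder always takes exactly two more input channels.
assert mot_en_channels == body_en_channels + 2
assert body_en_channels == view_en_channels
```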
Thank you!! It is definitely helpful!!
Hi, I've been reading your paper and code very carefully recently, and I have to say: nice work! But I have several questions about the network architecture.
In the code, I noticed that whenever you encode the body and the view angle, you drop the last two rows (axis=1) of the input tensor, and I was wondering why. At first glance I thought you put the pelvis joint at the end of the joint tensor, but I couldn't find any evidence for that. Could you kindly explain it? Below is the code that baffled me:

```python
m1 = self.mot_encoder(x1)
b2 = self.body_encoder(x2[:, :-2, :]).repeat(1, 1, m1.shape[-1])
v3 = self.view_encoder(x3[:, :-2, :]).repeat(1, 1, m1.shape[-1])
```
In common.py, you initialized the network input and output channels as follows:
I was wondering why you need two more channels for the motion encoder. Shouldn't it be the same as the body and view channels, since the number of joints is the same?
Many thanks!