Open · wbhu opened this issue 3 years ago
Dear Wenbo,
Thanks for your feedback. For the discriminator in our network, we simply adopted the idea from HMR and replaced the input with speed.
I checked their code again. HMR uses slim.conv2d, which applies a default activation_fn, rather than a pure linear conv2d, and I overlooked this. But even with this linear discriminator we can still see an improvement in the rotation part, which means there must be some other reason. I will check further and commit a new version once it is ready, and also post some experiments in this thread.
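For anyone following along, here is a toy numpy sketch (my own weights and names, not the repo's code) of what the missed default means: slim.conv2d inserts a ReLU by default, and without it stacked layers stay additive, i.e. purely linear.

```python
import numpy as np

# Toy weights standing in for the discriminator's two layers (hypothetical values).
W1 = np.array([[1.0, 0.0], [0.0, 1.0]])   # first "conv" layer
W2 = np.array([[1.0, 1.0]])               # second layer / FC

def disc_linear(x):                       # no activation: composition is still linear
    return W2 @ (W1 @ x)

def disc_relu(x):                         # ReLU in between: genuinely nonlinear
    return W2 @ np.maximum(W1 @ x, 0.0)

x, y = np.array([1.0, -1.0]), np.array([-1.0, 1.0])
# Additivity f(x+y) == f(x)+f(y) holds only for the linear version.
print(disc_linear(x + y), disc_linear(x) + disc_linear(y))  # [0.] [0.]
print(disc_relu(x + y), disc_relu(x) + disc_relu(y))        # [0.] [2.]
```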
Best, Mingyi
Thanks for your quick response.
Hi @Shimingyi Nice work! I have two additional questions about the joint rotation discriminator that I hope you can clarify.
Thanks a lot in advance.
Hi @Shimingyi Just a follow-up comment, as you have mentioned in the paper
our discriminator judges the realism of temporal sequences of angular velocities.
However, I think the finite difference of quaternions, which are on a manifold, cannot be used to approximate the angular velocity. This is in contrast to velocity approximation in Euclidean space where finite difference works.
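To make the point concrete: the standard way to recover an angular velocity from a quaternion sequence goes through the quaternion log map, not the raw difference q2 - q1. A minimal numpy sketch (my own helper names, (w, x, y, z) convention):

```python
import numpy as np

def qmul(a, b):
    # Hamilton product of two quaternions in (w, x, y, z) order.
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def qconj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def qlog(q):
    # Log of a unit quaternion: returns (theta/2) * axis in the tangent space.
    v, w = q[1:], np.clip(q[0], -1.0, 1.0)
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return np.arccos(w) * v / n

def angular_velocity(q1, q2, dt):
    # Body-frame angular velocity: w ~ 2 * log(q1^{-1} q2) / dt.
    return 2.0 * qlog(qmul(qconj(q1), q2)) / dt

# Rotation about z at 1 rad/s, sampled dt apart.
dt = 0.01
q1 = np.array([1.0, 0.0, 0.0, 0.0])
q2 = np.array([np.cos(dt / 2), 0.0, 0.0, np.sin(dt / 2)])
print(angular_velocity(q1, q2, dt))  # ~ [0, 0, 1]
```

Unlike q2 - q1, this quantity lives in the Lie algebra (tangent space), so it behaves like a velocity.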
Hi @longbowzhang
I also found this problem in my latest experiments. The rotations from the CMU data are normalized but the predicted rotations are not, so modeling the distributions of these two sources together confuses the network. We ran another experiment applying Euler angles in the discriminator, which includes a normalization step, and reached the same conclusion.
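A quick sketch of why this matters (hypothetical helper, not the repo's code): if the ground-truth quaternions are unit-norm but the raw predictions are not, the discriminator can separate them by norm alone, so normalizing the predictions removes that shortcut.

```python
import numpy as np

def normalize_quats(q, eps=1e-8):
    # q: (..., 4) array of predicted quaternions; project onto the unit sphere.
    return q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)

pred = np.array([[0.9, 0.1, 0.2, 0.3]])            # unnormalized network output
print(np.linalg.norm(pred, axis=-1))               # != 1: an easy giveaway
print(np.linalg.norm(normalize_quats(pred), axis=-1))  # ~ 1 after normalization
```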
But because of the 'linear' problem, these conclusions are not solid yet, so I will verify all of them in a new commit and update this thread.
I agree with you: calling it 'velocity' doesn't make sense here, since [q1 - q2] has no higher-level meaning. I will try a new representation for it.
Thanks very much for the feedback!
Best, Mingyi
Hi @Shimingyi,
Thanks a lot for your fast reply. I am also curious about the motivation of the Adversarial Rotation Loss section.
Because the T-poses of different samples in the dataset are not aligned, two similar poses might be represented by different rotations. Thus a direct loss on the rotations cannot be applied unless the entire set is retargeted to share a common T-pose, which is a time-consuming operation. Because of this potential difference between the absolute values of rotations that represent the same pose, our network is trained to output rotations with a natural velocity distribution using adversarial training. The idea is to focus on the temporal differences of joint rotations rather than their absolute values.
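The intuition can be sketched with a simplified additive toy (hypothetical names; real T-pose offsets compose multiplicatively, which is exactly the subtlety raised elsewhere in this thread): a constant per-joint offset between two datasets cancels out once you look only at temporal differences.

```python
import numpy as np

def temporal_differences(rotations):
    # rotations: (T, J, D) sequence of per-joint rotation parameters.
    return rotations[1:] - rotations[:-1]

seq = np.random.default_rng(0).normal(size=(8, 4, 4))      # one motion clip
offset = np.random.default_rng(1).normal(size=(1, 4, 4))   # constant per-joint offset

# The same clip expressed against a shifted reference pose has identical differences:
print(np.allclose(temporal_differences(seq + offset), temporal_differences(seq)))  # True
```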
Looking forward to more discussion with you.
Hi @longbowzhang
In a motion capture system, a motion is represented by an initial pose and relative rotations. Because there is no standard for describing the initial pose, similar poses can be represented by different rotations. Take this example with one BVH file from the CMU dataset and one from Truebones: I set all the rotations to 0, and you can see that the final poses are different. If we want to bring the left one to a 'T' pose, we need to apply extra rotations.
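A tiny forward-kinematics sketch of this point (toy single-chain example, not BVH parsing): with all joint rotations zeroed, the pose is just the accumulated rest offsets from the BVH OFFSET lines, so two skeletons with different rest offsets land in different "zero-rotation" poses.

```python
import numpy as np

def fk_zero_rotation(offsets):
    # offsets: per-joint (3,) rest offsets along a single chain; with all joint
    # rotations set to 0, world positions are just the cumulative sum of offsets.
    return np.cumsum(np.asarray(offsets, dtype=float), axis=0)

# Skeleton A: the arm's rest offsets already extend along +x (T-pose-like).
arm_a = fk_zero_rotation([[0, 0, 0], [1, 0, 0], [1, 0, 0]])
# Skeleton B: same bone lengths, but the rest offsets point down (-y).
arm_b = fk_zero_rotation([[0, 0, 0], [0, -1, 0], [0, -1, 0]])

print(arm_a[-1], arm_b[-1])  # different end-effector positions at zero rotation
```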
Regarding the angular velocity, we have already had some internal discussion. I agree with you: a finite difference on the manifold cannot represent a 'velocity' the way it does in Euclidean space. We will find another approach here; a proper angular velocity is one option. Thanks for your useful suggestion!
Hi @Shimingyi I have a few questions about the discriminator as well,
Hi, @JinchengWang .
I have added the activation layer in the code, and the current pre-trained model should be fine at the network level. But I haven't updated the experiments on different representations of the 'rotation differences', because I am busy with another project; I plan to do it next month. For the discriminator, there are two kinds of T-pose in our training data: one comes from the network prediction, which is based on our T-pose, and the other is based on the T-pose in the CMU dataset. From my last comment in this thread, you can see that these two T-poses are different. Even within the CMU dataset, the T-pose is influenced by different bone-length settings, so some files apply a rotation in the first line of the second part of the BVH file to reach their initial pose. So I would suggest running a retargeting method on the rotation dataset, so the rotations can be used as absolute values directly.
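One minimal piece of such a retargeting step can be sketched as follows (hypothetical helper, assuming you know the rest direction of the corresponding bone in both skeletons): compute the corrective rotation that maps one skeleton's rest bone direction onto the other's, so rotations from both sources share a common reference pose.

```python
import numpy as np

def shortest_arc(u, v):
    # Rotation matrix taking unit vector u onto unit vector v (Rodrigues form).
    # Note: undefined for exactly opposite vectors (dot == -1).
    u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)
    c, d = np.cross(u, v), np.dot(u, v)
    K = np.array([[0, -c[2], c[1]],
                  [c[2], 0, -c[0]],
                  [-c[1], c[0], 0]])
    return np.eye(3) + K + K @ K / (1.0 + d)

rest_src = np.array([0.0, -1.0, 0.0])   # bone hangs down in the source rest pose
rest_tgt = np.array([1.0, 0.0, 0.0])    # bone points sideways in the target T-pose
R = shortest_arc(rest_src, rest_tgt)
print(R @ rest_src)  # ~ rest_tgt
```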
Please let me know if there are any questions : )
Best, Mingyi
Dear authors,
Thanks a lot for the amazing work and for sharing the code. According to appendix A in the paper, "discriminator D is a linear component (similarly to Kanazawa et al. [2018]), with an output value between 0 and 1, containing two convolution layers and one fully connected layer". However, according to the last response in the issue on the code for Kanazawa et al. [2018], it does have an activation function.
I'm wondering why a linear discriminator can classify whether a rotation speed is natural or not, since in my view this classification is not trivial.
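To illustrate the concern with a toy numpy sketch (my own random weights, not the repo's code): without activations, any stack of linear layers collapses into a single matrix, so the discriminator's decision boundary can only be a single hyperplane.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for "two convolution layers and one fully connected layer".
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(8, 16))
W3 = rng.normal(size=(1, 8))
x = rng.normal(size=8)

deep = W3 @ (W2 @ (W1 @ x))       # three stacked layers, no activations
collapsed = (W3 @ W2 @ W1) @ x    # the mathematically equivalent single layer

print(np.allclose(deep, collapsed))  # True: depth adds no expressive power here
```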
Best, Wenbo