Angle definition in training and network inference

caiobarrosv2 commented 4 years ago

Dear Doug,

First of all, congratulations on this awesome work and all the research you have been doing, including the MVP and EGAD dataset. I'm following in your brilliant steps and I'm highly inspired by your grasping methods.

Can you help me with this...?

Question 1) Why do you define the grasp angle as the vector components sin(2\theta) and cos(2\theta)? I know this was proposed by Hara et al. (2017) as a way to facilitate network training, but I didn't understand the argument 2*\theta inside sin and cos.

You do this in this part of the code: https://github.com/dougsm/ggcnn/blob/f10fc95d081ae723cce18779621096811eb62d6e/utils/data/grasp_data.py#L91-L92

Question 2) In your paper, you define the grasp angle as in the equation below (Fig. 1). However, if we calculate the grasp angle using this equation, we get values between -0.785 and 0.785 rad (am I missing something?).

Despite that, you do not consider the sin(2\theta) and cos(2\theta) when predicting the grasp angle in code: https://github.com/dougsm/ggcnn_kinova_grasping/blob/004139fedd5ad304f36de76b43466b4474b2081b/ggcnn_kinova_grasping/scripts/run_ggcnn.py#L126 Should I consider the grasp angle equation as the following (without 2 in cos and sin argument)?

Would this be

Figure 1:

dougsm commented 4 years ago

hey @peterschnitman, thanks for the detailed question :-)

Q1) So as you know, encoding the angle as the vector components is a trick to facilitate the training. This works because it turns the discontinuous value (the angle) into two continuous values (the cos and sin of the angle). This means that the network doesn't have to learn that angles of 3.13 and -3.13 are close to each other (assuming angles normalised to the range [-pi, pi]), because the (sin, cos) components are approximately (0.01, -1.00) and (-0.01, -1.00). This looks like: Figure_3

The important part is that the representation is the same at both ends of the graph (+/-pi), reflecting the circular nature.
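As a minimal sketch of the point above (the helper names `encode` and `dist` are made up for illustration and are not from the ggcnn code), the two angles are far apart as raw numbers but almost identical once encoded:

```python
import math

def encode(theta):
    """Encode an angle as its (sin, cos) components."""
    return (math.sin(theta), math.cos(theta))

def dist(a, b):
    """Euclidean distance between two encoded angles."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

a, b = 3.13, -3.13

raw_gap = abs(a - b)                  # distance between the raw angle values
enc_gap = dist(encode(a), encode(b))  # distance between the encoded vectors

print(f"raw gap:     {raw_gap:.4f}")
print(f"encoded gap: {enc_gap:.4f}")
```

The raw gap is 6.26, while the encoded gap is only about 0.02, which is what lets the network treat the two angles as neighbours.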

The problem with the grasp is that it is symmetrical around +/- pi/2, rather than at +/-pi. That is, for two fingers, a grasp with theta = pi/2 is the same as a grasp with theta = -pi/2. In this case, if you use the sin/cos trick to encode the angles, you would get the following representation: Figure_1

Note that the sin component is +1 for theta = pi/2, and -1 for theta = -pi/2. This means that there are two different values for the exact same grasp (+/- pi/2). It also means that two grasps that are almost identical, say theta = +/- 1.5, have two completely different representations. This would be almost impossible to train, since following the gradient would take you further away from the desired grasp rather than closer.
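To make that concrete, here is a small sketch (again with hypothetical helper names) that applies the plain sin/cos encoding to grasp angles near +/- pi/2:

```python
import math

def encode(theta):
    """Plain (sin, cos) encoding, without doubling the angle."""
    return (math.sin(theta), math.cos(theta))

def dist(a, b):
    """Euclidean distance between two encoded angles."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# The same physical grasp, written with the two equivalent angles.
same_grasp_gap = dist(encode(math.pi / 2), encode(-math.pi / 2))

# Two nearly identical grasps on either side of the wrap-around.
near_grasp_gap = dist(encode(1.5), encode(-1.5))

print(f"+/- pi/2: {same_grasp_gap:.4f}")
print(f"+/- 1.5:  {near_grasp_gap:.4f}")
```

Both gaps come out close to 2, the largest distance possible between two points on the unit circle, even though the grasps are identical or nearly so.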

If you encode with 2*\theta, you get the following: Figure_2

And so we're back to a representation that is continuous around the points [-pi/2, pi/2], and reflects their circular nature in the context of grasping.
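The same check with the doubled angle, as a rough sketch (the name `encode_2theta` is made up for illustration):

```python
import math

def encode_2theta(theta):
    """Encode a grasp angle as (sin(2*theta), cos(2*theta))."""
    return (math.sin(2 * theta), math.cos(2 * theta))

def dist(a, b):
    """Euclidean distance between two encoded angles."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# The same physical grasp, written with the two equivalent angles.
same_grasp_gap = dist(encode_2theta(math.pi / 2), encode_2theta(-math.pi / 2))

# Two nearly identical grasps on either side of the wrap-around.
near_grasp_gap = dist(encode_2theta(1.5), encode_2theta(-1.5))

print(f"+/- pi/2: {same_grasp_gap:.4f}")
print(f"+/- 1.5:  {near_grasp_gap:.4f}")
```

With the doubled angle, the +/- pi/2 grasps coincide (up to floating-point error) and the +/- 1.5 grasps sit close together, so similar grasps get similar training targets.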

Question 2) I think this might just be a result of lazy coding and documentation on my part. In the line you linked, cos_out and sin_out are the raw outputs of the network, which has learnt the 2*\theta representation from the above graph. So the values contained within are actually cos(2\theta) and sin(2\theta), despite the misleading names, which matches the arctan(sin(2\theta), cos(2\theta)) that you're expecting. Sorry for the confusion.
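A minimal sketch of the decoding step, assuming the two-argument arctangent is applied to the network outputs and then halved (`decode` and `decode_single_arg` are hypothetical names, not functions from the repository). It also suggests where a +/- 0.785 rad range could come from: a single-argument arctan of the ratio only covers half of the interval.

```python
import math

def decode(sin_2theta, cos_2theta):
    """Recover theta in [-pi/2, pi/2] from the (sin(2*theta), cos(2*theta)) pair."""
    return 0.5 * math.atan2(sin_2theta, cos_2theta)

def decode_single_arg(sin_2theta, cos_2theta):
    """Single-argument arctan of the ratio: limited to (-pi/4, pi/4)."""
    return 0.5 * math.atan(sin_2theta / cos_2theta)

theta = 1.2  # a grasp angle outside (-pi/4, pi/4)
sin_out, cos_out = math.sin(2 * theta), math.cos(2 * theta)

recovered = decode(sin_out, cos_out)
truncated = decode_single_arg(sin_out, cos_out)

print(f"atan2 / 2: {recovered:.4f}")
print(f"atan / 2:  {truncated:.4f}")
```

The two-argument form recovers the original 1.2 rad, while the single-argument form folds it back into (-pi/4, pi/4), i.e. roughly +/- 0.785 rad.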

caiobarrosv2 commented 4 years ago

Thank you so much, Doug.

You really made it easy to understand. I think I was actually the one who got confused, haha.

By encoding with 2\theta, the grasp angle is limited to [-pi/2,pi/2] while avoiding discontinuity and we represent the grasp angle -pi/2 and pi/2 with the same (cos, sin) value (-1,0), right?

caiobarrosv2 commented 4 years ago

Thank you :)

dougsm / ggcnn

Angle definition in training and network inference #29