RobotLocomotion / pytorch-dense-correspondence

Code for "Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation"
https://arxiv.org/pdf/1806.08756.pdf

missing L2 norm in match_loss? #211

Closed christian-rauch closed 4 years ago

christian-rauch commented 4 years ago

The function match_loss is supposed to compute the descriptor loss as 1/num_matches * \sum_{matches} ||D(I_a, u_a, I_b, u_b)||_2^2, i.e. the sum of squared L2 norms of the feature vector differences, as per L_{matches} in eq. 1 of the paper.

However, the Python code only reads:

match_loss = 1.0 / num_matches * (matches_a_descriptors - matches_b_descriptors).pow(2).sum()

Since matches_a_descriptors is a 2D tensor, this would be 1/num_matches * \sum_{matches} \sum_{D} (I_a(u_a) - I_b(u_b))^2.
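For concreteness, here is a standalone sketch of the shapes involved (hypothetical values, assuming matches_a_descriptors and matches_b_descriptors are [num_matches, D] tensors as the code suggests):

```python
import torch

# Hypothetical shapes for illustration only: N matches, D-dimensional descriptors.
num_matches, D = 4, 3
matches_a_descriptors = torch.randn(num_matches, D)
matches_b_descriptors = torch.randn(num_matches, D)

# The line in question: .pow(2) squares every element, .sum() then adds over
# both the match dimension and the descriptor dimension.
match_loss = 1.0 / num_matches * (matches_a_descriptors - matches_b_descriptors).pow(2).sum()
```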

Am I missing something here? Could you point me to the function that computes the matching descriptor loss as per eq. 1 of the paper?

Was this maybe changed when addressing https://github.com/RobotLocomotion/pytorch-dense-correspondence/issues/46?

Since this is overestimating the matching loss, could this also be the reason for https://github.com/RobotLocomotion/pytorch-dense-correspondence/issues/56?

manuelli commented 4 years ago

The pow function is elementwise (https://pytorch.org/docs/stable/torch.html#torch.pow). So if you have two vectors x, y \in R^D (so x.shape = [D,]) then you have

||x-y||_2^2 = \sum_{k=1}^D (x_k - y_k)^2 = (x-y).pow(2).sum()

If you now add many matches so that x.shape = [N, D] then the logic still holds in that

\sum_n ||x[n] - y[n]||_2^2 = \sum_n \sum_k (x[n][k] - y[n][k])^2 = (x-y).pow(2).sum()
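A quick standalone check of this equivalence (not from the repo, just plain torch):

```python
import torch

# Verify that summing squared elementwise differences equals
# summing the squared L2 norm of each row (i.e. of each match).
N, D = 5, 3
x = torch.randn(N, D)
y = torch.randn(N, D)

elementwise_sum = (x - y).pow(2).sum()              # \sum_n \sum_k (x[n][k] - y[n][k])^2
per_match_norms = (x - y).norm(dim=1).pow(2).sum()  # \sum_n ||x[n] - y[n]||_2^2

assert torch.allclose(elementwise_sum, per_match_norms)
```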

so I think the code is computing exactly the formula in the paper. I am not sure what you mean by

Since matches_a_descriptors is a 2D tensor, this would be 1/num_matches * \sum_{matches} \sum_{D} (I_a(u_a) - I_b(u_b))^2.

I_a(u_a) is a D-dimensional vector, so it doesn't make sense to square it.

christian-rauch commented 4 years ago

You are right, sorry. I got confused because I initially thought that matches_a_descriptors was a single feature vector.

With:

Since matches_a_descriptors is a 2D tensor, this would be 1/num_matches * \sum_{matches} \sum_{D} (I_a(u_a) - I_b(u_b))^2.

I_a(u_a) is a D-dimensional vector, so it doesn't make sense to square it.

I actually meant the sums over matches and D (your n and k), and I_a(u_a) should have read I_a(u_a, d).