facebookresearch / banmo

BANMo Building Animatable 3D Neural Models from Many Casual Videos

Canonical embeddings matching #40

Closed. Xin-97 closed this issue 1 year ago

Xin-97 commented 1 year ago

Hi, thanks for your impressive work.

I am trying to understand the 2D-3D matching part. However, when I print the learned matching matrix (prob_vol), all of its elements are very close to each other. As far as I understand, this may mean that BANMo has not learned a matching between 2D and 3D features. So what does this part actually do? Thanks!

gengshan-y commented 1 year ago

Hi, I'm assuming you're running the code for 2D-3D feature matching here.

We use the cost volume to compute an expected 3D match, instead of using each matching hypothesis individually. This corresponds to Eq. (15) in the paper.

The expected 3D match is saved in tmp/match_line_pred.obj. It should look reasonable; if it doesn't, please let me know.
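For readers following along, here is a minimal NumPy sketch of what "expected 3D match" means: a softmax over matching scores followed by a probability-weighted average of the candidate 3D points. It is not BANMo's code. The function name `expected_match`, the cosine-similarity score, the fixed `temperature`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def expected_match(feat_2d, feat_3d, pts_3d, temperature=0.1):
    """Soft-argmax matching (illustrative sketch, not the BANMo implementation).

    feat_2d: (P, C) pixel features
    feat_3d: (N, C) features of N candidate canonical 3D points
    pts_3d:  (N, 3) coordinates of those candidate points
    Returns the expected 3D match (P, 3) and the matching distribution (P, N).
    """
    # Assumption: matching score is cosine similarity scaled by a temperature.
    f2 = feat_2d / np.linalg.norm(feat_2d, axis=-1, keepdims=True)
    f3 = feat_3d / np.linalg.norm(feat_3d, axis=-1, keepdims=True)
    score = f2 @ f3.T / temperature                 # (P, N)
    score -= score.max(axis=-1, keepdims=True)      # numerical stability
    prob = np.exp(score)
    prob /= prob.sum(axis=-1, keepdims=True)        # each row sums to 1
    # Expected match: average the candidates, weighted by their probabilities.
    return prob @ pts_3d, prob

# Synthetic data: two pixels whose features nearly equal those of points 3 and 42.
rng = np.random.default_rng(0)
pts_3d = rng.uniform(-1, 1, size=(500, 3))
feat_3d = rng.normal(size=(500, 16))
feat_2d = feat_3d[[3, 42]] + 0.01 * rng.normal(size=(2, 16))

match, prob_vol = expected_match(feat_2d, feat_3d, pts_3d, temperature=0.01)
print(match.shape, prob_vol.argmax(axis=-1))
```

Because the output is an average over all hypotheses, the expected match can still be sensible when the distribution is spread over several nearby points rather than concentrated on one.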

Xin-97 commented 1 year ago

Hi,

My question is about the feature matching distribution (prob_vol in the code) in Eq. (15). As I understand it, when a good matching between 2D and 3D embeddings has been learned, this matrix should tend to have a dominant element in each row and each column. However, when I print it in the code, all elements are very close to each other, so I am confused.

gengshan-y commented 1 year ago

This is not expected. Here is what I got:

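import numpy as np
# Inspect entry 100 of prob_vol; named `row` to avoid shadowing the builtin `slice`
row = prob_vol[100].detach().cpu().numpy()
print(-np.sort(-row)[:10])   # largest 10 values, descending
print(-np.sort(-row)[-10:])  # smallest 10 values, descending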
array([0.01693168, 0.01188052, 0.01177971, 0.01088822, 0.01021699,
       0.00911641, 0.00887904, 0.00791583, 0.00762484, 0.00664771],
      dtype=float32)

array([5.4839586e-14, 5.3232704e-14, 2.3381782e-14, 9.0445856e-15,
       7.8698915e-15, 6.0068172e-15, 3.0369735e-15, 2.0125922e-15,
       2.2763142e-16, 2.2049661e-17], dtype=float32)

If you give more details of what data / command you use, I can take a further look.
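One way to put a number on "flat" versus "peaked" is to compare each distribution against the uniform one. The sketch below is only a suggested diagnostic, not part of the BANMo code base; the helper name `peakedness` and the synthetic rows are assumptions.

```python
import numpy as np

def peakedness(prob_row, eps=1e-12):
    """Summarize how concentrated a single matching distribution is."""
    p = np.asarray(prob_row, dtype=np.float64)
    p = p / p.sum()                                  # renormalize defensively
    entropy = -(p * np.log(p + eps)).sum()
    return {
        "max_prob": float(p.max()),
        "uniform_prob": 1.0 / p.size,                # value every entry has if flat
        "normalized_entropy": float(entropy / np.log(p.size)),  # 1.0 means uniform
    }

# Synthetic examples: a perfectly flat row and a sharply peaked row.
flat = np.full(1000, 1e-3)
peaked = np.full(1000, 1e-6)
peaked[7] = 1.0

flat_stats = peakedness(flat)
peaked_stats = peakedness(peaked)
print(flat_stats)
print(peaked_stats)
```

A normalized entropy near 1 together with a `max_prob` close to `uniform_prob` indicates that no match stands out for that entry.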

Xin-97 commented 1 year ago

Hi, I ran template.sh on the AMA-female data and printed the matrix during training. Sometimes the values are all very close to each other, and sometimes they look similar to yours. In the flat case, the largest value is about 10^-4 and the smallest is about 10^-5.

gengshan-y commented 1 year ago

It could be that the sampled pixel sometimes belongs to the background, in which case you may see a flat matching distribution.
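To check this explanation, one could split the sampled pixels by a foreground mask and compare the peak probability in each group. The sketch below assumes each row of the matrix corresponds to one sampled pixel and that a boolean `fg_mask` for those pixels is available. Both are assumptions about the setup, and `max_prob_by_region` is a hypothetical helper, not a BANMo function.

```python
import numpy as np

def max_prob_by_region(prob_vol, fg_mask):
    """Mean of the per-pixel peak probability, for foreground and background."""
    prob_vol = np.asarray(prob_vol, dtype=np.float64)   # (P, N), rows sum to 1
    fg_mask = np.asarray(fg_mask, dtype=bool)           # (P,), True = foreground
    row_max = prob_vol.max(axis=-1)
    fg, bg = row_max[fg_mask], row_max[~fg_mask]
    fg_mean = float(fg.mean()) if fg.size else float("nan")
    bg_mean = float(bg.mean()) if bg.size else float("nan")
    return fg_mean, bg_mean

# Synthetic example: two peaked (foreground) rows and two flat (background) rows.
n = 100
peaked = np.full((2, n), 0.1 / (n - 1))
peaked[0, 5] = 0.9
peaked[1, 60] = 0.9
flat = np.full((2, n), 1.0 / n)
demo_prob = np.concatenate([peaked, flat], axis=0)
demo_mask = np.array([True, True, False, False])

fg_mean, bg_mean = max_prob_by_region(demo_prob, demo_mask)
print(fg_mean, bg_mean)
```

If the flat distributions show up only in the background group, that would be consistent with the explanation above.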

Xin-97 commented 1 year ago

Thanks for your reply!!!