NVlabs / dex-ycb-toolkit

A Python package that provides evaluation and visualization tools for the DexYCB dataset
https://dex-ycb.github.io
GNU General Public License v3.0

I can't align vertices with other views using the extrinsic parameters. #6

Open dae-sun opened 3 years ago

dae-sun commented 3 years ago

Hello, I am using your dataset and it works well. Thank you for your great work.

I'm having a hard time aligning vertices across views.

I rotated and translated the hand vertices (the MANO output, in camera coordinates) to world coordinates using the inverse of the extrinsic matrix. Then I multiplied the world-coordinate vertices by another view's extrinsic and intrinsic parameters, but the result is not aligned with that view's image. I suspect the extrinsic parameters are the problem, because when I project the vertices onto the same view's image using only the intrinsics, they align well.
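For reference, this is roughly how I do the intrinsics-only projection that aligns well (a minimal sketch; I'm assuming sample['intrinsics']['color'] exposes fx/fy/ppx/ppy, the exact keys may differ):

import numpy as np

def project(verts_cam, K):
    # verts_cam: Nx3 points in the camera frame (OpenCV convention, z forward).
    u = K['fx'] * verts_cam[:, 0] / verts_cam[:, 2] + K['ppx']
    v = K['fy'] * verts_cam[:, 1] / verts_cam[:, 2] + K['ppy']
    return np.stack([u, v], axis=1)  # Nx2 pixel coordinates

# hand_verts_cam: the MANO vertices in this view's camera frame.
uv = project(hand_verts_cam, sample['intrinsics']['color'])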

Could you give me any advice on this problem? Thank you.

ychao-nvidia commented 3 years ago

Can you share a code snippet for reproducing the problem?

dae-sun commented 3 years ago

Thanks for replying! Below is the code I used for testing (First and Second tests).

## First Test
I tested whether each view's 3D hand vertices can be converted to the same world coordinates.
I modified your visualize_pose.py; below is the code with some fixes for testing.
Test views: 20200709-subject-01 >> 20200709_141754 >> 836212060125's and 839512060362's color_000070.jpg
sample['extrinsic'] comes from 20200709_141754's meta.yml >> extrinsics: '20200702_151821' >> DEXYCBroot/calibration/extrinsics_20200702_151821/extrinsics.yml >> the '836212060125' and '839512060362' values, each .reshape(3,4).
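Concretely, I load the extrinsic like this (a sketch; assuming the yml stores a top-level extrinsics dict keyed by camera serial, as in the file above):

import numpy as np
import yaml

with open('DEXYCBroot/calibration/extrinsics_20200702_151821/extrinsics.yml') as f:
    extr = yaml.safe_load(f)

# 12 values per serial -> 3x4 [R|t] matrix.
T_836 = np.array(extr['extrinsics']['836212060125'], dtype=np.float64).reshape(3, 4)
T_839 = np.array(extr['extrinsics']['839512060362'], dtype=np.float64).reshape(3, 4)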

import numpy as np
import torch
import trimesh
from manopth.manolayer import ManoLayer
from dex_ycb_toolkit.factory import get_dataset

def create_scene(sample, obj_file):
    # Load poses.
    label = np.load(sample['label_file'])
    pose_y = label['pose_y']
    pose_m = label['pose_m']

    # Load MANO layer.
    mano_layer = ManoLayer(flat_hand_mean=False,
                           ncomps=45,
                           side=sample['mano_side'],
                           mano_root='manopth/mano/models',
                           use_pca=True)
    faces = mano_layer.th_faces.numpy()
    betas = torch.tensor(sample['mano_betas'], dtype=torch.float32).unsqueeze(0)

    # Build the MANO mesh (skip frames without a hand pose).
    if np.all(pose_m == 0.0):
        return None
    pose = torch.from_numpy(pose_m)
    vert, _ = mano_layer(pose[:, 0:48], betas, pose[:, 48:51])
    vert /= 1000  # millimeters -> meters
    vert = vert.view(778, 3)
    vert = vert.numpy()
    # Flip y and z (OpenCV -> OpenGL camera convention, as in the original rendering code).
    vert[:, 1] *= -1
    vert[:, 2] *= -1
    mesh = trimesh.Trimesh(vertices=vert, faces=faces)
    return mesh

def conv2hom(verts):
    # Nx3 -> Nx4 homogeneous coordinates.
    return np.concatenate((verts, np.ones((verts.shape[0], 1))), axis=1)

def make_world_verts(idx, dataset):
    sample = dataset[idx]
    ext_inv = np.linalg.inv(np.concatenate((sample['extrinsic'], np.array([[0, 0, 0, 1]]))))
    mesh1 = create_scene(sample, dataset.obj_file)  # returns a trimesh.Trimesh
    mesh1_verts = mesh1.vertices

    # Flip y and z back (OpenGL -> OpenCV camera convention).
    gl_ext = np.array([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]])
    mesh1_verts = np.dot(gl_ext, conv2hom(mesh1_verts).transpose()).transpose()

    # Camera -> world (here: using the inverse of the extrinsic).
    mesh1_verts_W = np.dot(ext_inv, mesh1_verts.transpose()).transpose()
    print(mesh1_verts_W)
    return mesh1_verts_W

def main():
    name = 's0_train'
    dataset = get_dataset(name)

    idx = 142  # 836212060125's color_000070.jpg
    make_world_verts(idx, dataset)  # camera -> world for 836212060125's color_000070.jpg
    idx = 70  # 839512060362's color_000070.jpg
    make_world_verts(idx, dataset)  # camera -> world for 839512060362's color_000070.jpg
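The check I expect to pass (but which currently fails) is that both calls land on the same world coordinates:

w1 = make_world_verts(142, dataset)  # from camera 836212060125
w2 = make_world_verts(70, dataset)   # from camera 839512060362
print(np.abs(w1 - w2).max())  # should be near zero if both views share one world frame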

=================================================================================================

## Second Test
I tested whether one view's 3D hand vertices (839512060362, color_000070.jpg) can be aligned to the other view (836212060125, color_000070.jpg):
1. Compute the reference 3D hand vertices in 836212060125's camera coordinates.
2. Transform 839512060362's vertices from its camera coordinates to world coordinates by multiplying by the inverse of its extrinsic.
3. Transform those world coordinates into 836212060125's camera coordinates by multiplying by 836212060125's extrinsic.

def make_world_verts2(idx, dataset):
    sample = dataset[idx]
    ext_inv = np.linalg.inv(np.concatenate((sample['extrinsic'], np.array([[0, 0, 0, 1]]))))
    mesh1 = create_scene(sample, dataset.obj_file)
    verts = mesh1.vertices

    # Flip y and z back (OpenGL -> OpenCV camera convention).
    gl_ext = np.array([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]])
    verts = np.dot(gl_ext, conv2hom(verts).transpose()).transpose()  # 778x4

    # Camera -> world (here: using the inverse of the extrinsic).
    verts_W = np.dot(ext_inv, verts.transpose()).transpose()  # 778x4

    return verts_W  # 778x4

def main():
    name = 's0_train'
    dataset = get_dataset(name)

    # Reference vertices in 836212060125's camera coordinates (idx 142).
    idx = 142
    sample142 = dataset[idx]
    mesh142 = create_scene(sample142, dataset.obj_file)  # returns a trimesh.Trimesh
    verts142 = mesh142.vertices

    # 839512060362's vertices mapped to world coordinates (idx 70).
    verts70W = make_world_verts2(70, dataset)  # 778x4
    # World -> 836212060125's camera coordinates.
    ext142 = np.concatenate((sample142['extrinsic'], np.array([[0, 0, 0, 1]])))
    new_verts70 = np.dot(ext142, verts70W.transpose()).transpose()  # 778x4
    # Expected new_verts70 to match verts142, but they differ.
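Continuing main() above, the mismatch is easy to quantify (note that verts142 still carries the y/z flip applied in create_scene, so it may need the same gl_ext un-flip before comparison):

# Compare the transformed vertices against the reference set (drop the homogeneous column).
print(np.abs(new_verts70[:, :3] - verts142).max())  # large, i.e. not aligned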

Thank you!

ychao-nvidia commented 3 years ago

You should not need np.linalg.inv().

For the first test, remove np.linalg.inv(). For the second test, remove np.linalg.inv() in make_world_verts2() and add np.linalg.inv() when you compute ext142.
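In other words, a sketch using your variable names (sample70 / verts70_hom here stand for the idx-70 sample and its homogeneous camera-frame vertices):

# First test: the stored extrinsic already maps camera -> world, so apply it directly.
ext = np.concatenate((sample['extrinsic'], np.array([[0., 0., 0., 1.]])))
mesh1_verts_W = np.dot(ext, mesh1_verts.transpose()).transpose()

# Second test: invert only for the world -> camera direction.
ext70 = np.concatenate((sample70['extrinsic'], np.array([[0., 0., 0., 1.]])))    # camera 70 -> world
ext142_inv = np.linalg.inv(
    np.concatenate((sample142['extrinsic'], np.array([[0., 0., 0., 1.]]))))      # world -> camera 142
new_verts70 = np.dot(ext142_inv, np.dot(ext70, verts70_hom.transpose())).transpose()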

dae-sun commented 3 years ago

Thank you very much for your kindness, and for answering.

Do I understand correctly that the given extrinsics (in the calibration/../extrinsics.yml files) are actually the camera poses, i.e. camera-to-world rotation and translation matrices? (Since you said no matrix inverse is needed.) Unfortunately, it still didn't work correctly; in other words, the two views still don't map to the same world coordinates.

Below is the first test result (the print(mesh1_verts_W) output).

World coordinates from /data/DEXYCB/20200709-subject-01/20200709_141754/839512060362:

[[-802.35969127  -62.0132967   442.31720594    1.        ]
 [-838.34698969  -25.47671269  442.37970725    1.        ]
 [-837.78360669  -57.30155944  411.22202183    1.        ]
 ...
 [-864.57584629  -30.85454634  411.46818737    1.        ]
 [-814.63290217  -80.79197415  405.59349164    1.        ]
 [-810.68602677  -88.19412168  402.65407566    1.        ]]

World coordinates from /data/DEXYCB/20200709-subject-01/20200709_141754/836212060125:

[[ 334.16168769 -438.36301805  634.13413141    1.        ]
 [ 341.12352008 -456.4914259   608.85484625    1.        ]
 [ 383.24745583 -406.4962968   660.94415971    1.        ]
 ...
 [ 395.83247708 -429.68515202  628.41959933    1.        ]
 [ 333.64957476 -447.27611143  612.79727722    1.        ]
 [ 323.20377585 -418.26912768  643.0397179     1.        ]]

Also, I'm wondering whether my logic for aligning vertices from camera 1's view to camera 2's view is right:

V_{C2} = T_{W->C2} * T_{C1->W} * V_{C1}

where
V_{C1}: camera 1's vertices (camera coordinates)
T_{C1->W}: the inverse of camera 1's extrinsic
T_{W->C2}: camera 2's extrinsic

If you don't mind, is it okay to send you my full code (a trivial modification of yours) to reproduce my idea? :)

Thank you again!

Best regards

se122811 commented 3 years ago

To the author,

I have the same issue, unfortunately :(
I'm waiting for your reply!

ychao-nvidia commented 3 years ago

You can look at the examples for the Interactive 3D Viewer. Specifically, look at this line, where the point clouds from different cameras are transformed into the same world coordinates using the extrinsics.
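Schematically, that line does something like the following (paraphrased, not the exact toolkit code; extrinsics[serial] is the 3x4 matrix loaded from extrinsics.yml, conv2hom as defined above):

# Each camera's extrinsic maps its camera frame to the shared world frame -- no inverse needed.
T = np.vstack((extrinsics[serial], np.array([0., 0., 0., 1.])))
points_world = np.dot(T, conv2hom(points_cam).transpose()).transpose()[:, :3]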

pyxploiter commented 1 year ago

@se122811 @dae-sun were you able to resolve the issue? I am also having the same problem.

pyxploiter commented 1 year ago

I have found the solution. It's counterintuitive, but to move from one view to the other you have to do the following, as @ychao-nvidia rightly pointed out:

correct output: joints_view2 = inverse(extrinsic_view2) @ (extrinsic_view1 @ joints_view1)

Although I was expecting it to be:

incorrect output: joints_view2 = extrinsic_view2 @ (inverse(extrinsic_view1) @ joints_view1)

But this doesn't work for whatever reason.
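This behavior is consistent with the stored extrinsics being camera poses, i.e. camera-to-world transforms: extrinsic_view1 takes points from camera 1's frame into the world, and inverse(extrinsic_view2) brings them from the world into camera 2's frame. A self-contained sketch (my own helper names; ext_view1/ext_view2 are the 3x4 matrices from extrinsics.yml, joints_view1 is Nx3 in camera 1's frame):

import numpy as np

def to_hom(pts):
    # Nx3 -> Nx4 homogeneous points.
    return np.concatenate([pts, np.ones((pts.shape[0], 1))], axis=1)

def view1_to_view2(joints_view1, ext_view1, ext_view2):
    T1 = np.vstack([ext_view1, [0., 0., 0., 1.]])  # camera 1 -> world
    T2 = np.vstack([ext_view2, [0., 0., 0., 1.]])  # camera 2 -> world
    # camera 1 -> world -> camera 2
    joints = np.linalg.inv(T2) @ T1 @ to_hom(joints_view1).T
    return joints.T[:, :3]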