PKU-EPIC / UniDexGrasp

Official code for "UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy" (CVPR 2023)

Questions about the code in dex_dataset.py. #5

Closed · wyl2077 closed this issue 1 year ago

wyl2077 commented 1 year ago

Hi, I'm confused about some code in "dexgrasp_generation/datasets/dex_dataset.py" and I hope to get some explanations.

First, in line 125, "plane" appears to be a 1x4 vector. What does it correspond to?

    plane = recorded_data["plane"]

Second, in lines 152-157, "pcs_table" and "pose_matrices" appear to be 100x400x3 and 100x4x4 tensors. What does the 100 mean?

    obj_pc_path = pjoin(self.root_path, "DFCData", "mesh", category, instance_no, "pcs_table.npy")
    pose_path = pjoin(self.root_path, "DFCData", "mesh", category, instance_no, "poses.npy")
    pcs_table = torch.tensor(np.load(obj_pc_path, allow_pickle=True), dtype=torch.float)
    pose_matrices = torch.tensor(np.load(pose_path, allow_pickle=True), dtype=torch.float)

Third, in line 132, the code seems to use the distance between two vectors to retrieve an index into the object's poses, in order to select an object pose for the grasp. What is the reason for doing this?

    index = (torch.tensor(plane[:3], dtype=torch.float) - pose_matrices[:, 2, :3]).norm(dim=1).argmin()

Finally, there are 100 object poses and about 200 grasp poses in the dataset, and I am confused about the code that establishes their correspondence.

Best wishes.

mzhmxzh commented 1 year ago

To generate random scenes with an object on top of a table, we drop every object onto the table 100 times at random. This produces 100 affine transform matrices for each object, which are recorded in poses.npy. When you apply one of these matrices to the object, the object is rotated and translated so that it lies stably on the z = 0 plane. pcs_table.npy records the 100 table-top scenes in the form of point clouds, with the first 3000 points sampled from the object and the last 1000 points from the table, using furthest point sampling. You can visualize the point clouds with this script; change --num to an integer in [0, 100) to visualize different poses in our object dataset.
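For illustration, here is a minimal sketch of how these two files can be loaded and what "applying a matrix" means, assuming the directory layout quoted in the question; the instance directory and the stand-in object-frame points are made up for the example:

```python
import numpy as np

# Hypothetical instance directory, following the layout quoted in the question.
instance_dir = "data/DFCData/mesh/core/bottle-asdasfja12jaios9012"

pose_matrices = np.load(f"{instance_dir}/poses.npy", allow_pickle=True)   # (100, 4, 4) affine transforms
pcs_table = np.load(f"{instance_dir}/pcs_table.npy", allow_pickle=True)   # (100, num_points, 3) table-top scenes

# Each scene stores the object points first and the table points last
# (3000 object points + 1000 table points, per the reply above).
object_points, table_points = pcs_table[0, :3000], pcs_table[0, 3000:]

# Applying the i-th transform maps object-frame points into the scene frame,
# where the object rests stably on the z = 0 plane (the table).
i = 0
R, t = pose_matrices[i, :3, :3], pose_matrices[i, :3, 3]
points_object_frame = np.random.rand(100, 3)   # stand-in for points expressed in the object frame
points_world = points_object_frame @ R.T + t   # rotate, then translate
```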

However, when we generate the table-top grasp dataset, we only use the table plane rather than the entire transform matrix. That is to say, for each grasp, we first randomly sample a transform matrix from poses.npy, then extract the table plane Ax + By + Cz + D = 0 in the object reference frame. You can verify that (A, B, C) == pose_matrix[2, :3] and D == pose_matrix[2, 3] * object_scale. As a result, in order to retrieve the correspondence between a grasp and a pose matrix, we need to find the matrix with the smallest |(A, B, C) - pose_matrix[2, :3]| in poses.npy. Use this script to visualize a grasp.
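A small sketch of that retrieval, assuming the shapes discussed above ((4,) plane, (100, 4, 4) pose matrices); the file paths below are made up for illustration, but the argmin logic is the same as the line quoted in the question:

```python
import numpy as np
import torch

# Hypothetical paths following the layout mentioned in this thread.
grasp_path = "data/DFCData/poses/core/bottle-asdasfja12jaios9012/00000.npz"
pose_path = "data/DFCData/mesh/core/bottle-asdasfja12jaios9012/poses.npy"

recorded_data = np.load(grasp_path, allow_pickle=True)
plane = recorded_data["plane"]                    # (A, B, C, D) table plane stored with the grasp
pose_matrices = torch.tensor(np.load(pose_path, allow_pickle=True), dtype=torch.float)  # (100, 4, 4)

# Row 2 of each transform encodes the table plane in the object frame:
# (A, B, C) == pose_matrix[2, :3] and D == pose_matrix[2, 3] * object_scale.
normals = pose_matrices[:, 2, :3]                 # (100, 3) candidate plane normals
query = torch.tensor(plane[:3], dtype=torch.float)

# The grasp's pose matrix is the one whose plane normal is closest to the stored one.
index = (query - normals).norm(dim=1).argmin()
matching_pose = pose_matrices[index]
print(index.item(), matching_pose)
```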

In retrospect, there could be better ways to do this, but the code was running fine so we didn't find it necessary to change it...

wyl2077 commented 1 year ago

Thank you very much. So the "plane" in a grasp file (e.g. "pose/core/bottle-asdasfja12jaios9012/00000.npz") is the table plane of the corresponding object pose. Is that correct?

mzhmxzh commented 1 year ago

Yes.