NVlabs / dex-ycb-toolkit

A Python package that provides evaluation and visualization tools for the DexYCB dataset
https://dex-ycb.github.io
GNU General Public License v3.0

Inconsistency in pose_y shape #12

Closed: andreaziani closed this issue 2 years ago

andreaziani commented 2 years ago

Hi, I was trying to retrieve the rotation matrices for the objects in a sequence, and I noticed that the representation is a bit different from what is stated in the README.

In the README it's mentioned:

pose_y: A float32 numpy array of shape [num_obj, 3, 4] holding the 6D pose of each object. Each 6D pose is represented by [R; t], where R is the 3x3 rotation matrix and t is the 3x1 translation.
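To illustrate the per-image [R; t] layout quoted above, here is a minimal sketch that applies one object's 3x4 pose to points given in the object frame. It assumes the points are an (N, 3) NumPy array; `transform_points` and the example pose are hypothetical, not part of the toolkit.

```python
import numpy as np


def transform_points(pose_3x4: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Map (N, 3) object-frame points into the camera frame with a [R; t] pose.

    Hypothetical helper for illustration; not a dex-ycb-toolkit function.
    """
    R = pose_3x4[:, :3]  # 3x3 rotation matrix
    t = pose_3x4[:, 3]   # translation vector, shape (3,)
    # Row-vector form of p_cam = R @ p_obj + t, applied to every point at once.
    return points @ R.T + t


# Hypothetical pose: 90 degrees about z, then 1 unit along x.
pose = np.array([[0.0, -1.0, 0.0, 1.0],
                 [1.0,  0.0, 0.0, 0.0],
                 [0.0,  0.0, 1.0, 0.0]], dtype=np.float32)
print(transform_points(pose, np.array([[1.0, 0.0, 0.0]])))  # [[1. 1. 0.]]
```

With a real per-image label file, `np.load(path)["pose_y"][i]` would presumably give the 3x4 pose of object `i`, assuming the file layout described in the README.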

But when I actually read pose_y, I get a different shape: [num_frames, num_obj, 7]. What does this shape represent? Is it using a quaternion representation?

Many thanks in advance!

ychao-nvidia commented 2 years ago

We provide object 6D pose in two formats:

  1. Per image: This is the format described in the README. Here the 6D pose is defined in the camera frame of a given image. Since this format depends on the camera view, the poses are stored in label files under a separate folder for each camera view, e.g., 20200709-subject-01/20200709_141754/836212060125/labels_000000.npz.
  2. Per sequence: This format is not documented in the README, but it is used in the sequence visualization example (see this line). Here the 6D pose is defined in a world frame (we set this to the frame of one of the cameras, 840412060917), and the poses of a full sequence are stored in a single file. Since this format is view independent, each sequence has only one file in this format, e.g., 20200709-subject-01/20200709_141754/pose.npz.
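Because both files expose an array named pose_y, the two formats can be told apart by shape alone. This is a minimal sketch that relies only on the shapes stated in this thread ([num_obj, 3, 4] per image, [num_frames, num_obj, 7] per sequence). `pose_format` is a hypothetical helper, not a toolkit function.

```python
import numpy as np


def pose_format(pose_y: np.ndarray) -> str:
    """Classify a pose_y array by its shape (hypothetical helper)."""
    if pose_y.ndim == 3 and pose_y.shape[1:] == (3, 4):
        # [num_obj, 3, 4]: [R; t] in the camera frame of one view
        return "per-image"
    if pose_y.ndim == 3 and pose_y.shape[2] == 7:
        # [num_frames, num_obj, 7]: quaternion + translation in the world frame
        return "per-sequence"
    raise ValueError(f"unexpected pose_y shape {pose_y.shape}")


# With real data (paths relative to the dataset root, as in the examples above):
# pose_y = np.load("20200709-subject-01/20200709_141754/pose.npz")["pose_y"]
print(pose_format(np.zeros((2, 3, 4), dtype=np.float32)))   # per-image
print(pose_format(np.zeros((70, 2, 7), dtype=np.float32)))  # per-sequence
```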

It seems that you are loading pose_y in format 2. As you noticed, its shape is [num_frames, num_obj, 7]. Here the pose of a single object in a single frame is stored as a 7-d vector: the rotation as a quaternion (x, y, z, w) followed by the translation (x, y, z). You can see how we use this format here and here.
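Converting the per-sequence layout into the [R; t] form used by the per-image labels is mostly a quaternion-to-matrix conversion. Below is a vectorized NumPy sketch, assuming the (x, y, z, w) + (x, y, z) layout described above; `seq_poses_to_3x4` is a hypothetical name. The result is still expressed in the world frame (the frame of camera 840412060917), so it should line up only with that camera's view; other views would additionally need their camera extrinsics, which are not covered here.

```python
import numpy as np


def seq_poses_to_3x4(pose_y: np.ndarray) -> np.ndarray:
    """Convert [..., 7] poses (quat x, y, z, w + trans x, y, z) to [..., 3, 4] [R; t].

    Hypothetical helper for illustration; not a dex-ycb-toolkit function.
    """
    q = pose_y[..., :4]
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)  # guard against slight denormalization
    x, y, z, w = np.moveaxis(q, -1, 0)                  # each has shape [...]
    # Standard unit-quaternion to rotation-matrix formula, rows laid out in order.
    R = np.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - z * w),     2 * (x * z + y * w),
        2 * (x * y + z * w),     1 - 2 * (x * x + z * z), 2 * (y * z - x * w),
        2 * (x * z - y * w),     2 * (y * z + x * w),     1 - 2 * (x * x + y * y),
    ], axis=-1).reshape(q.shape[:-1] + (3, 3))
    t = pose_y[..., 4:7, None]                          # [..., 3, 1]
    return np.concatenate([R, t], axis=-1)              # [..., 3, 4]


# One frame, one object: 90 degrees about z, translation (1, 2, 3).
s = np.sqrt(0.5)
pose_y = np.array([[[0.0, 0.0, s, s, 1.0, 2.0, 3.0]]])
print(seq_poses_to_3x4(pose_y)[0, 0].round(6))
```

SciPy's `scipy.spatial.transform.Rotation.from_quat`, which also expects scalar-last (x, y, z, w) order by default, is an alternative for the rotation part.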