NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

DOPE uses different coordinate frame for HOPE objects #346

Open rpapallas opened 8 months ago

rpapallas commented 8 months ago

Hello,

Thank you for sharing this work and for continuing to improve it. We use DOPE and recently tried to use a pose refiner, Diff-DOPE. However, it appears that DOPE uses a different coordinate frame for the HOPE objects than the one given in the HOPE dataset.

Please see the two screenshots below for two different objects. In RViz you can see the pose reported by DOPE with respect to the camera. In MeshLab you can see the model as downloaded from the HOPE dataset.

We wonder why DOPE would use a different coordinate frame for the objects than the 3D models. Wasn't DOPE trained on synthetic data generated from these models?

[Screenshot: mustard — DOPE pose in RViz vs. model in MeshLab]

[Screenshot: alphabet_soup — DOPE pose in RViz vs. model in MeshLab]

mintar commented 8 months ago

Perhaps @TontonTremblay has a more specific answer, but I guess the short answer is that something went wrong during the mesh export/training phase. Every tool has its own conventions. For example, there's this gem from the Blender documentation:

Blender uses Y Forward, Z Up (since the front view looks along the +Y direction). For example, it's common for applications to use Y as the up axis; in that case -Z Forward, Y Up is needed.

For reference, ROS uses X Forward, Z Up, except for camera optical frames, which follow the OpenCV convention of Z Forward, Y Down.
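
To make those conventions concrete, here is a minimal numpy sketch (just an illustration, not DOPE or ROS code) of the fixed rotation that takes coordinates from the camera optical frame to the ROS body frame:

import numpy as np

# x_body = z_optical, y_body = -x_optical, z_body = -y_optical
R_BODY_FROM_OPTICAL = np.array([
    [ 0,  0,  1],
    [-1,  0,  0],
    [ 0, -1,  0],
])

# A point 1 m straight ahead of the camera (optical frame) ...
p_optical = np.array([0.0, 0.0, 1.0])
print(R_BODY_FROM_OPTICAL @ p_optical)   # -> [1. 0. 0.], i.e. +x (forward) in the body frame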

Anyhow, since this sort of thing happens a lot, DOPE has the option to apply a transform to the pose before returning it. From your image, it looks like we want the following:

x_mesh =  z_dope
y_mesh = -x_dope
z_mesh = -y_dope

If I'm not mistaken, that should translate to the following transform:

model_transforms: {
    "AlphabetSoup": [[  0,  0,  1,  0],
                     [ -1,  0,  0,  0],
                     [  0, -1,  0,  0],
                     [  0,  0,  0,  1]],
}
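
As a quick sanity check (a standalone numpy sketch, not DOPE code), you can verify that this matrix encodes exactly the axis mapping above:

import numpy as np

# The 4x4 transform from the model_transforms entry above.
T = np.array([[ 0,  0,  1,  0],
              [-1,  0,  0,  0],
              [ 0, -1,  0,  0],
              [ 0,  0,  0,  1]], dtype=float)

R = T[:3, :3]

# Claimed mapping: x_mesh = z_dope, y_mesh = -x_dope, z_mesh = -y_dope.
x_dope, y_dope, z_dope = np.eye(3)
assert np.allclose(R @ z_dope,  [1, 0, 0])   # x_mesh =  z_dope
assert np.allclose(R @ -x_dope, [0, 1, 0])   # y_mesh = -x_dope
assert np.allclose(R @ -y_dope, [0, 0, 1])   # z_mesh = -y_dope
assert np.isclose(np.linalg.det(R), 1.0)     # proper rotation, no reflection

Whether DOPE applies the matrix from the left or the right of the estimated pose is best confirmed by visualizing the mesh markers, as described next.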

You can add this transform to your config_pose.yaml file here:

https://github.com/NVlabs/Deep_Object_Pose/blob/1655459de50cfcbf01f7d24775f834cab400aa25/config/config_pose.yaml#L138-L144

Then, you should also properly fill in the meshes part of your config, so you can visualize the mesh markers in RViz and see whether you got the right transform. The meshes should be in meters. If they aren't, you either have to rescale them or provide a matching mesh_scales parameter.
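
For reference, the corresponding entries could look roughly like the following; the mesh path and scale below are placeholders (use 1.0 if your mesh is already in meters):

meshes: {
    "AlphabetSoup": "file:///path/to/hope/meshes/AlphabetSoup.obj",
}

mesh_scales: {
    "AlphabetSoup": 0.01,   # placeholder: only needed if the mesh is not in meters
}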

Alternatively, you can of course keep the pose that is returned by DOPE (i.e., don't provide a model_transforms parameter).

TontonTremblay commented 8 months ago

Diff-DOPE uses the OpenGL coordinate frame, DOPE is in OpenCV. You need to apply the transform that Martin shared: take the pose from DOPE, apply the transform, refine with Diff-DOPE, then re-apply the pose transform.
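
Roughly, in code (plain numpy; refine_with_diffdope below is just a hypothetical placeholder for however you call Diff-DOPE, and Martin's per-object model_transforms correction is separate from this camera-frame flip):

import numpy as np

# OpenCV camera: x right, y down, z forward.  OpenGL camera: x right, y up, z backward.
# Flipping y and z converts between the two; the flip is its own inverse.
FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])

def cv_to_gl(T_cv):
    """Object pose in the OpenCV camera frame -> OpenGL camera frame."""
    return FLIP_YZ @ T_cv

def gl_to_cv(T_gl):
    """Object pose in the OpenGL camera frame -> OpenCV camera frame."""
    return FLIP_YZ @ T_gl

# T_dope: 4x4 object pose from DOPE (OpenCV convention, translation in meters).
# T_gl = cv_to_gl(T_dope)
# T_gl = refine_with_diffdope(T_gl, ...)   # hypothetical placeholder for the Diff-DOPE call
# T_cv = gl_to_cv(T_gl)                    # back to the OpenCV/ROS optical convention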

mdogar commented 8 months ago

Hi Jonathan and Martin. Thanks for your responses. But I am a bit confused. Is this relative transform (the one in Martin's config_pose.yaml file) object specific or constant for all objects? Martin's answer suggests it is object specific (thus we would need to visualize the DOPE coordinate frame for each object, eye-ball the relative transform, and record it in the config_pose.yaml file). Jonathan's answer suggests the relative transform is fixed (thus we would only need this one transform for all objects). Which one is correct? Or am I misunderstanding your responses?

mdogar commented 8 months ago

We are also a bit confused because while Jonathan refers to Diff-DOPE, the question above does not involve Diff-DOPE. Above, we look at the coordinate frame that DOPE outputs for an object (visualized in RViz) and see that it does not match the coordinate frame of the object's model in the HOPE dataset (visualized in MeshLab). We just want to know what coordinate frame DOPE is outputting.

TontonTremblay commented 8 months ago

I am sorry @mdogar, I have been exchanging messages with @rpapallas about Diff-DOPE (I think -- must be similar usernames). DOPE outputs in OpenCV, which should be directly compatible with RViz, but DOPE works with keypoint matching, so depending on the order of the keypoints you could be introducing a rotation (I have seen it in the past), e.g. keypoint 0 is now keypoint 2, etc. Depending on this you could get a rotation about one axis or two. There might also be some discrepancies between DOPE / HOPE / BOP (some of the models are in BOP format).

The rule of thumb when dealing with transforms in 3D is to go slowly and apply one rotation at a time. Good luck.

mdogar commented 8 months ago

Thank you! One last question: do these extra rotations have to be multiples of 90 degrees (as in Martin's response), or can they be any arbitrary amount? If the latter, it becomes quite difficult to use DOPE, because it would be outputting unknown rotations of the object (whereas 90-degree multiples can be eye-balled).

TontonTremblay commented 8 months ago

Looking at your images above, it looks like one rotation for the mustard: a positive 90-degree rotation about the y-axis. Probably two for the other; I am not sure why there are two for the soup can :( this is weird.

mintar commented 8 months ago

Is this relative transform (the one in Martin's config_pose.yaml file) object specific or constant for all objects?

It's object specific.

We are also a bit confused because while Jonathan refers to Diff-DOPE, the question above does not involve Diff-DOPE.

Yep, this is just DOPE.

Above, we look at the coordinate frame that DOPE outputs for an object (visualized in RViz) and see that it does not match the coordinate frame of the object's model in the HOPE dataset (visualized in MeshLab). We just want to know what coordinate frame DOPE is outputting.

The result from DOPE should match the coordinate frame of the model. As I wrote above, you can provide DOPE with a path to a mesh file (by filling in the meshes parameter) and visualize the mesh in RViz. The orientation of the visualized mesh should match the real object.

Do these extra rotations have to be multiples of 90 degrees (as in Martin's response), or can they be any random amount?

Always multiples of 90 degrees, because the error is caused by a disagreement about which axis in the mesh means what (i.e., which axis is x, y, or z).
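
If you don't want to eye-ball it, here is a small standalone numpy sketch (not part of DOPE) that enumerates the 24 axis-aligned rotations and snaps an estimated offset rotation to the nearest one:

import itertools
import numpy as np

def axis_aligned_rotations():
    """All 24 proper rotations that permute and/or flip the coordinate axes."""
    rots = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product([1, -1], repeat=3):
            R = np.zeros((3, 3))
            for row, (col, sign) in enumerate(zip(perm, signs)):
                R[row, col] = sign
            if np.isclose(np.linalg.det(R), 1.0):   # keep rotations, drop reflections
                rots.append(R)
    return rots

def snap_to_90deg(R_est):
    """Return the axis-aligned rotation closest to an estimated 3x3 rotation."""
    # trace(R.T @ R_est) = 1 + 2*cos(angle between them), so maximizing it
    # picks the candidate with the smallest residual rotation.
    return max(axis_aligned_rotations(), key=lambda R: np.trace(R.T @ R_est))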

This is probably what has happened (it happens to me all the time):

  1. A mesh of the object was created (by 3D scanning or whatever).
  2. The mesh was imported into the synthetic image generation pipeline, but somehow the orientations got mixed up (this is where the bug is). Therefore, the generated dataset has the wrong orientations.
  3. DOPE is trained on this faulty dataset.
  4. During inference, DOPE will now return orientations with a fixed rotation error relative to the original mesh.

You can fix this in two ways:

  1. Either make DOPE match the coordinate frame of the model by providing a model_transforms.
  2. Or make the coordinate frame of the model match DOPE by rotating the model itself (see the sketch below).
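
If you go with option 2, a minimal sketch using trimesh (my choice here; Blender or MeshLab work just as well) could look like this. Whether you need the model_transforms matrix or its inverse depends on which frame you standardize on, so verify the result in RViz:

import numpy as np
import trimesh   # assumes trimesh is installed; any mesh editor works too

# The per-object matrix from the model_transforms example above.
T = np.array([[ 0,  0,  1,  0],
              [-1,  0,  0,  0],
              [ 0, -1,  0,  0],
              [ 0,  0,  0,  1]], dtype=float)

mesh = trimesh.load("AlphabetSoup.obj")    # placeholder path
mesh.apply_transform(np.linalg.inv(T))     # or T, depending on the chosen convention
mesh.export("AlphabetSoup_dope_frame.obj")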

mdogar commented 8 months ago

Very clear. Thank you Martin!

rpapallas commented 8 months ago

Thank you both, this is now clear. We have been rotating the 3D model to match DOPE, just wanted to make sure we were not doing something fundamentally wrong.