dougsm / ggcnn

Generative Grasping CNN from "Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach" (RSS 2018)
BSD 3-Clause "New" or "Revised" License

How is the relation between the robot grasp (g') and the image grasp (g), g' = T(RC) * (T(CI) * g), calculated? #31

Closed: priyashkla closed this issue 4 years ago

priyashkla commented 4 years ago

If all the elements represent matrices, what are the dimensions of each element? Since T(CI) is 3x3 by default, can you detail the procedure you used to get the transformation between the robot and the camera?

dougsm commented 4 years ago

Hi @priyashkla That wasn't meant to be a literal matrix equation. In theory there are a number of ways you can do this, and it depends on what information you have.
Normally you would do it in two steps: 1) Use the inverse of the 3x3 projection matrix (K) and the measured depth (z) to get camera coordinates: `[x_c, y_c, z_c]^T = z * K^-1 * [u, v, 1]^T`

2) Get world (robot) coordinates using the appropriate 4x4 transform matrix (T) between camera and robot: `[x_r, y_r, z_r, 1]^T = T * [x_c, y_c, z_c, 1]^T`

The intrinsic and extrinsic camera matrices can also both be represented as 4x4 homogeneous transforms, or combined into a single 4x4 camera matrix, which allows you to do this in a single step.
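
For reference, here is a minimal numpy sketch of those two steps, using placeholder values for K and the camera-to-robot transform (not the actual calibration used in this project):

```python
import numpy as np

# Placeholder intrinsics and camera -> robot transform, for illustration only.
K = np.array([[615.0,   0.0, 320.0],
              [  0.0, 615.0, 240.0],
              [  0.0,   0.0,   1.0]])   # 3x3 projection matrix
T_robot_camera = np.eye(4)              # 4x4 camera -> robot transform

def pixel_to_robot(u, v, z):
    # Step 1: inverse projection of pixel (u, v) at depth z into the camera frame.
    p_cam = z * np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Step 2: homogeneous camera-frame point into the robot frame.
    p_robot = T_robot_camera @ np.append(p_cam, 1.0)
    return p_robot[:3]
```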

Practically speaking, what we actually did here for our ROS implementation was:

  1. Compute the position of the point in the camera frame (this is equivalent to applying the inverse of K, but hardcoded): https://github.com/dougsm/mvp_grasp/blob/master/ggcnn/ros_nodes/ggcnn_predict_rt.py#L95

  2. Use tf to compute and apply the transform from the camera frame to the robot frame automatically based on our URDF: https://github.com/dougsm/mvp_grasp/blob/master/dougsm_helpers/src/dougsm_helpers/tf_helpers.py#L35
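
Roughly, step 2 with tf2 looks like the sketch below; the frame names here are just examples, not necessarily the ones used in the repo:

```python
import rospy
import tf2_ros
import tf2_geometry_msgs  # registers PoseStamped conversions with tf2
from geometry_msgs.msg import PoseStamped

rospy.init_node('grasp_transform_example')
tf_buffer = tf2_ros.Buffer()
listener = tf2_ros.TransformListener(tf_buffer)

# Grasp pose expressed in the camera frame (frame name is an assumption).
grasp_cam = PoseStamped()
grasp_cam.header.frame_id = 'camera_depth_optical_frame'
grasp_cam.pose.position.z = 0.5
grasp_cam.pose.orientation.w = 1.0

# Look up the camera -> robot transform from the TF tree (built from the URDF)
# and apply it to the pose.
transform = tf_buffer.lookup_transform('base_link',
                                       grasp_cam.header.frame_id,
                                       rospy.Time(0),
                                       rospy.Duration(1.0))
grasp_base = tf2_geometry_msgs.do_transform_pose(grasp_cam, transform)
```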

This article also discusses the inverse projection problem quite well: https://towardsdatascience.com/inverse-projection-transformation-c866ccedef1c

Hope that helps.

priyashkla commented 4 years ago

For the transformation from g to g', did you include position only? If not, how were the other parameters used during the transformation? Position mapping is linear, but orientation mapping is highly non-linear, because for any specific pose more than one orientation can give the same result.

dougsm commented 4 years ago

Once you have the x, y, z components in the camera frame, you can compute a full pose, including rotation, in the camera frame by using the grasp angle as the yaw angle, like we do here (this also includes some angle wrapping), in this case stored as a quaternion in the pose: https://github.com/dougsm/mvp_grasp/blob/master/ggcnn/ros_nodes/ggcnn_service.py#L109
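
As a rough sketch of that idea (the exact wrapping convention and helper name here are illustrative, not copied from the repo):

```python
import numpy as np
from tf.transformations import quaternion_from_euler  # ROS1 tf
from geometry_msgs.msg import Pose

def grasp_to_pose(x, y, z, angle):
    # A parallel-jaw grasp is symmetric under a 180 degree rotation, so wrap
    # the predicted angle into [-pi/2, pi/2) before using it as yaw
    # (wrapping convention is an assumption).
    yaw = (angle + np.pi / 2) % np.pi - np.pi / 2

    pose = Pose()
    pose.position.x, pose.position.y, pose.position.z = x, y, z
    # Roll and pitch are zero; the grasp angle becomes the yaw of the pose
    # in the camera frame, stored as a quaternion.
    q = quaternion_from_euler(0.0, 0.0, yaw)
    pose.orientation.x, pose.orientation.y, pose.orientation.z, pose.orientation.w = q
    return pose
```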

If you want to think about this in terms of matrices, then the two position vectors in step 2 above would just become 4x4 pose matrices and the maths is the same. In practice though, we just represent these as a ROS Pose type and convert using tf as I mentioned above.
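
In matrix form that would look something like the sketch below, with an identity camera-to-robot transform standing in for the real calibration:

```python
import numpy as np

def yaw_pose_matrix(x, y, z, yaw):
    # 4x4 homogeneous pose from a position and a yaw-only rotation.
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c,  -s,  0.0, x],
                     [s,   c,  0.0, y],
                     [0.0, 0.0, 1.0, z],
                     [0.0, 0.0, 0.0, 1.0]])

G_cam = yaw_pose_matrix(0.1, 0.0, 0.5, np.deg2rad(30))  # grasp pose in the camera frame (placeholder values)
T_robot_camera = np.eye(4)                              # camera -> robot transform (placeholder)
G_robot = T_robot_camera @ G_cam                        # same multiplication, now giving the full grasp pose
```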