priyashkla closed this issue 4 years ago
Hi @priyashkla
That wasn't meant to be a literal matrix equation. In theory there are a number of ways you can do this, depending on what information you have.
Normally you would do it in two steps:
1) Inverse of the 3x3 projection matrix (K) and measured depth (z) to get camera coordinates
2) Get world (robot) coordinates using the appropriate 4x4 transform matrix (T) between camera and robot.
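The two steps above can be sketched in numpy. Everything here is illustrative: the intrinsic values in K, the depth, and the camera-to-robot translation in T are made-up example numbers, not the ones from any real calibration.

```python
import numpy as np

# Assumed example intrinsic matrix K (fx, fy, cx, cy are illustrative values).
K = np.array([[525.0,   0.0, 319.5],
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])

# Step 1: back-project a pixel (u, v) with measured depth z into camera coordinates.
u, v, z = 400, 300, 0.75  # pixel coordinates and depth in metres (example values)
p_cam = z * np.linalg.inv(K) @ np.array([u, v, 1.0])  # [x, y, z] in the camera frame

# Step 2: apply a 4x4 homogeneous transform T from the camera frame to the robot frame.
# T here is an assumed identity-rotation transform with an example translation.
T = np.eye(4)
T[:3, 3] = [0.1, 0.0, 0.5]
p_robot = (T @ np.append(p_cam, 1.0))[:3]
```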
The intrinsic matrix and camera extrinsic matrix can also both be represented as 4x4 homogeneous transforms, or combined into a single 4x4 camera matrix, which allows you to do this in a single step.
Practically speaking, what we actually did here for our ROS implementation was:
Compute the position of the point in the frame (this is equivalent to the inverse of K, but hardcoded): https://github.com/dougsm/mvp_grasp/blob/master/ggcnn/ros_nodes/ggcnn_predict_rt.py#L95
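A hedged sketch of what that hardcoded computation typically looks like (the fx, fy, cx, cy values below are illustrative, not the ones used in the linked node):

```python
# Back-project a grasp pixel using hardcoded intrinsics, equivalent to applying
# the inverse of K. These intrinsic values are assumptions for illustration.
fx, fy = 458.5, 458.5
cx, cy = 320.0, 240.0

def pixel_to_camera(u, v, depth):
    """Convert a pixel (u, v) with depth in metres to a 3D point in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```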
Use tf to compute and apply the transform from the camera frame to the robot frame automatically based on our URDF: https://github.com/dougsm/mvp_grasp/blob/master/dougsm_helpers/src/dougsm_helpers/tf_helpers.py#L35
This article also discusses the inverse projection problem quite well: https://towardsdatascience.com/inverse-projection-transformation-c866ccedef1c
Hope that helps.
For the transformation from g to g', did you include position only? If not, how were the parameters used during the transformation? Position mapping is linear, but orientation mapping is highly non-linear, because for any specific pose more than one orientation can give the same result.
Once you have the x, y, z components in the camera frame, you can compute a full pose, including rotation, in the frame of the camera by using the grasp angle as the yaw angle, like we do here (this also includes some angle wrapping), in this case as a quaternion in the pose. https://github.com/dougsm/mvp_grasp/blob/master/ggcnn/ros_nodes/ggcnn_service.py#L109
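A minimal sketch of that idea: wrap the grasp angle (a grasp rotated by pi is equivalent, so angles are commonly folded into a half-pi range) and turn the yaw into a quaternion about the z axis. The wrapping range and function names are assumptions for illustration, not the exact code from the linked service.

```python
import math

def wrap_angle(angle):
    """Fold an angle into [-pi/2, pi/2); a grasp rotated by pi is equivalent."""
    return (angle + math.pi / 2) % math.pi - math.pi / 2

def yaw_to_quaternion(yaw):
    """Quaternion (x, y, z, w) for a rotation of `yaw` about the z axis."""
    return (0.0, 0.0, math.sin(yaw / 2.0), math.cos(yaw / 2.0))
```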
If you want to think about this in terms of matrices, then the two position vectors in step 2 above would just become 4x4 pose matrices, and the maths is the same. In practice, though, we just represent these as ROS Pose types and convert using tf as I mentioned above.
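In the matrix view, carrying a full pose (rotation plus translation) through the camera-to-robot transform is a single 4x4 product. A sketch with assumed example values (identity rotations and made-up translations):

```python
import numpy as np

def pose_matrix(R, t):
    """Build a 4x4 homogeneous pose matrix from a 3x3 rotation and a translation."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = t
    return M

# Assumed example: grasp pose in the camera frame, and camera-to-robot transform.
g_cam = pose_matrix(np.eye(3), [0.1, 0.0, 0.75])
T_robot_cam = pose_matrix(np.eye(3), [0.0, 0.0, 0.5])

# The grasp pose in the robot frame is a single matrix product.
g_robot = T_robot_cam @ g_cam
```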
If all elements represent matrices, what was the dimension of each element? As T(CI) is 3x3 by default, can you detail how you used that procedure to get the transformation between robot and camera?