x,y,z of the single object in the 3 different camera poses
Output
single object x,y,z value
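Experiment 1 amounts to triangulating a single 3D point from bearing measurements taken at known camera positions. A minimal numpy sketch of that least-squares ray intersection, with made-up camera positions and object point (all values illustrative):

```python
import numpy as np

# Hypothetical ground-truth object point and three known camera positions.
obj = np.array([2.0, 1.0, 5.0])
cams = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]])

# Unit bearing vectors from each camera towards the object (the measurements).
dirs = obj - cams
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

# Least-squares intersection of the rays:
# minimize sum_i || (I - d_i d_i^T)(p - c_i) ||^2  ->  solve A p = b.
A = np.zeros((3, 3))
b = np.zeros(3)
for c, d in zip(cams, dirs):
    P = np.eye(3) - np.outer(d, d)  # projector onto plane orthogonal to ray i
    A += P
    b += P @ c
p = np.linalg.solve(A, b)
print(p)  # ≈ [2, 1, 5], the original object point
```

With noise-free bearings the recovered point matches the ground truth exactly; with noisy bearings the same solve gives the least-squares estimate.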
Experiment 2
Input
3 camera poses
x,y,z,roll, pitch, yaw of the single object in the 3 different camera poses
Output
single object x,y,z, roll, pitch, yaw values.
Requirement
Need to represent camera poses as gtsam.Pose3
Need to represent object landmarks as 3D similar to gtsam.Pose3 representation
Need to represent the constraint between the object landmark and the camera pose. (The currently available constraint, gtsam.BearingFactor3D, only treats the object landmark as a point with x, y, z; no orientation factors are available.)
1. Invert the representation of the camera and the landmark
The camera can be represented as gtsam.Point3 -> x, y, z only. (The data is available from the input.)
Since we get the object centroid x, y, z and the orientation roll, pitch, yaw, the object can be represented in the world frame as gtsam.Pose3 -> x, y, z, roll, pitch, yaw. (The data is available from the input.)
We reverse the experiment so that we model the object as viewing the camera (represented as a point, gtsam.Point3).
There is essentially zero uncertainty about the camera position and high uncertainty about the object pose.
Since the object plays the role of the camera in this inverted model, the object is the moving body, and the noise in gtsam.BetweenFactorPose3() can be set high.
Conversely, since the camera plays the role of the landmark, it is fixed, and the measurement noise in gtsam.BearingFactor3D() can be set close to zero.
This means the camera position is fixed and cannot move, whereas the object pose is free to move. The factor graph then produces the object's gtsam.Pose3() estimate, from which we get the 6 DOF, and from the marginal covariance matrix we can obtain the size of the ellipsoid that contains the object.
In this approach QuadricSLAM is not needed, since there are no 2D bounding boxes; plain GTSAM is enough.
2. Represent the object as gtsam.Pose3() instead of gtsam.Point3() and use the normal, direct approach.
It is not clear whether gtsam.Pose3 can be used in place of gtsam.Point3 in gtsam.BearingFactor3D().
Even if we use gtsam.Pose3, only its centroid (point-feature) information is represented in gtsam.BearingFactor3D(). That is, the orientation components of the object pose won't be optimized; only the position components will be.
3. Computing 2D bounding box from PCA.
The 2D bounding box is computed from the object centroid and pose information, using PCA to fit the object within the box.
The problem is then modelled as a standard QuadricSLAM approach, where the camera pose and the current 2D bounding box of the object are passed into the SLAM.
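A numpy sketch of the PCA step, assuming a synthetic object point cloud and made-up pinhole intrinsics: PCA gives the object's principal axes and hence an oriented 3D box around the centroid; projecting its corners into the image yields the 2D bounding box that QuadricSLAM expects.

```python
import numpy as np

# Hypothetical object point cloud in the world frame (stands in for real data).
rng = np.random.default_rng(0)
pts_w = rng.normal(size=(200, 3)) * np.array([0.4, 0.2, 0.1]) + np.array([0.0, 0.0, 5.0])

# PCA: principal axes from the SVD of the centred points.
centroid = pts_w.mean(axis=0)
_, _, Vt = np.linalg.svd(pts_w - centroid, full_matrices=False)

# Oriented 3D box: min/max extents along each principal axis -> 8 corners.
local = (pts_w - centroid) @ Vt.T
corners_local = np.array(np.meshgrid(*zip(local.min(0), local.max(0)))).T.reshape(-1, 3)
corners_w = corners_local @ Vt + centroid

# Project the corners with assumed pinhole intrinsics and take the
# axis-aligned extent: the 2D bounding box [xmin, ymin, xmax, ymax].
fx = fy = 500.0
cx = cy = 320.0
u = fx * corners_w[:, 0] / corners_w[:, 2] + cx
v = fy * corners_w[:, 1] / corners_w[:, 2] + cy
bbox = np.array([u.min(), v.min(), u.max(), v.max()])
```

Since every object point lies inside the oriented box and perspective projection maps the box to a convex region bounded by its projected corners, the resulting box is guaranteed to contain the whole projected object.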
Input to the factor graph
Output of the factor graph