Are you referring to the calibration result from section 2.3.4 of the readme, with the csv titled "calibration_YYYY-MM-DD_HH-MM-SS.csv"? If so, just follow section 2.4 to get the final calibration parameters.
To go from roll, pitch, yaw, x, y, z to a transformation matrix, we first need to convert roll, pitch, yaw into a rotation matrix with rotation order XYZ.
Once you have the resulting 3x3 rotation matrix from above, you can combine it with the translation (x, y, z) in a single 4x4 homogeneous transformation matrix T, as shown below. This can then be used to convert points from the camera frame to the lidar frame.
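For reference, the 4x4 homogeneous transform has the standard block form, with R the 3x3 rotation matrix and (x, y, z) the translation:

$$
T = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix}
  = \begin{bmatrix}
      r_{11} & r_{12} & r_{13} & x \\
      r_{21} & r_{22} & r_{23} & y \\
      r_{31} & r_{32} & r_{33} & z \\
      0 & 0 & 0 & 1
    \end{bmatrix}
$$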
There are various libraries that can help with the rotation/transformation matrices above, such as Eigen in C++ or scipy.spatial.transform.Rotation in Python.
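As a minimal sketch of that conversion in Python with scipy (assuming the csv angles are in radians and that scipy's lowercase 'xyz' extrinsic convention matches the tool's output; switch to degrees=True or 'XYZ' if yours differ):

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Hypothetical values read from calibration_YYYY-MM-DD_HH-MM-SS.csv
roll, pitch, yaw = 0.01, -0.02, 1.57   # radians (check whether your csv uses degrees)
x, y, z = 0.10, 0.00, -0.05            # metres

# 3x3 rotation matrix with rotation order XYZ
# (lowercase 'xyz' = extrinsic, uppercase 'XYZ' = intrinsic in scipy -- match your convention)
R = Rotation.from_euler('xyz', [roll, pitch, yaw]).as_matrix()

# Assemble the 4x4 homogeneous transform T (camera -> lidar)
T = np.eye(4)
T[:3, :3] = R
T[:3, 3] = [x, y, z]

# Transform a 3D point from the camera frame to the lidar frame
p_camera = np.array([1.0, 2.0, 3.0, 1.0])   # homogeneous coordinates
p_lidar = T @ p_camera
```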
Thanks for your reply.
I'm studying the paper you linked and I have some questions.
I understand that you extract normal vectors using RANSAC and compute some errors in lidar coordinates.
However, I want to know how you extract the vectors from camera coordinates, and how you calculate roll, pitch, yaw, x, y, and z from them.
> This can then be used to convert points from the camera frame to the lidar frame
I think I have a fundamental misunderstanding here @darrenjkt: Aren't projections from 2D -> 3D tricky, because (from the pinhole perspective) a point in 2D can be projected to any point along its projection ray in 3D? The explanation above only projects camera to lidar, but how do I know where this projection and the lidar points intersect?
@jiyou384 To extract vectors from camera coordinates, check out this function: https://github.com/acfr/cam_lidar_calibration/blob/fb8614ef0f95da659351371b62605268b7f7dc8f/src/feature_extractor.cpp#L541
The main idea is that we define the chessboard in its own 3D object frame (i.e. the chessboard is centred on the origin) using the hand-measured dimensions. In the object frame you can simply define the normal vector as a unit vector on the +z axis, centred on the origin. We compute PnP using the 3D-2D correspondence of the hand-measured chessboard and cv::findChessboardCorners to get the chessboard's pose in the camera frame. After getting the pose (tvec, rvec), we transform the camera-captured board from its 3D object frame to the 3D camera frame.
https://github.com/acfr/cam_lidar_calibration/blob/fb8614ef0f95da659351371b62605268b7f7dc8f/src/feature_extractor.cpp#L671
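Roughly, in Python/OpenCV terms (the repo does this in C++ in feature_extractor.cpp; the board dimensions, intrinsics, and image name below are placeholders, not the repo's values):

```python
import numpy as np
import cv2

# Hypothetical board: 7x5 inner corners, 0.08 m squares -- use your hand-measured values
pattern_size = (7, 5)
square = 0.08

# Chessboard corners in the board's own object frame (z = 0 plane), centred on the origin
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square
objp[:, :2] -= objp[:, :2].mean(axis=0)

# Hypothetical intrinsics and distortion -- use your camera's calibration
K = np.array([[800.0, 0.0, 640.0], [0.0, 800.0, 360.0], [0.0, 0.0, 1.0]])
dist = np.zeros(5)

img = cv2.imread('board.png', cv2.IMREAD_GRAYSCALE)           # hypothetical image
found, corners = cv2.findChessboardCorners(img, pattern_size)

# PnP from the 3D-2D correspondences gives the board's pose in the camera frame
ok, rvec, tvec = cv2.solvePnP(objp, corners, K, dist)
R_oc, _ = cv2.Rodrigues(rvec)                                  # object -> camera rotation

# Board normal (+z in the object frame) and board centre, expressed in the camera frame
normal_cam = R_oc @ np.array([0.0, 0.0, 1.0])
centre_cam = tvec.reshape(3)
```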
Once the chessboard is in the camera frame, that's when we run the optimisation to minimise the distance between the normals and the board centres of the captured chessboard in the camera and lidar frames.
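As a toy illustration of that idea (not the repo's actual cost function), the optimisation compares normals and centres after mapping the camera-frame features into the lidar frame with a candidate extrinsic:

```python
import numpy as np

def toy_extrinsic_cost(T_CL, normals_cam, centres_cam, normals_lidar, centres_lidar):
    # T_CL: candidate 4x4 camera->lidar extrinsic; each list holds one entry per board pose
    R, t = T_CL[:3, :3], T_CL[:3, 3]
    cost = 0.0
    for n_c, c_c, n_l, c_l in zip(normals_cam, centres_cam, normals_lidar, centres_lidar):
        cost += np.linalg.norm(R @ n_c - n_l)        # normal alignment error
        cost += np.linalg.norm(R @ c_c + t - c_l)    # board-centre distance error
    return cost
```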
@juliangaal Yes, in certain cases with insufficient collected chessboard samples there are many solutions for projecting 2D to 3D, which is why it is important to capture a sufficient number of good-quality samples for calibration. Quality samples are those with enough variance in their normal vectors, as explained in Fig. 5 of our paper. I'm not sure if this is exactly what you were asking, but hopefully it provides some insight.
Very helpful, thank you!
> Yes, in certain cases with insufficient collected chessboard samples there are many solutions for projecting 2D to 3D
That's what I don't get: given the calibration T_L_C = [R t], applying this to a 2D image (T_L_C * img) isn't possible without extending the dimensions to 3D (technically 4D, for homogeneous coordinates). Even if there were a sensible way to do this, T_L_C doesn't include information about where to "cut" the projection ray from 2D to 3D.
Therefore, the only direction in which the transformation makes sense is T_LC^-1 (as you do in image projection).
Sorry for the confusion. When I mentioned "there are many solutions for projecting 2D to 3D", I was referring to the fact that the optimisation can return different extrinsic transformation matrices as "solutions", but they might not all be the optimal one.
Image-to-world projection is a separate topic. You can't just apply T_C->L to the 2D image because T_C->L is a 3D transform and the image pixels live in their own image plane. You first need to transform the pixels from the image plane to the camera frame itself. If you wanted to transform pixels to lidar, you'd need to go from image pixels (i.e. the image plane) -> camera frame (the projection centre of your physical camera) -> lidar frame.
To get pixels as 3D coordinates in the camera frame, you first define a 3D object (like our chessboard). Then you use PnP to get T_O->C (the transform from the object frame to the camera frame). Once you have this, you can express the object in 3D coordinates in the camera frame. From there, you can use the extrinsic transformation matrix obtained with this repo, T_C->L, to transform the object's 3D coordinates to the lidar frame, as sketched below.
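Putting that chain together as a rough sketch (T_OC from solvePnP as above, T_CL from this repo's csv output; both matrices below are placeholders):

```python
import numpy as np

# T_OC: object -> camera (built from solvePnP's rvec/tvec)
# T_CL: camera -> lidar (built from the csv roll, pitch, yaw, x, y, z)
T_OC = np.eye(4)   # placeholder
T_CL = np.eye(4)   # placeholder

# A chessboard point defined in the object frame, in homogeneous coordinates
p_object = np.array([0.08, 0.0, 0.0, 1.0])

# Object frame -> camera frame -> lidar frame
p_camera = T_OC @ p_object
p_lidar = T_CL @ p_camera        # equivalently (T_CL @ T_OC) @ p_object
```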
Closing this because there is no activity. Feel free to open it again.
I got a calibration result in a csv file. I want to use the values, but I don't know how. Is there an equation I can use to get the transformation matrix from camera to lidar?