ZiadMansourM / photogrammetry

Photogrammetry: Close Range 3D scanning. Our graduation project 🎓
https://docs.scanmate.sreboy.com/
GNU General Public License v3.0

Reconstruction from many views #35

Open MohamedWZS opened 1 year ago

MohamedWZS commented 1 year ago

The solvePnP function in OpenCV estimates the pose (the rotation and translation) of a camera given a set of 3D points in the world coordinate system and their corresponding 2D projections in the image plane. This problem is known as the Perspective-n-Point (PnP) problem.

The PnP problem can be solved using various methods, and OpenCV defaults to the iterative Levenberg-Marquardt optimization algorithm. Other methods available in OpenCV include P3P, EPnP, and DLS. Each method has different assumptions and trade-offs in terms of accuracy, speed, and the minimum number of point correspondences required.

The logic behind the solvePnP function can be summarized as follows:

Input: The function takes as input a set of 3D points in the world coordinate system, their corresponding 2D projections in the image plane, the intrinsic camera matrix (including focal length and principal point), and an optional distortion coefficients vector.

Estimate camera pose: The function estimates the rotation vector (rvec) and translation vector (tvec) that describe the transformation from the world coordinate system to the camera coordinate system. This is done by minimizing the re-projection error, which is the difference between the observed 2D points and the 2D points obtained by projecting the 3D points using the estimated camera pose.

Output: The function returns the estimated rotation and translation vectors. You can convert the rotation vector to a rotation matrix using the cv2.Rodrigues function if needed.

The estimated camera pose can be used to understand the position and orientation of the camera in the world coordinate system or to project additional 3D points onto the image plane. It is an essential component in many computer vision and robotics tasks, such as 3D reconstruction, augmented reality, and visual odometry.
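
For concreteness, here is a minimal self-contained sketch. Every number below is made up for illustration; the 2D points were generated by projecting the 3D points with a known pose (R = I, t = (0, 0, 5)) so the result can be sanity-checked:

import cv2
import numpy as np

# Made-up 3D points in world coordinates (two of them off the z = 0 plane).
object_points = np.float32([
    [0, 0, 0], [1, 0, 0], [0, 1, 0],
    [1, 1, 0], [0.5, 0.5, 1], [1, 0, 1],
])
# Their projections for a camera at R = I, t = (0, 0, 5); the solver should
# recover rvec close to zero and tvec close to [0, 0, 5].
image_points = np.float32([
    [320, 240], [480, 240], [320, 400],
    [480, 400], [386.67, 386.67], [453.33, 240],
])

# Assumed pinhole intrinsics: fx = fy = 800, principal point (320, 240).
K = np.float64([[800, 0, 320], [0, 800, 240], [0, 0, 1]])
dist_coeffs = np.zeros(4)  # assume no lens distortion

# The default flag is cv2.SOLVEPNP_ITERATIVE (Levenberg-Marquardt); pass
# flags=cv2.SOLVEPNP_EPNP, cv2.SOLVEPNP_P3P, or cv2.SOLVEPNP_DLS to switch.
success, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist_coeffs)

# Convert the Rodrigues rotation vector to a 3x3 rotation matrix.
R, _ = cv2.Rodrigues(rvec)

# The quantity the solver minimizes: re-projection error between the observed
# 2D points and the 3D points projected with the estimated pose.
projected, _ = cv2.projectPoints(object_points, rvec, tvec, K, dist_coeffs)
print(np.linalg.norm(projected.reshape(-1, 2) - image_points, axis=1).mean())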

MohamedWZS commented 1 year ago

In linear algebra, a rotation matrix is a square matrix that is used to perform rotations in n-dimensional Euclidean space. A rotation matrix is an orthogonal matrix, meaning that its columns and rows are orthonormal vectors (i.e., they have length 1 and are perpendicular to each other).

To check if a given matrix R is a coherent rotation matrix, we need to verify two conditions:

1. R is a square matrix (same number of rows and columns).
2. The determinant of R is equal to 1.0, within a certain tolerance (specified by the epsilon parameter in the code).

The determinant of an orthogonal matrix is either 1 or -1, so the condition det(R) = 1 ensures that R represents a proper rotation rather than a reflection.

The function check_coherent_rotation(R: np.ndarray) -> bool takes a numpy array R as input and returns True if it is a coherent rotation matrix, or False otherwise. It uses the numpy linalg.det() function to compute the determinant of R, and compares it to 1.0 within the tolerance specified by epsilon. If the absolute difference between det(R) and 1.0 is less than or equal to epsilon, the function returns True, indicating that R is a coherent rotation matrix.
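
A minimal sketch of that check, with the tolerance exposed as an epsilon parameter (the code later in this thread hard-codes 1e-6):

import numpy as np

def check_coherent_rotation(R: np.ndarray, epsilon: float = 1e-6) -> bool:
    # Reject non-square inputs outright.
    if R.shape[0] != R.shape[1]:
        return False
    # det(R) = +1 (within tolerance) marks a proper rotation; det(R) = -1
    # would mean R is a reflection, not a rotation.
    return abs(np.linalg.det(R) - 1.0) <= epsilon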

ZiadMansourM commented 1 year ago

In the given C++ code, the 3D-2D correspondences are found by iterating through the good views (already used views), and for each view, the code checks if there are matches between the working view and the old view. If there is a match, it then iterates through the 3D point cloud to find the corresponding 3D point to the 2D point in the old view. This is done by comparing the index of the 2D point in the old view with the indices stored in the index_of_2d_origin attribute of the CloudPoint struct.

Here's an example of how you can find 3D-2D correspondences in Python within the generate_point_cloud_general function:

def find_3d_2d_correspondences(pcloud, good_views, img_keypoints, matches_matrix, working_view):
    img_points = []  # 2D points observed in the working view
    ppcloud = []     # corresponding, already-triangulated 3D points

    for old_view in good_views:
        matches_from_old_to_working = matches_matrix[(old_view, working_view)]

        for match in matches_from_old_to_working:
            idx_in_old_view = match.queryIdx

            # Find the cloud point that originated from this 2D feature in the old view.
            for cloud_point in pcloud:
                if idx_in_old_view == cloud_point.index_of_2d_origin[old_view]:
                    ppcloud.append(cloud_point.pt)
                    pt_ = img_keypoints[working_view][match.trainIdx].pt
                    img_points.append(pt_)
                    break

    return ppcloud, img_points

You can call this function inside the generate_point_cloud_general function before running solvePnPRansac.

To find the camera pose using the 3D-2D correspondences, the code uses the solvePnPRansac function, which takes the 3D points, 2D points, camera matrix K, and distortion coefficients as input and returns the rotation and translation vectors (along with a success flag and the RANSAC inlier indices). The rotation vector is then converted to a 3x3 rotation matrix using the cv2.Rodrigues function. Finally, the 3x4 pose matrix P1 = [R|t] is constructed by stacking the rotation matrix and the translation vector; left-multiplying by K gives the full projection matrix.

In the given C++ and Python code, the reference image is chosen based on the good views that have already been processed. The reference image is the one with the most inliers or the best quality matches.

If you have images [1, 2, 3, 4] and image 1 is the reference image, you will need to find 3D-2D correspondences for each image to estimate the camera poses for each image relative to the reference image. In this case, image 1 would be the base view, and you would find correspondences between image 1 and images 2, 3, and 4. Then, use solvePnPRansac to estimate the camera poses.

Here's an example of using solvePnPRansac in the Python code:

# solvePnPRansac expects numeric arrays, so convert the Python lists first
ppcloud = np.float32(ppcloud)
img_points = np.float32(img_points)

retval, rvec, tvec, inliers = cv2.solvePnPRansac(ppcloud, img_points, K, dist_coeffs)

# Convert the rotation vector to a rotation matrix
R, _ = cv2.Rodrigues(rvec)

# Construct the 3x4 pose matrix P1 = [R|t]
P1 = np.hstack((R, tvec))

You should incorporate this code into the generate_point_cloud_general function after finding the 3D-2D correspondences.

ZiadMansourM commented 1 year ago

Important

import cv2
import numpy as np

def find_3d_2d_correspondences(pcloud, good_views, img_keypoints, matches_matrix, working_view):
    img_points = []  # 2D points observed in the working view
    ppcloud = []     # corresponding, already-triangulated 3D points
    for old_view in good_views:
        matches_from_old_to_working = matches_matrix[(old_view, working_view)]
        for match in matches_from_old_to_working:
            idx_in_old_view = match.queryIdx
            for cloud_point in pcloud:
                if idx_in_old_view == cloud_point.index_of_2d_origin[old_view]:
                    ppcloud.append(cloud_point.pt)
                    pt_ = img_keypoints[working_view][match.trainIdx].pt
                    img_points.append(pt_)
                    break
    return ppcloud, img_points

def generate_point_cloud_general(images: Images, K_matrix: np.ndarray, dist_coeffs: np.ndarray, **kwargs) -> np.ndarray:
    point_cloud = []
    good_views = set()
    proj_matrices = {}  # projection matrix K @ [R|t] per view index
    for i in range(len(images.images) - 1):  # consecutive pairs: (0, 1), (1, 2), (2, 3), ...
        img1 = images.images[i]
        img2 = images.images[i + 1]
        keypoints1 = np.array([img1.keypoints[m.queryIdx].pt for m in images.feature_matches[i].matches])
        keypoints2 = np.array([img2.keypoints[m.trainIdx].pt for m in images.feature_matches[i].matches])
        R, t = find_camera_matrices(K_matrix, keypoints1, keypoints2, images.feature_matches[i].matches)
        if R is None:
            continue
        if i == 0:  # initial structure from the first pair
            P1 = K_matrix @ np.hstack((np.eye(3), np.zeros((3, 1))))
            P2 = K_matrix @ np.hstack((R, t))
            proj_matrices[0], proj_matrices[1] = P1, P2
            points_4D: np.ndarray = cv2.triangulatePoints(P1, P2, keypoints1.T, keypoints2.T)
            points_3D: np.ndarray = (points_4D / points_4D[3])[:3]
            point_cloud.append(points_3D)
            good_views.update({i, i + 1})
        else:  # register each later view against the existing cloud
            working_view = i + 1
            # NOTE: assumes the stored cloud points carry the index_of_2d_origin
            # bookkeeping that find_3d_2d_correspondences relies on.
            ppcloud, img_points = find_3d_2d_correspondences(point_cloud[-1], good_views, [img.keypoints for img in images.images], images.matches_matrix, working_view)
            retval, rvec, tvec, inliers = cv2.solvePnPRansac(np.float32(ppcloud), np.float32(img_points), K_matrix, dist_coeffs)
            R, _ = cv2.Rodrigues(rvec)
            P2 = K_matrix @ np.hstack((R, tvec))
            proj_matrices[working_view] = P2
            # Triangulate against the previous view's projection matrix, not the reference view's
            points_4D = cv2.triangulatePoints(proj_matrices[i], P2, keypoints1.T, keypoints2.T)
            points_3D = (points_4D / points_4D[3])[:3]
            point_cloud.append(points_3D)
            good_views.add(working_view)
    return np.hstack(point_cloud).T

ZiadMansourM commented 1 year ago

import numpy as np
import cv2 as OpenCV
from typing import List, Tuple

def check_coherent_rotation(R: np.ndarray) -> bool:
    # A proper rotation matrix has determinant +1 (a reflection has -1);
    # the tolerance here plays the role of the epsilon parameter (1e-6).
    return np.abs(np.linalg.det(R) - 1.0) <= 1e-6

def find_camera_matrices(K: np.ndarray, keypoints_one: np.ndarray, keypoints_two: np.ndarray, matches: List) -> Tuple[np.ndarray, np.ndarray]:
    # Estimate the essential matrix with RANSAC, then decompose it into a
    # relative rotation R and translation t with recoverPose
    E, mask = OpenCV.findEssentialMat(keypoints_one, keypoints_two, K, method=OpenCV.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = OpenCV.recoverPose(E, keypoints_one, keypoints_two, K)
    # Check if the resulting rotation is coherent
    return (None, None) if not check_coherent_rotation(R) else (R, t)

def generate_point_cloud_general(images: Images, K_matrix: np.ndarray, **kwargs) -> np.ndarray:
    point_cloud = []
    good_views = {0}  # view 0 is the reference view; its pose is the identity
    P1 = K_matrix @ np.hstack((np.eye(3), np.zeros((3, 1))))  # reference projection matrix
    for feature_match in images.feature_matches:  # every pair shares the reference: (0, 1), (0, 2), (0, 3)
        image_one = feature_match.image_one
        image_two = feature_match.image_two
        working_view = images.images.index(image_two)  # index of the view being registered
        # Extract matched keypoints
        keypoints_one = np.array([image_one.keypoints[m.queryIdx].pt for m in feature_match.matches])
        keypoints_two = np.array([image_two.keypoints[m.trainIdx].pt for m in feature_match.matches])
        # Find the relative camera pose; skip the pair if the rotation is incoherent
        R, t = find_camera_matrices(K_matrix, keypoints_one, keypoints_two, feature_match.matches)
        if R is None:
            continue
        if feature_match is images.feature_matches[0]:  # initial structure
            P2 = K_matrix @ np.hstack((R, t))
        else:
            # Register the new view against the existing cloud via 3D-2D correspondences
            ppcloud, img_points = find_3d_2d_correspondences(point_cloud[-1], good_views, [img.keypoints for img in images.images], images.matches_matrix, working_view)
            retval, rvec, tvec, inliers = OpenCV.solvePnPRansac(np.float32(ppcloud), np.float32(img_points), K_matrix, None)
            R, _ = OpenCV.Rodrigues(rvec)
            P2 = K_matrix @ np.hstack((R, tvec))
        points_4D = OpenCV.triangulatePoints(P1, P2, keypoints_one.T, keypoints_two.T)
        points_3D = (points_4D / points_4D[3])[:3]
        point_cloud.append(points_3D)
        good_views.add(working_view)
    return np.hstack(point_cloud).T
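
For completeness, a hypothetical driver for the function above; load_images and the intrinsics are placeholders for illustration, not part of the repo's actual API:

# Hypothetical usage: `load_images` stands in for whatever builds the Images
# container (.images, .feature_matches, .matches_matrix) used above.
images = load_images("data/scan/*.jpg")
K = np.array([
    [2759.48, 0.0, 1520.69],
    [0.0, 2764.16, 1006.81],
    [0.0, 0.0, 1.0],
])  # example intrinsics; substitute your camera's calibrated values
cloud = generate_point_cloud_general(images, K)
print(cloud.shape)  # (N, 3): one row per triangulated point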