ZiadMansourM / photogrammetry

Photogrammetry: Close Range 3D scanning. Our graduation project 🎓
https://docs.scanmate.sreboy.com/
GNU General Public License v3.0

Structure From Motion - SFM #20

Closed ZiadMansourM closed 1 year ago

ZiadMansourM commented 1 year ago

SFM

SfM can be used to create a 3D point cloud or mesh of the object by triangulating 3D points from their projections in multiple 2D images. SfM techniques typically estimate the camera poses and reconstruct the 3D structure of the scene by triangulating corresponding points across the different images.

In our case, we can use SfM algorithms to reconstruct the 3D structure of the hand-sized object from 2D images taken from different angles. The output is a point cloud or mesh (SfM itself yields a sparse cloud; a dense cloud or mesh requires an additional multi-view stereo step), which can be saved in the .obj file format or other 3D file formats.

It is worth noting that the quality of the 3D model will depend on the number and quality of the input images, the accuracy of the camera calibration, and the effectiveness of the SFM algorithm used.

Popular SFM implementations

OpenMVS:

OpenMVS is a popular open-source library for dense 3D reconstruction from multiple images. It covers the steps that follow sparse SfM, namely dense point-cloud generation, mesh reconstruction, mesh refinement, and texturing, and is typically paired with a sparse SfM tool such as OpenMVG or COLMAP.

OpenCV:

OpenCV is a widely used open-source computer vision library that includes a number of algorithms for camera calibration, feature detection and matching, and 3D reconstruction. Its SfM module (part of opencv_contrib) includes functions for 3D reconstruction from multiple images, as well as tools for visualization and post-processing.

COLMAP:

COLMAP is another popular open-source SFM pipeline for 3D reconstruction from multiple images. It includes a number of advanced features, such as support for multi-view stereo and incremental reconstruction, and provides a user-friendly graphical interface.

Bundler:

Bundler is an older, but still widely used SFM implementation that provides a complete pipeline for sparse 3D reconstruction from multiple images. It includes camera calibration, feature detection and matching, and bundle adjustment.

VisualSFM:

VisualSFM is a popular free (though closed-source) SfM application for 3D reconstruction from multiple images. It supports large-scale reconstruction, integrates with external multi-view stereo tools for dense reconstruction, and provides a user-friendly graphical interface.

Random Notes

The step that comes before SFM (Structure from Motion) is typically image acquisition. In order to create a 3D model using SFM, a series of 2D images of the object or scene of interest must be captured. These images should be of high quality, with good lighting and sufficient overlap between images to allow for accurate feature matching.

Once the images are acquired, the next step is typically camera calibration. This involves determining the intrinsic and extrinsic parameters of the camera(s) used to capture the images. Intrinsic parameters include the focal length, principal point, and lens distortion coefficients, while extrinsic parameters describe the position and orientation of the camera in space relative to the object or scene being imaged.

Camera calibration is an important step because it allows for accurate reconstruction of the 3D scene geometry. Inaccurate calibration can lead to errors in the SFM process and result in a less accurate 3D model.
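
As a concrete illustration of what the intrinsic parameters look like in code (the numbers below are placeholders, not measured values), they are usually collected into a 3x3 camera matrix K plus a vector of distortion coefficients:

```py
import numpy as np

# Placeholder intrinsics: fx, fy are focal lengths in pixels,
# (cx, cy) is the principal point. Real values come from calibration.
fx, fy, cx, cy = 3000.0, 3000.0, 1920.0, 1080.0

K = np.array([
    [fx, 0.0, cx],
    [0.0, fy, cy],
    [0.0, 0.0, 1.0],
])

# Radial (k1, k2, k3) and tangential (p1, p2) distortion coefficients,
# in the order OpenCV uses: [k1, k2, p1, p2, k3]
dist = np.array([0.1, -0.25, 0.0, 0.0, 0.0])
```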


Camera calibration can be done on images that have been taken previously, as long as certain conditions are met.

In order to perform camera calibration, a set of images with known calibration patterns or features are typically required. These calibration patterns could be a checkerboard or a set of known geometric shapes with known dimensions. The images should be taken with the same camera(s) and under similar conditions as the images used for SFM.

If images with known calibration patterns were not captured during the initial image acquisition, it may be possible to capture these images separately and then use them for camera calibration. However, this would require access to the same camera(s) that were used to capture the original images, as well as a stable setup and controlled lighting conditions to ensure consistency between the calibration images and the original images.

Alternatively, if the camera(s) used for the original image capture are known and their intrinsic parameters are available, it may be possible to use these parameters for SFM without performing additional camera calibration. However, this assumes that the camera(s) have not undergone any changes that would affect their intrinsic parameters (e.g., a change in the lens).
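
If the intrinsics are already available, a minimal sketch of reusing them looks like the following (the file names intrinsics.txt and distortion.txt are hypothetical outputs of an earlier calibration run, and photo_0001.jpg is a placeholder):

```py
import cv2
import numpy as np

# Hypothetical files produced by an earlier calibration run
K = np.loadtxt('intrinsics.txt')       # 3x3 camera matrix
dist = np.loadtxt('distortion.txt')    # [k1, k2, p1, p2, k3]

# Undistort an input image before feeding it to the SfM pipeline
img = cv2.imread('photo_0001.jpg')     # placeholder filename
undistorted = cv2.undistort(img, K, dist)
cv2.imwrite('photo_0001_undistorted.jpg', undistorted)
```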


1. Image acquisition:

This is the first step and involves capturing a series of 2D images of the object or scene of interest.

2. Feature Extraction:

This step involves identifying distinctive features in each image, such as corners or edges, that can be used to match points between images.

3. Image Matching:

This step involves matching features between pairs of images to determine the relative position and orientation of the cameras used to capture the images.

4. Feature Matching:

This step involves matching features across multiple images to build a set of correspondences between points in 3D space.

5. Triangulation:

This step involves estimating the 3D coordinates of each point in the scene by triangulating the correspondences between 2D image points.

6. Point Cloud:

This step involves assembling the set of 3D points into a point cloud that represents the shape of the object or scene.

7. A .obj File:

This step involves saving the 3D point cloud as a file, typically in a format such as .obj.
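
As a rough sketch of how these seven stages could be wired together in code (all names here are hypothetical placeholders for illustration, not an existing API):

```py
from pathlib import Path
import cv2
import numpy as np

def run_pipeline(image_dir: str, output_path: str = 'model.obj') -> None:
    """Hypothetical end-to-end sketch of the seven steps above."""
    # 1. Image acquisition: load the captured 2D images from disk
    paths = sorted(Path(image_dir).glob('*.jpg'))
    images = [cv2.imread(str(p), cv2.IMREAD_GRAYSCALE) for p in paths]

    # 2. Feature extraction: detect keypoints and compute SIFT descriptors
    sift = cv2.SIFT_create()
    features = [sift.detectAndCompute(img, None) for img in images]

    # 3-4. Image/feature matching: match descriptors between consecutive pairs
    matcher = cv2.BFMatcher()
    pair_matches = {
        (i, i + 1): matcher.match(features[i][1], features[i + 1][1])
        for i in range(len(images) - 1)
    }

    # 5-6. Triangulation and point-cloud assembly would go here, using the
    #      camera intrinsics and the estimated relative poses.
    points3d = np.empty((0, 3))  # placeholder for the triangulated points

    # 7. Write the point cloud as a minimal .obj file (vertices only)
    with open(output_path, 'w') as f:
        for x, y, z in points3d:
            f.write(f'v {x} {y} {z}\n')
```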


Camera calibration, which is typically done before SFM, determines the intrinsic parameters of the camera(s) used to capture the images. These parameters describe the internal characteristics of the camera, such as the focal length, principal point, and lens distortion.

There are several methods for finding the intrinsic parameters of a camera, including using calibration patterns or geometric shapes with known dimensions. Once the intrinsic parameters have been determined, they can be saved and used in the SFM process to improve the accuracy of the reconstruction.

The intrinsic parameters can be saved as a set of values, typically in a text file or as metadata embedded in the image files themselves. Some SFM software packages may also provide tools for calibrating the camera(s) and automatically saving the intrinsic parameters for use in the SFM process.

❓ Questions

How to find the intrinsic parameters of an iPhone 13 pro camera?

To find the intrinsic parameters of an iPhone 13 Pro camera, you can use a camera calibration process that involves capturing a set of images of a known calibration pattern or geometry with the iPhone camera. This process involves the following steps:

  1. Print or display a calibration pattern or set of geometric shapes with known dimensions.
  2. Capture a set of images of the calibration pattern or shapes using the iPhone 13 Pro camera, ensuring that the entire pattern or shapes are visible in each image, that each shot is sharp, and that the pattern is seen from a variety of angles and distances across the set.
  3. Use camera calibration software, such as OpenCV, MATLAB, or other third-party packages, to detect the calibration pattern or shapes in the images and calculate the intrinsic parameters of the camera.
  4. Save the intrinsic parameters, typically as a set of values in a text file or as metadata embedded in the image files.

There are several calibration patterns or shapes that can be used for this process, such as a checkerboard pattern or a set of circles or squares with known dimensions. The specific pattern or shapes used will depend on the calibration software being used and the requirements of the SFM application.

Note that the intrinsic parameters of a camera can change over time due to factors such as wear and tear, changes in temperature, and other environmental factors. It is therefore important to periodically recalibrate the camera to ensure accurate reconstruction results.

Generate a python script to find intrinsic parameters from images immediately after taking them

import cv2
import numpy as np
import os

# Define the dimensions of the calibration pattern or geometry
pattern_size = (9, 6)

# Define the object points of the calibration pattern or geometry
objp = np.zeros((np.prod(pattern_size), 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)

# Define the arrays to store object points and image points for all images
objpoints = []
imgpoints = []

# Define the path to the directory containing the images
img_dir = 'images'

# Get a list of the image filenames in the directory
img_names = [f for f in os.listdir(img_dir) if f.endswith('.jpg') or f.endswith('.png')]

# Read each image, detect calibration pattern or geometry, and add object and image points to the arrays
for img_name in img_names:
    # Read the image
    img = cv2.imread(os.path.join(img_dir, img_name))

    # Convert the image to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Find the corners of the calibration pattern or geometry
    ret, corners = cv2.findChessboardCorners(gray, pattern_size, None)

    # If corners are found, add the object and image points to the arrays
    if ret == True:
        objpoints.append(objp)
        imgpoints.append(corners)

        # Draw and display the corners on the image
        img = cv2.drawChessboardCorners(img, pattern_size, corners, ret)
        cv2.imshow('img', img)
        cv2.waitKey(500)

# Close the preview window
cv2.destroyAllWindows()

# Perform camera calibration using the object and image points
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

# Print the calibration results
print('Camera matrix:\n', mtx)
print('Distortion coefficients:\n', dist)

# Save the intrinsic parameters and distortion coefficients as text files
np.savetxt('intrinsics.txt', mtx)
np.savetxt('distortion.txt', dist)
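
Note that the ret value returned by cv2.calibrateCamera is the overall RMS reprojection error in pixels; values well below one pixel generally indicate a good calibration. Keeping the distortion coefficients alongside the camera matrix is worthwhile, since undistorting the images before SfM requires both.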

What are your recommendations?

Our application performs close-range 3D scanning on images taken with our mobile app and then outputs a .obj file. Do you have any suggested modifications to these steps:

  1. Image acquisition.
  2. Feature Extraction.
  3. Image Matching.
  4. Feature Matching.
  5. Triangulation.
  6. Point Cloud.
  7. A .obj File.

Your current steps for performing close-range 3D scanning seem reasonable, but there may be modifications or additional steps you can consider depending on the specific requirements and limitations of your application. Here are some suggestions:

Image acquisition: Consider using multiple images taken from different angles to capture a more complete 3D model.

Feature Extraction: Depending on the quality and resolution of your input images, you may need to use more advanced feature extraction techniques such as deep learning-based methods to accurately capture the geometry of the scene.

Image Matching: Ensure that your image matching algorithm can handle occlusions, reflections, and other artifacts that can occur during close-range 3D scanning.

Feature Matching: Consider using robust feature detectors and descriptors such as SIFT or SURF, combined with ratio-test filtering of the matches, to increase the accuracy and reliability of feature matching.

Triangulation: Depending on the size of the object being scanned, you may need to consider using multiple cameras to triangulate the positions of the features in 3D space.

Point Cloud: Use point cloud processing techniques to remove noise, filter outliers, and refine the geometry of the 3D model (a minimal filtering sketch is included after this list).

File format: Consider using more efficient file formats such as PLY or STL instead of .obj to reduce file size and improve compatibility with other software tools.

Overall, the quality of your 3D model will depend on the accuracy of each step in the scanning pipeline, as well as the specific parameters and algorithms used in each stage. Experimenting with different techniques and optimizing the pipeline for your specific application can help you achieve the best results.
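
Regarding the point-cloud cleanup suggestion above, here is a minimal sketch of a statistical outlier filter in plain NumPy (the sigma threshold and the example cloud are illustrative assumptions, not part of the current pipeline):

```py
import numpy as np

def filter_outliers(points: np.ndarray, sigma: float = 3.0) -> np.ndarray:
    """Drop points farther than `sigma` standard deviations from the centroid.

    points: (N, 3) array of XYZ coordinates.
    Returns the filtered (M, 3) array with M <= N.
    """
    centroid = points.mean(axis=0)
    distances = np.linalg.norm(points - centroid, axis=1)
    keep = distances < distances.mean() + sigma * distances.std()
    return points[keep]

# Example usage: a random cloud plus one far-away outlier
cloud = np.vstack([np.random.rand(1000, 3), np.array([[50.0, 50.0, 50.0]])])
clean = filter_outliers(cloud)
print(cloud.shape, '->', clean.shape)
```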

Sample Mock input to the Triangulation Step. The given input is 173 images.

Example of what the input to the triangulation step might look like for 173 images:

Image Set: A collection of 173 input images, each of which is a photograph taken from a slightly different viewpoint.

Feature Extraction Output: A set of detected features in each of the input images, along with their descriptors. The features and descriptors can be stored in a format such as SIFT, SURF, or ORB.

Image Matching Output: A set of matched features between each pair of input images, along with their corresponding homographies or fundamental matrices.

Feature Matching Output: A set of feature tracks linking the same physical point across multiple input images, i.e. the multi-view correspondences that will be triangulated into 3D points.

Camera Calibration Parameters: Intrinsic and extrinsic parameters for each of the cameras used to capture the input images, as well as any distortion coefficients.

With these inputs, the triangulation step can compute the 3D position of each feature point by triangulating the corresponding 2D image points across multiple views. The output of the triangulation step is typically a point cloud, which is a collection of 3D points representing the surface of the scanned object.
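
As an illustration only (the class and field names below are hypothetical, not an existing format), the inputs for the 173-image case could be organized like this:

```py
from dataclasses import dataclass, field
import numpy as np

@dataclass
class ImageData:
    path: str               # e.g. 'images/IMG_0001.jpg' (placeholder name)
    keypoints: list         # cv2.KeyPoint objects from SIFT/SURF/ORB
    descriptors: np.ndarray # one descriptor row per keypoint

@dataclass
class TriangulationInput:
    images: list            # 173 ImageData entries
    pair_matches: dict      # (i, j) -> list of cv2.DMatch between images i and j
    K: np.ndarray           # 3x3 intrinsic camera matrix
    dist: np.ndarray        # distortion coefficients
    poses: dict = field(default_factory=dict)  # image index -> (R, T) extrinsics
```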

ZiadMansourM commented 1 year ago

Triangulation

Given cv2.DMatch objects, keypoints, and descriptors for a list of images, how can triangulation be done? SIFT was used.

  1. Image acquisition DONE
  2. Feature Extraction DONE
  3. Image Matching DONE
  4. Feature Matching DONE
  5. Triangulation
  6. Point Cloud
  7. Generate .obj File

To perform triangulation, you need to use the matched keypoints from the feature matching step and the corresponding descriptors for each keypoint. Here are the steps to perform triangulation:

  1. Import the necessary libraries:

```py
import numpy as np
import cv2
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
```

  2. Load the images and extract the keypoints and descriptors using SIFT:

```py
img1 = cv2.imread('image1.jpg')
img2 = cv2.imread('image2.jpg')

# Use cv2.SIFT_create() instead if your OpenCV build is >= 4.4
sift = cv2.xfeatures2d.SIFT_create()
keyPoints1, descriptors1 = sift.detectAndCompute(img1, None)
keyPoints2, descriptors2 = sift.detectAndCompute(img2, None)
```

  3. Match the keypoints using a feature matching algorithm such as BFMatcher:

```py
matcher = cv2.BFMatcher()
matches = matcher.match(descriptors1, descriptors2)
```

  4. Convert the matched keypoints to numpy arrays:

```py
pts1 = np.float32([keyPoints1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
pts2 = np.float32([keyPoints2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
```

  5. Use the cv2.triangulatePoints() function to perform triangulation. Here fx, fy, cx, cy come from the camera calibration, and R, T are the rotation and translation of the second camera relative to the first (a sketch of how they can be estimated is given at the end of this comment):

```py
# Intrinsic camera matrix from calibration
K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]])

# Camera projection matrix for image 1 (reference camera at the origin)
RT1 = np.hstack((np.eye(3), np.zeros((3, 1))))
P1 = np.matmul(K, RT1)

# Camera projection matrix for image 2 (relative pose R, T)
RT2 = np.hstack((R, T))
P2 = np.matmul(K, RT2)

# Perform triangulation
points4D = cv2.triangulatePoints(P1, P2, pts1, pts2)
```

  6. Convert the homogeneous 4D points to 3D points by dividing each point by its fourth coordinate:

```py
points3D = points4D / points4D[3]
```

  7. Visualize the point cloud using matplotlib:

```py
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(points3D[0], points3D[1], points3D[2], c='r', marker='o')
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.show()
```

  8. Generate an .obj file to save the point cloud:

```py
with open('point_cloud.obj', 'w') as f:
    for i in range(points3D.shape[1]):
        f.write('v {} {} {}\n'.format(points3D[0, i], points3D[1, i], points3D[2, i]))
```

This will save the point cloud in an .obj file, which can be opened in 3D modeling software.
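
The projection matrix for image 2 above assumes that the relative rotation R and translation T are already known. As a minimal sketch (reusing K, pts1, and pts2 from the steps above), one common way to estimate them from the matched points is via the essential matrix:

```py
import cv2
import numpy as np

# Assumes K (3x3 intrinsic matrix), pts1 and pts2 (matched points from step 4)
# are already defined as in the steps above.

# Estimate the essential matrix, using RANSAC to reject outlier matches
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)

# Recover the relative rotation R and translation T of camera 2 w.r.t. camera 1
# (the translation is only defined up to scale)
_, R, T, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
```

Because the recovered translation is only defined up to scale, full SfM pipelines refine the camera poses and 3D points jointly with bundle adjustment before generating the final point cloud.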