alicevision / Meshroom

3D Reconstruction Software
http://alicevision.org

[question] Projecting Meshroom 3D Mesh Points onto Images #2595

Open Janudis opened 3 weeks ago

Janudis commented 3 weeks ago

Hello, I'm attempting to project the 3D points from a mesh generated by Meshroom's Texturing node onto each of the original images I captured. However, the projected points do not align correctly with the images. Here is the script I use:

import os
import json
import numpy as np
import cv2
import matplotlib.pyplot as plt
import open3d as o3d
#Paths
sfm_file_path = '/media/harddrive/Meshroom-2023.3.0//MeshroomCache_second/StructureFromMotion/8720d431bab1584ae97a70208601c784e5013a97/cameras.sfm'  # Replace with your actual path
point_cloud_path = '/media/harddrive/Meshroom-2023.3.0/MeshroomCache_second/Texturing/355a5b90b3d75ed169f9400aa0c30852407d3d9e/texturedMesh.obj'  # Point cloud file path
images_dir = '/media/harddrive/new_dataset/new_photos'  # Directory containing your images
output_dir = '/media/harddrive/new_dataset/calib'  # Directory to save calibration files
os.makedirs(output_dir, exist_ok=True)  
#Load cameras.sfm
with open(sfm_file_path, 'r') as f:
    data = json.load(f)
#Load point cloud
mesh = o3d.io.read_triangle_mesh(point_cloud_path)
points = np.asarray(mesh.vertices)   # Shape: (N, 3)
#Extract intrinsics
intrinsic_data = data['intrinsics'][0]  # Assuming one set of intrinsics for all images
focal_length = float(intrinsic_data['focalLength'])  # In mm
principal_point = [
    float(intrinsic_data['principalPoint'][0]),  # Offset from the image center
    float(intrinsic_data['principalPoint'][1])   # Offset from the image center
]
width, height = float(intrinsic_data['width']), float(intrinsic_data['height'])  # In pixels
sensor_width = float(intrinsic_data['sensorWidth'])  # In mm
sensor_height = float(intrinsic_data['sensorHeight'])  # In mm
#Compute fx and fy in pixels
fx = (focal_length / sensor_width) * width
fy = (focal_length / sensor_height) * height
# Shift the principal point offsets to absolute pixel coordinates
cx = principal_point[0] + width / 2
cy = principal_point[1] + height / 2
#Construct intrinsic matrix (K)
K = np.array([
    [fx, 0, cx],
    [0, fy, cy],
    [0, 0, 1]
])
#Build a mapping from pose IDs to image filenames
pose_to_image_map = {}
for view in data.get('views', []):
    pose_id = view.get('poseId') or view.get('value', {}).get('poseId')
    if pose_id is None:
        continue
    path = view.get('path') or view.get('value', {}).get('path')
    if path is None:
        continue
    image_file_name = os.path.basename(path)
    pose_to_image_map[pose_id] = image_file_name
#Iterate over each pose
for pose in data['poses']:
    pose_id = pose['poseId']
    # Get corresponding image filename
    image_filename = pose_to_image_map.get(pose_id)
    if image_filename is None:
        continue  # no view found for this pose
    image_path = os.path.join(images_dir, image_filename)
    image = cv2.imread(image_path)
    #Extract rotation matrix and camera center from pose
    rotation_values = [float(x) for x in pose['pose']['transform']['rotation']]
    R_c2w = np.array(rotation_values).reshape(3, 3)  # Rotation from camera to world
    C = np.array([float(x) for x in pose['pose']['transform']['center']]).reshape(3, 1)  # Camera center in world coordinates
    #Compute rotation from world to camera coordinates
    R_w2c = R_c2w.T
    #Compute translation vector t = -R_w2c * C
    t = -np.dot(R_w2c, C).reshape(1, 3)
    extrinsic_matrix = np.hstack((R_w2c, t.T))  # Shape: (3, 4)
    # Compute projection matrix
    P = K @ extrinsic_matrix  # Shape: (3, 4)
    #Project points onto image plane
    points_homogeneous = np.hstack((points, np.ones((points.shape[0], 1))))  # Shape: (N, 4)
    projected_points = (P @ points_homogeneous.T).T  # Shape: (N, 3)
    #Normalize to get pixel coordinates
    projected_points[:, 0] /= projected_points[:, 2]
    projected_points[:, 1] /= projected_points[:, 2]
    #Extract pixel coordinates
    u = projected_points[:, 0]
    v = projected_points[:, 1]
    #Visualize projections
    plt.figure(figsize=(10, 8))
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    plt.scatter(u, v, s=0.5, c='red', alpha=0.5)
    plt.title(f'Projection on Image {pose_id}')
    plt.axis('off')
    plt.show()

Unfortunately, the results are incorrect. I’m unsure whether the issue lies with the extrinsic and intrinsic parameters or the point cloud from the mesh. I’ve tried various transformations on the point cloud, but the projected points remain inaccurate. This is one of the closest results I’ve managed to achieve:

[Figure_1: attached screenshot of the projected points overlaid on one of the images]
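One way to narrow this down might be to re-project the same vertices with OpenCV instead of the manual P = K [R|t] product, so the two code paths can be compared (and distortion added later if needed). This is only a sketch, reusing R_w2c, C, K and points from the script above inside the pose loop:

# Sketch: cross-check the manual projection with cv2.projectPoints.
# Assumes R_w2c (world-to-camera rotation), C (camera center in world
# coordinates), K and points are defined as in the script above.
rvec, _ = cv2.Rodrigues(R_w2c)               # rotation matrix -> Rodrigues vector
tvec = -R_w2c @ C                            # t = -R_w2c * C, shape (3, 1)
dist = np.zeros(5)                           # no distortion for now
uv, _ = cv2.projectPoints(points.astype(np.float64), rvec, tvec, K, dist)
uv = uv.reshape(-1, 2)                       # (N, 2) pixel coordinates
depths = (R_w2c @ (points.T - C))[2]         # z in camera coordinates
u, v = uv[depths > 0, 0], uv[depths > 0, 1]  # keep only vertices in front of the camera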

Meshroom Version: 2023.3.0

Thank you in advance for your time!

fabiencastan commented 5 days ago

Maybe width and height get inverted because of the vertical image. "Sensor width" is the physical width of the sensor, in other words the largest side of the sensor. If the camera is rotated, the image becomes vertical, and the largest sensor side then corresponds to the image height instead of the width.
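A minimal sketch of that idea, assuming square pixels so that only the longest physical side of the sensor matters; focal_length, width, height, sensor_width, sensor_height and principal_point are the values read from cameras.sfm as in the script above:

# Sketch (assumption: sensorWidth/sensorHeight describe the physical sensor,
# regardless of how the image is rotated), deriving one pixel size from the
# two longest sides and using it for both fx and fy.
longest_image_side = max(width, height)
longest_sensor_side = max(sensor_width, sensor_height)
pixel_size_mm = longest_sensor_side / longest_image_side  # mm per pixel (square pixels assumed)
fx = fy = focal_length / pixel_size_mm                    # focal length in pixels
cx = principal_point[0] + width / 2
cy = principal_point[1] + height / 2
K = np.array([[fx, 0, cx],
              [0, fy, cy],
              [0,  0,  1]])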

Janudis commented 4 days ago

Thank you for your reply! You are correct that the photos were taken vertically. To investigate further, I printed the following values for the sensor and image dimensions:

width, height = float(intrinsic_data['width']), float(intrinsic_data['height'])  # In pixels
sensor_width = float(intrinsic_data['sensorWidth'])  # In mm
sensor_height = float(intrinsic_data['sensorHeight'])  # In mm
print(f"width {width}")
print(f"height {height}")
print(f"sensor_width {sensor_width}")
print(f"sensor_height {sensor_height}")

The output was:

width 4640.0
height 3472.0
sensor_width 7.524229526519775
sensor_height 5.630199432373047

If I understood correctly, the sensor width seems correct, but could the image width and height have been swapped? I also tried manually swapping them:

height, width  = float(intrinsic_data['width']), float(intrinsic_data['height'])  # In pixels
sensor_width = float(intrinsic_data['sensorWidth'])  # In mm
sensor_height = float(intrinsic_data['sensorHeight'])  # In mm
print(f"width {width}")
print(f"height {height}")
print(f"sensor_width {sensor_width}")
print(f"sensor_height {sensor_height}")
# Compute fx and fy in pixels
fx = (focal_length / sensor_width) * width
fy = (focal_length / sensor_height) * height
# Shift the principal point offsets to absolute pixel coordinates
cx = principal_point[0] + width / 2
cy = principal_point[1] + height / 2
distortion_params = intrinsic_data['distortionParams']
k1 = float(distortion_params[0])
k2 = float(distortion_params[1])
k3 = float(distortion_params[2])
dist_coeffs = np.array([k1, k2, 0, 0, k3])  # OpenCV order: k1, k2, p1, p2, k3 (tangential terms set to 0)
# Construct intrinsic matrix (K)
K = np.array([
    [fx, 0, cx],
    [0, fy, cy],
    [0, 0, 1]
])

However, the projected points are still not aligned correctly. Could there be another aspect of the extrinsic or intrinsic parameters that might be causing this issue?
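In case it helps, here is where distortion could enter the manual projection. This is only a sketch: it assumes the three distortionParams behave like classic radial terms (k1*r^2 + k2*r^4 + k3*r^6) applied to normalized camera coordinates, which is an assumption about Meshroom's radial3 model rather than a verified mapping:

# Sketch: manual pinhole projection with radial distortion applied in
# normalized camera coordinates (assumed ordering k1, k2, k3).
def project_with_radial3(points_world, R_w2c, C, K, k1, k2, k3):
    X_cam = (R_w2c @ (points_world.T - C)).T      # world -> camera coordinates, (N, 3)
    x = X_cam[:, 0] / X_cam[:, 2]                 # normalized image coordinates
    y = X_cam[:, 1] / X_cam[:, 2]
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    u = K[0, 0] * x * scale + K[0, 2]             # back to pixels with fx, cx
    v = K[1, 1] * y * scale + K[1, 2]             # and fy, cy
    return u, v

It would be called inside the pose loop as u, v = project_with_radial3(points, R_w2c, C, K, k1, k2, k3).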