decadenza / DirectStereoRectification

"Rectifying Homographies for Stereo Vision: Analytical Solution for Minimal Distortion": algorithm to compute the optimal rectifying homographies that minimise perspective distortion.
GNU General Public License v3.0

Getting incorrect results. Am I using the wrong kind of extrinsics matrices? #10

Closed chrisoffner closed 6 months ago

chrisoffner commented 6 months ago

I'm very intrigued by your paper and this method, but when I try to run your code on my own images, I get incorrect results. I'm using synthetic images rendered from Blender. The scene and stereo camera rig look as follows.

[screenshot: Blender scene and stereo camera rig]

When I run example.py using my own images, intrinsics matrices A1, A2, and extrinsics matrices RT1, RT2, I get this incorrect result:

[screenshot: incorrect rectification result (rectify1)]

When I switch the images, such that img1 becomes the image taken by the right camera and img2 becomes the image taken by the left camera (leaving the matrices as before!), I get this result where the images appear to have been "straightened out", yet they still don't line up w.r.t. their y-values.

[screenshot: rectification with swapped images (rectify_flip)]

Moreover, this does not align with your example use, where img1 is the left image and img2 the right image: https://github.com/decadenza/DirectStereoRectification/blob/cdef75a7702705de5b96f7a84c88ae8073ccf989/example.py#L29-L30

I'm rather confused by this, and I'm wondering whether my extrinsics matrices are following a different convention than the one you intend. However, in your paper you state

Call $\mathbf{o}_1 = -\mathbf{R}_1^{-1} \cdot \mathbf{t}_1$ the position of the optical center in the WCS.

You also write

Let $\mathbf{A}_1 \in \mathbb{R}^{3 \times 3}$ be the intrinsic matrix of the left camera, with $\mathbf{R}_1 \in \mathbb{R}^{3 \times 3}$ and $\mathbf{t}_1 \in \mathbb{R}^{3}$ its rotation matrix and translation vector, respectively, describing the position of the first Camera Coordinate System (CCS) with respect to the World Coordinate System (WCS)

which, to me, seems to characterise $\mathbf{t}_1$ as the position of the first camera's optical center, expressed in world coordinates. That seems to be in conflict with "the position of the optical center in the WCS" actually being $\mathbf{o}_1 = -\mathbf{R}_1^{-1} \cdot \mathbf{t}_1$. In my understanding, $\mathbf{t}_1$ describes the world origin expressed in the coordinate system of the left camera, i.e. camera 1. Am I mistaken there?

In any case, if I do the sanity check to compute

R1 = RT1[:3, :3]
t1 = RT1[:3, 3]

o1 = -R1.T @ t1

I do indeed get that o1 matches the world coordinates of the camera centre (bpy.data.objects.get("cam_L").matrix_world.translation). In case it's relevant, the extrinsics matrices have been extracted via

# Get cameras
cam_L = bpy.data.objects.get("cam_L")
cam_R = bpy.data.objects.get("cam_R")

# Obtain extrinsic matrix
RT1 = np.array(cam_L.matrix_world.inverted())
RT2 = np.array(cam_R.matrix_world.inverted())

What might I be doing wrong here? I'd appreciate any advice or guidance. Thank you! :)

decadenza commented 6 months ago

Hi and thank you for your post. I'll try to help as I can.

In my understanding, $\mathbf{t}_1$ describes the world origin expressed in the coordinate system of the left camera, i.e. camera 1. Am I mistaken there?

The key here is the orientation in which t_1 is expressed. You are also right in your statement: t_1 is a vector expressed in the orientation of camera 1, therefore to get o_1 you need to cancel that rotation by applying R_1^{-1} to t_1.

The statement you are quoting has to be read in a general way. It is true that R_1 and t_1 describe the position of the camera in the WCS, but they have to be used according to the formula above. It's just a matter of wording.
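As a minimal numpy sketch of the relation (the values here are made up purely for illustration): if the camera centre in world coordinates is c and R rotates world coordinates into the camera frame, then t = -R @ c, and cancelling the rotation recovers the centre again.

import numpy as np

# Illustrative values only: a camera centre in world coordinates and a
# world -> camera rotation (small tilt about the x axis).
c = np.array([0.5, -1.0, 3.0])
theta = np.deg2rad(10.0)
R = np.array([[1.0, 0.0,            0.0          ],
              [0.0, np.cos(theta), -np.sin(theta)],
              [0.0, np.sin(theta),  np.cos(theta)]])

t = -R @ c    # translation stored in the extrinsics [R | t], expressed in the camera orientation
o = -R.T @ t  # cancelling the rotation recovers the optical centre in the WCS

assert np.allclose(o, c)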

When I run example.py using my own images, intrinsics matrices A1, A2, and extrinsics matrices RT1, RT2, I get this incorrect result

I would say that it is caused by wrong intrinsic matrices. Please check https://blender.stackexchange.com/a/120063/3581 for how to extract intrinsic matrices from Blender.

I am reporting the full code here (credits https://blender.stackexchange.com/users/3581/daniel):

#!/usr/bin/env python3
import bpy
from mathutils import Matrix, Vector

#---------------------------------------------------------------
# 3x4 P matrix from Blender camera
#---------------------------------------------------------------

# BKE_camera_sensor_size
def get_sensor_size(sensor_fit, sensor_x, sensor_y):
    if sensor_fit == 'VERTICAL':
        return sensor_y
    return sensor_x

# BKE_camera_sensor_fit
def get_sensor_fit(sensor_fit, size_x, size_y):
    if sensor_fit == 'AUTO':
        if size_x >= size_y:
            return 'HORIZONTAL'
        else:
            return 'VERTICAL'
    return sensor_fit

# Build intrinsic camera parameters from Blender camera data
#
# See notes on this in 
# blender.stackexchange.com/questions/15102/what-is-blenders-camera-projection-matrix-model
# as well as
# https://blender.stackexchange.com/a/120063/3581
def get_calibration_matrix_K_from_blender(camd):
    if camd.type != 'PERSP':
        raise ValueError('Non-perspective cameras not supported')
    scene = bpy.context.scene
    f_in_mm = camd.lens
    scale = scene.render.resolution_percentage / 100
    resolution_x_in_px = scale * scene.render.resolution_x
    resolution_y_in_px = scale * scene.render.resolution_y
    sensor_size_in_mm = get_sensor_size(camd.sensor_fit, camd.sensor_width, camd.sensor_height)
    sensor_fit = get_sensor_fit(
        camd.sensor_fit,
        scene.render.pixel_aspect_x * resolution_x_in_px,
        scene.render.pixel_aspect_y * resolution_y_in_px
    )
    pixel_aspect_ratio = scene.render.pixel_aspect_y / scene.render.pixel_aspect_x
    if sensor_fit == 'HORIZONTAL':
        view_fac_in_px = resolution_x_in_px
    else:
        view_fac_in_px = pixel_aspect_ratio * resolution_y_in_px
    pixel_size_mm_per_px = sensor_size_in_mm / f_in_mm / view_fac_in_px
    s_u = 1 / pixel_size_mm_per_px
    s_v = 1 / pixel_size_mm_per_px / pixel_aspect_ratio

    # Parameters of intrinsic calibration matrix K
    u_0 = resolution_x_in_px / 2 - camd.shift_x * view_fac_in_px
    v_0 = resolution_y_in_px / 2 + camd.shift_y * view_fac_in_px / pixel_aspect_ratio
    skew = 0 # only use rectangular pixels

    K = Matrix(
        ((s_u, skew, u_0),
        (   0,  s_v, v_0),
        (   0,    0,   1)))
    return K

# Returns camera rotation and translation matrices from Blender.
# 
# There are 3 coordinate systems involved:
#    1. The World coordinates: "world"
#       - right-handed
#    2. The Blender camera coordinates: "bcam"
#       - x is horizontal
#       - y is up
#       - right-handed: negative z look-at direction
#    3. The desired computer vision camera coordinates: "cv"
#       - x is horizontal
#       - y is down (to align to the actual pixel coordinates 
#         used in digital images)
#       - right-handed: positive z look-at direction
def get_3x4_RT_matrix_from_blender(cam):
    # bcam stands for blender camera
    R_bcam2cv = Matrix(
        ((1, 0,  0),
        (0, -1, 0),
        (0, 0, -1)))

    # Transpose since the rotation is object rotation, 
    # and we want coordinate rotation
    # R_world2bcam = cam.rotation_euler.to_matrix().transposed()
    # T_world2bcam = -1*R_world2bcam * location
    #
    # Use matrix_world instead to account for all constraints
    location, rotation = cam.matrix_world.decompose()[0:2]
    R_world2bcam = rotation.to_matrix().transposed()

    # Convert camera location to translation vector used in coordinate changes
    # T_world2bcam = -1*R_world2bcam*cam.location
    # Use location from matrix_world to account for constraints:     
    T_world2bcam = -1*R_world2bcam @ location

    # Build the coordinate transform matrix from world to computer vision camera
    R_world2cv = R_bcam2cv @ R_world2bcam
    T_world2cv = R_bcam2cv @ T_world2bcam

    # put into 3x4 matrix
    RT = Matrix((
        R_world2cv[0][:] + (T_world2cv[0],),
        R_world2cv[1][:] + (T_world2cv[1],),
        R_world2cv[2][:] + (T_world2cv[2],)
        ))
    return RT

def get_3x4_P_matrix_from_blender(cam):
    K = get_calibration_matrix_K_from_blender(cam.data)
    RT = get_3x4_RT_matrix_from_blender(cam)
    return K @ RT, K, RT

# ----------------------------------------------------------
if __name__ == "__main__":
    import numpy as np
    # Insert your camera name here
    print("Camera")
    cam = bpy.data.objects['Camera']
    P, K, RT = get_3x4_P_matrix_from_blender(cam)
    print("K")
    #print(K)
    print(repr(np.array(K)))
    print("RT")
    #print(RT)
    print(repr(np.array(RT)))
    print("P")
    #print(P)
    print(repr(np.array(P)))

    print("Projector")
    cam = bpy.data.objects['Projector']
    P, K, RT = get_3x4_P_matrix_from_blender(cam)
    print("K")
    #print(K)
    print(repr(np.array(K)))
    print("RT")
    #print(RT)
    print(repr(np.array(RT)))
    print("P")
    #print(P)
    print(repr(np.array(P)))

    print("==== 3D Cursor projection ====")
    pc = P @ bpy.context.scene.cursor.location
    pc /= pc[2]
    print("Projected cursor location")
    print(pc)

    # Bonus code: save the 3x4 P matrix into a plain text file
    # Don't forget to import numpy for this
    #nP = numpy.matrix(P)
    #numpy.savetxt("/tmp/P3x4.txt", nP)  # to select precision, use e.g. fmt='%.2f'
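If it helps, here is a sketch (not tested against your scene) of how the functions above might be used to build numpy A1/A2 and RT1/RT2 arrays for your cam_L / cam_R setup. The 3x4 RT is padded to a 4x4 homogeneous matrix to match the shape of the matrices you got from matrix_world.inverted(); adjust if you prefer to keep it 3x4.

# Sketch only: assumes the functions above are defined and the scene
# contains camera objects named "cam_L" and "cam_R".
import numpy as np

def extrinsics_4x4(cam):
    RT = np.array(get_3x4_RT_matrix_from_blender(cam))       # 3x4, world -> CV camera
    return np.vstack([RT, np.array([0.0, 0.0, 0.0, 1.0])])   # pad to 4x4

cam_L = bpy.data.objects["cam_L"]
cam_R = bpy.data.objects["cam_R"]

A1 = np.array(get_calibration_matrix_K_from_blender(cam_L.data))
A2 = np.array(get_calibration_matrix_K_from_blender(cam_R.data))
RT1 = extrinsics_4x4(cam_L)
RT2 = extrinsics_4x4(cam_R)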

chrisoffner commented 6 months ago

I would say that it is caused by wrong intrinsic matrices. Please check https://blender.stackexchange.com/a/120063/3581 for how to extract intrinsic matrices from Blender.

Thank you for your elaborate response. I just compared the intrinsics matrices produced by get_calibration_matrix_K_from_blender with the intrinsics matrices I obtained using

# Construct intrinsics matrix
f_x = cam.data.lens
sensor_width_mm = cam.data.sensor_width
resolution_x_px = bpy.context.scene.render.resolution_x
scale = bpy.context.scene.render.resolution_percentage / 100.0
aspect_ratio = bpy.context.scene.render.pixel_aspect_x / bpy.context.scene.render.pixel_aspect_y
f_x_px = f_x / sensor_width_mm * resolution_x_px * scale
f_y_px = f_x_px * aspect_ratio
c_x = resolution_x_px * scale / 2
c_y = bpy.context.scene.render.resolution_y * scale / 2

cam_intrinsic_matrix = np.array([[f_x_px,      0, c_x],
                                 [0,      f_y_px, c_y],
                                 [0,           0,   1]])

and they appear to match. So it doesn't seem that the intrinsics matrices are incorrect, unless I'm really missing something dumb here. 😅
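For a stricter check than eyeballing, a quick sketch (assuming cam is the camera object and the get_calibration_matrix_K_from_blender function from your script is available):

K_from_script = np.array(get_calibration_matrix_K_from_blender(cam.data))
print(np.allclose(K_from_script, cam_intrinsic_matrix))  # True if the two intrinsics really match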

chrisoffner commented 6 months ago

Oh wow, now I see that the RT produced by the code you posted gives the matrix

RT
[[ 1.      0.      0.      0.6384]
 [ 0.     -0.0175 -0.9998  0.8308]
 [ 0.      0.9998 -0.0175 38.3533]]

whereas mine is:

cam_extrinsic_matrix_L:
[[  1.      -0.      -0.       0.6384]
 [ -0.       0.0175   0.9998  -0.8308]
 [  0.      -0.9998   0.0175 -38.3533]]

That is, my extrinsics matrix cam_extrinsic_matrix_L is the negation of yours (provided one uses the RT from the code you posted), except for $t_x$, where the signs match.
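Looking at your script again, that sign pattern seems to be exactly the bcam-to-cv flip R_bcam2cv = diag(1, -1, -1) it applies. A quick sketch with my matrix from above:

import numpy as np

# My matrix is world -> Blender camera (y up, -z forward); applying the flip
# from your script reproduces the world -> CV camera matrix (y down, +z forward).
R_bcam2cv = np.diag([1.0, -1.0, -1.0])

RT_bcam = np.array([[ 1.,  0.,      0.,       0.6384],
                    [ 0.,  0.0175,  0.9998,  -0.8308],
                    [ 0., -0.9998,  0.0175, -38.3533]])

RT_cv = R_bcam2cv @ RT_bcam
# RT_cv == [[ 1.      0.      0.      0.6384]
#           [ 0.     -0.0175 -0.9998  0.8308]
#           [ 0.      0.9998 -0.0175 38.3533]]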

chrisoffner commented 6 months ago

Yes, with the RT matrices from the script you linked to, I now get the correct results. :) Thank you so much, I really appreciate your help!

[screenshot: correct rectification result (rectified_mindistort)]