colmap / colmap

COLMAP - Structure-from-Motion and Multi-View Stereo
https://colmap.github.io/

multiple depth maps to one (world) coordinate system #675

Closed raphaelsulzer closed 5 years ago

raphaelsulzer commented 5 years ago

I would like to transform the (geometric) depth maps of a COLMAP project into one common coordinate system (I do not want to do a depth map fusion). For this I wrote the Python script below, based on the Python functions provided by COLMAP.

```python
cameras, images, points3D = read_model("pathToModel/txt/", ".txt")

depth_map = read_array(
    "/0000.png.geometric.bin")

im1 = images[1]
cam1 = cameras[1]

rotmat = qvec2rotmat(im1.qvec)
tvec = im1.tvec

focal = cam1.params[0]

rows_ori = cam1.height
cols_ori = cam1.width

rows = depth_map.shape[0]
cols = depth_map.shape[1]

r_scale_ori_to_depth_image = rows_ori/rows
c_scale_ori_to_depth_image = cols_ori/cols

dpix = 0.0072*10**-3

xyz = np.empty((rows*cols, 3))
for r in range(rows):
    for c in range(cols):
        x = (c-cols/2)*c_scale_ori_to_depth_image*dpix
        y = (r-rows/2)*r_scale_ori_to_depth_image*dpix
        z = (depth_map[r][c]+focal)*dpix
        imcoord = np.asarray([x, y, z])
        pt = np.matmul(-np.transpose(rotmat), tvec+imcoord)
        xyz[r*c] = pt

# export to PLY
with open("/depth_map.ply", "w") as fid:
    fid.write("ply\n")
    fid.write("format ascii 1.0\n")
    fid.write("element vertex %d\n" % xyz.shape[0])
    fid.write("property float x\n")
    fid.write("property float y\n")
    fid.write("property float z\n")
    fid.write("end_header\n")
    for i in range(xyz.shape[0]):
        if i % 1000 == 0:
            print("Writing point", i, "/", xyz.shape[0])
        fid.write("%f %f %f \n" % (xyz[i, 0], xyz[i, 1], xyz[i, 2]))
```

My main problem is that I am not clear about the interior and exterior camera orientation provided by COLMAP. As far as I understand, the quaternion and translation vector from the images.txt file give me the exterior orientation of each image. With this I can transform from the world to the camera coordinate system and vice versa. In the code above I am "measuring" pixels in the depth map in image space. To first go from image to camera space, I move the origin to the center of the image and multiply by the pixel size. However, my final results are nonsense.

Is what I am doing correct? What are the units of the depth in the depth maps? Where can I find the pixel size that COLMAP uses to go from pixel to camera coordinates? How do I deal with the different sizes of the depth maps and the original images?

tsattler commented 5 years ago

@raphaelsulzer The provided extrinsics define the mapping from world to camera coordinates (following the standard Computer Vision convention).

The depth maps don't have actual units. They are defined by the scale of a reconstruction.

The best description for converting depth map entries to 3D points can probably be found here: https://github.com/colmap/colmap/blob/d3a29e203ab69e91eda938d6e56e1c7339d62a99/src/mvs/fusion.cc#L216
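
The conversion at the linked line can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not COLMAP's own code: a pixel (col, row) with depth d is back-projected to camera coordinates as d * K^-1 * [col, row, 1]^T and then mapped to world coordinates by inverting x_cam = R * x_world + t. The function name `depth_map_to_world` is hypothetical, `qvec2rotmat` is a local re-implementation for a (w, x, y, z) quaternion, `K` is assumed to be a 3x3 pinhole intrinsic matrix in pixels that matches the depth map's resolution, and integer pixel indices are used without a half-pixel offset.

```python
import numpy as np


def qvec2rotmat(qvec):
    """Rotation matrix from a unit quaternion given as (w, x, y, z)."""
    w, x, y, z = qvec
    return np.array([
        [1 - 2 * y * y - 2 * z * z, 2 * x * y - 2 * w * z, 2 * x * z + 2 * w * y],
        [2 * x * y + 2 * w * z, 1 - 2 * x * x - 2 * z * z, 2 * y * z - 2 * w * x],
        [2 * x * z - 2 * w * y, 2 * y * z + 2 * w * x, 1 - 2 * x * x - 2 * y * y],
    ])


def depth_map_to_world(depth_map, K, R, t):
    """Back-project a depth map to an (N, 3) array of world points.

    Assumptions (not taken from the thread): K is a 3x3 pinhole matrix in
    pixels at the depth map's resolution, and entries <= 0 mean "no depth".
    """
    rows, cols = depth_map.shape
    col_idx, row_idx = np.meshgrid(np.arange(cols), np.arange(rows))
    valid = depth_map > 0
    d = depth_map[valid]

    # Homogeneous pixel coordinates scaled by depth: (col*d, row*d, d).
    pix = np.stack([col_idx[valid] * d, row_idx[valid] * d, d], axis=0)

    # Pixel -> camera coordinates.
    cam = np.linalg.inv(K) @ pix

    # Camera -> world: invert x_cam = R @ x_world + t.
    world = R.T @ (cam - np.asarray(t, dtype=float).reshape(3, 1))
    return world.T
```

No physical pixel size is needed here, since the intrinsics are already expressed in pixels. For a PINHOLE camera the parameters are fx, fy, cx, cy, so `K` would be `[[fx, 0, cx], [0, fy, cy], [0, 0, 1]]`. The resulting points stay in the arbitrary scale of the reconstruction, as noted above.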

raphaelsulzer commented 5 years ago

@tsattler Thank you! It was fairly easy like that.

zikuicai commented 5 years ago

@tsattler Thanks for your answer. Does the convention you mentioned here refer to the right-hand rule where the axes of the world coordinate system follow [X,Y,Z] -> [right,up,backwards]?

tsattler commented 5 years ago

"The local camera coordinate system of an image is defined in a way that the X axis points to the right, the Y axis to the bottom, and the Z axis to the front as seen from the image." (see https://colmap.github.io/format.html#images-txt)

zikuicai commented 5 years ago

@tsattler Thanks. I had read the documentation but the camera extrinsics part was still not very clear to me. Does the camera coordinate system look like the following figure? The gaze direction of the camera is the negative z axis.

Does the world coordinate system also follow "the X axis points to the right, the Y axis to the bottom, and the Z axis to the front"?

tsattler commented 5 years ago

The coordinate system is the one commonly used in the computer vision literature, where the camera is looking down the z-axis.
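
To make the convention concrete, here is a small Python sketch, assuming the world-to-camera mapping x_cam = R * x_world + t described earlier in this thread. Under that assumption the camera center in world coordinates is -R^T * t, and the viewing direction is the camera's +Z axis rotated into the world frame, R^T * [0, 0, 1]^T. The helper name `camera_center_and_view_dir` is hypothetical.

```python
import numpy as np


def camera_center_and_view_dir(R, t):
    """Camera center and viewing direction in world coordinates.

    Assumes x_cam = R @ x_world + t (world-to-camera extrinsics) and a
    camera that looks along its local +Z axis.
    """
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float)
    center = -R.T @ t                           # point that maps to the camera origin
    view_dir = R.T @ np.array([0.0, 0.0, 1.0])  # camera +Z expressed in world axes
    return center, view_dir


if __name__ == "__main__":
    center, view_dir = camera_center_and_view_dir(np.eye(3), [1.0, 2.0, 3.0])
    print("center:", center, "view direction:", view_dir)
```

Note that the world coordinate system itself has no fixed "right/down/front" meaning; only the local camera frame does. The world frame is whatever arbitrary frame the reconstruction ended up in.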

ZhixiongSun commented 4 years ago

@raphaelsulzer Hi, I also ran into this problem. Have you fixed it? I used read_and_write_dense.py to read the depth map and the interior parameters to convert it to camera coordinates, but something seems to be wrong. Could you please tell me how you used the depth map .bin file? Thanks a lot.

raphaelsulzer commented 4 years ago

I used some C++ code in the end to do what I wanted. After linking the COLMAP library to your own C++ code, it is fairly easy to load models, including depth maps etc. This code could be a good entry point: https://github.com/colmap/colmap/issues/820#issue-575611194