The scannet poses are camera to world. Can you check whether the poses are correct?
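One quick way to check is to back-project two overlapping depth maps with their camera-to-world poses and see whether the resulting world-space point clouds roughly coincide. A minimal sketch (the file names and the millimeter depth scale are assumptions based on the setup described in this thread):

```python
import cv2
import numpy as np

def backproject_to_world(depth_path, pose_path, K):
    """Lift a depth map to world-space points via its camera-to-world pose."""
    depth = cv2.imread(depth_path, -1).astype(float) / 1000.0  # 16-bit PNG, mm -> m
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]                 # pixel row/column indices
    z = depth.ravel()
    valid = z > 0                             # ScanNet marks invalid depth as 0
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)])[:, valid]  # 4xN homogeneous
    cam_to_world = np.loadtxt(pose_path)      # 4x4 camera-to-world
    return (cam_to_world @ pts_cam)[:3].T     # Nx3 world points

K = np.loadtxt("K.txt", delimiter=' ')
pts_a = backproject_to_world("0_depth.png", "0_poses.txt", K)
pts_b = backproject_to_world("20_depth.png", "20_poses.txt", K)
# If the poses are correct, overlapping frames should land in the same world region
print(pts_a.mean(axis=0), pts_b.mean(axis=0))
```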
As you said, I've transformed the camera-to-world matrix into the extrinsic matrix by taking its inverse with `cam_pose = np.linalg.inv(cam_to_world)`. However, it turns out to give something like the picture above.
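For reference, a rigid transform can also be inverted in closed form, which is equivalent to `np.linalg.inv` here. A minimal sketch:

```python
import numpy as np

def invert_rigid(cam_to_world):
    """Closed-form inverse of a 4x4 rigid transform [R | t]."""
    R = cam_to_world[:3, :3]
    t = cam_to_world[:3, 3]
    world_to_cam = np.eye(4)
    world_to_cam[:3, :3] = R.T       # R^-1 == R^T for a rotation
    world_to_cam[:3, 3] = -R.T @ t   # translation transforms accordingly
    return world_to_cam
```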
Is there any chance you could share the code you used to fuse the output images into a mesh with tsdf-fusion-python? That would be a great help! Thank you so much!
Can you successfully do TSDF fusion on the original ScanNet data? You can post the code here for us to analyze.
Thanks for your reply! I cannot do TSDF fusion successfully on the original ScanNet data either; the room comes out with weird shapes, as shown above. I basically fed the data into tsdf-fusion-python, customizing the input images and adding an inverse of the pose:
```python
import time
import cv2
import numpy as np
import fusion  # from tsdf-fusion-python

n_imgs = 50  # frame count (not shown in the original post; hypothetical value)

# Estimate voxel volume bounds from the camera view frustums
cam_intr = np.loadtxt("mvd_data/scene0009_00_0/K.txt", delimiter=' ')
vol_bnds = np.zeros((3, 2))
for i in range(n_imgs):
    # Read depth image and camera pose
    depth_im = cv2.imread("mvd_data/scene0009_00_0/%d_depth.png" % (i * 20), -1).astype(float)
    depth_im /= 1000.  # depth is saved in 16-bit PNG in millimeters
    # depth_im[depth_im == 65.535] = 0  # set invalid depth to 0 (specific to 7-scenes dataset)
    cam_to_world = np.loadtxt("mvd_data/scene0009_00_0/%d_poses.txt" % (i * 20))  # 4x4 rigid transformation matrix
    cam_pose = np.linalg.inv(cam_to_world)  # the inverse I added

    # Compute camera view frustum and extend convex hull
    view_frust_pts = fusion.get_view_frustum(depth_im, cam_intr, cam_pose)
    vol_bnds[:, 0] = np.minimum(vol_bnds[:, 0], np.amin(view_frust_pts, axis=1))
    vol_bnds[:, 1] = np.maximum(vol_bnds[:, 1], np.amax(view_frust_pts, axis=1))

# Initialize the voxel volume (as in the tsdf-fusion-python demo)
tsdf_vol = fusion.TSDFVolume(vol_bnds, voxel_size=0.02)

# Loop through RGB-D images and fuse them together
t0_elapse = time.time()
for i in range(n_imgs):
    print("Fusing frame %d/%d" % (i + 1, n_imgs))

    # Read RGB-D image and camera pose
    color_image = cv2.cvtColor(cv2.imread("mvd_data/scene0009_00_0/%d_gt.png" % (i * 20)), cv2.COLOR_BGR2RGB)
    depth_im = cv2.imread("mvd_data/scene0009_00_0/%d_depth.png" % (i * 20), -1).astype(float)
    depth_im /= 1000.
    # depth_im[depth_im == 65.535] = 0
    cam_to_world = np.loadtxt("mvd_data/scene0009_00_0/%d_poses.txt" % (i * 20))  # 4x4 rigid transformation matrix
    cam_pose = np.linalg.inv(cam_to_world)

    # Integrate observation into voxel volume (assume color aligned with depth)
    tsdf_vol.integrate(color_image, depth_im, cam_intr, cam_pose, obs_weight=1.)

fps = n_imgs / (time.time() - t0_elapse)
print("Average FPS: {:.2f}".format(fps))

# Get mesh from voxel volume and save to disk (can be viewed with Meshlab)
print("Saving mesh to mesh.ply...")
verts, faces, norms, colors = tsdf_vol.get_mesh()
fusion.meshwrite("mesh.ply", verts, faces, norms, colors)
```
The data is basically the output from your MVDiffusion. Thanks!
Hi @0010SS, did you manage to reconstruct the mesh?
Hi @dengchcs, Yes, I have reconstructed the mesh successfully. The code works fine, but you need to tweak the intrinsic matrix bit by bit to make the meshes match.
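For anyone hitting the same issue: one systematic way to adjust the intrinsics is to rescale them with the image resize, assuming K.txt was calibrated for the original ScanNet resolution. A minimal sketch (the resolutions below are hypothetical):

```python
import numpy as np

def scale_intrinsics(K, src_wh, dst_wh):
    """Rescale a 3x3 intrinsic matrix for a resized image."""
    sx = dst_wh[0] / src_wh[0]
    sy = dst_wh[1] / src_wh[1]
    K = K.copy()
    K[0, 0] *= sx  # fx
    K[0, 2] *= sx  # cx
    K[1, 1] *= sy  # fy
    K[1, 2] *= sy  # cy
    return K

# e.g. ScanNet color frames (1296x968) resized to 512x512 (hypothetical sizes)
K = np.loadtxt("mvd_data/scene0009_00_0/K.txt", delimiter=' ')
cam_intr = scale_intrinsics(K, (1296, 968), (512, 512))
```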
Thanks! I can reconstruct the geometry now (though the color reconstruction is still buggy...). My problem was that my camera's coordinate system seemed to differ from that of ScanNet.
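For reference, ScanNet uses an OpenCV-style camera frame (x right, y down, z forward). If the poses come from an OpenGL-style frame (y up, z backward), flipping the camera-local y and z axes converts between the two. A minimal sketch, assuming the mismatch is exactly this axis flip:

```python
import numpy as np

# Flip the camera-local y and z axes: OpenGL-style (y up, z backward)
# -> OpenCV/ScanNet-style (y down, z forward)
FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])

def gl_to_cv_pose(cam_to_world_gl):
    # Right-multiplication re-expresses the camera axes; the world frame is unchanged
    return cam_to_world_gl @ FLIP_YZ
```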
Thanks, Mr. Tang, for your awesome work! I have been generating a set of images using the `depth_fix_interval` mode based on ScanNet. However, when I feed the output into tsdf-fusion-python, including the poses, K, depth, and preds from the output log folder, it generates a weird mesh that does not seem to align. How can I solve this issue? Below is an example image: