TUTvision / MuSHRoom

Indoor room dataset used for novel view synthesis and 3D reconstruction (WACV 2024)

MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis

Xuqian Ren, Wenjia Wang, Dingding Cai, Tuuli Tuominen, Juho Kannala, Esa Rahtu

Project Page | Paper

Metaverse technologies demand accurate, real-time, and immersive modeling on consumer-grade hardware, both for non-human perception (e.g., drone/robot/autonomous car navigation) and for immersive technologies like AR/VR, which requires both structural accuracy and photorealism. However, there is a knowledge gap in how to apply geometric reconstruction and photorealistic modeling (novel view synthesis) in a unified framework. To address this gap and promote the development of robust and immersive modeling and rendering with consumer-grade devices, we propose a real-world Multi-Sensor Hybrid Room Dataset (MuSHRoom). Our dataset presents exciting challenges: it requires state-of-the-art methods to be cost-effective, robust to noisy data and devices, and able to jointly learn 3D reconstruction and novel view synthesis instead of treating them as separate tasks, which would make them ideal for real-world applications. We benchmark several well-known pipelines on our dataset for joint 3D mesh reconstruction and novel view synthesis. Our dataset and benchmark show great potential for promoting improvements in fusing 3D reconstruction and high-quality rendering in a robust and computationally efficient end-to-end fashion. The dataset and code are available at the project website: https://xuqianren.github.io/publications/MuSHRoom/.

Attribution

If you use this data, please cite the original paper presenting it:

@misc{ren2023mushroom,
      title={MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis}, 
      author={Xuqian Ren and Wenjia Wang and Dingding Cai and Tuuli Tuominen and Juho Kannala and Esa Rahtu},
      year={2023},
      eprint={2311.02778},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Downloading the data

The data files are hosted on Zenodo, where each dataset can be downloaded individually.

✨✨To use this dataset with the nerfstudio framework, please follow the instructions in DN-Splatter.

Data structure

To maximize compatibility, all data is published in open and simple file formats. The folder structure for one data set looks like the following:

<room_name>
|-- kinect
    |-- long_capture
        |-- images/    # extracted RGB images of keyframes
        |-- depth/    # extracted depth images of keyframes
        |-- intrinsic/    # intrinsic parameters
        |-- PointCloud/    # spectacularAI point cloud of keyframes
        |-- pose/    # spectacularAI poses of keyframes, aligned with the metric scale of the depth; poses use the OpenCV coordinate convention
        |-- calibration.json; data.jsonl; data.mkv; data2.mkv; vio_config.yaml    # raw videos and parameters from spectacularAI SDK
        |-- camera_parameters.txt    # camera settings during capture
        |-- test.txt    # image IDs for testing within a single sequence
        |-- transformations_colmap.json    # globally optimized COLMAP poses used for testing with a different sequence
        |-- transformations.json    # spectacularAI poses saved as JSON; poses use the OpenGL coordinate convention
    |-- short_capture
        |-- images/    # same as long capture
        |-- depth/    # same as long capture
        |-- PointCloud/    # same as long capture
        |-- pose/    # same as long capture
        |-- intrinsic/    # same as long capture
        |-- calibration.json; data.jsonl; data.mkv; data2.mkv; vio_config.yaml    # raw videos and parameters from spectacularAI SDK, same as long capture
        |-- transformations_colmap.json    # same as long capture
        |-- transformations.json    # same as long capture
|-- iphone
    |-- long_capture
        |-- images/    # same as Kinect
        |-- depth/    # same as Kinect
        |-- polycam_mesh/    # mesh provided by Polycam; not aligned with the poses, for visualization only
        |-- polycam_pointcloud.ply    # point cloud provided by Polycam, for visualization only
        |-- sdf_dataset_all_interp_4    # same as Kinect
        |-- sdf_dataset_train_interp_4    # same as Kinect
        |-- test.txt    # same as Kinect
        |-- transformations_colmap.json    # same as Kinect
        |-- transformations.json    # Polycam poses
    |-- short_capture
        |-- images/    # same as Kinect
        |-- depth/    # same as Kinect
        |-- transformations_colmap.json    # same as long capture
        |-- transformations.json    # same as long capture
|-- gt_mesh.ply    # reference mesh used for geometry comparison
|-- gt_pd.ply    # reference point cloud used for geometry comparison
|-- icp_iphone.json    # alignment transformation matrix used for iPhone sequences
|-- icp_kinect.json    # alignment transformation matrix used for Kinect sequences
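
The listing above notes that the poses in pose/ follow the OpenCV convention while those in transformations.json follow the OpenGL convention. As a minimal sketch of the usual conversion between the two, the function below negates the camera y and z axes of a 4x4 pose matrix. It assumes the poses are camera-to-world matrices, which the description above does not state explicitly, and the function name `opengl_to_opencv` is illustrative rather than part of the dataset tooling.

```python
def opengl_to_opencv(c2w):
    """Switch a 4x4 camera-to-world matrix between OpenGL and OpenCV conventions.

    OpenGL cameras look along -z with +y up; OpenCV cameras look along +z
    with +y down. Negating the second and third columns (the camera's y and
    z axes) converts one into the other. The flip is its own inverse, so the
    same function also converts OpenCV poses back to OpenGL.
    """
    return [[row[0], -row[1], -row[2], row[3]] for row in c2w]


# Hypothetical pose: identity rotation, camera placed at (1, 2, 3).
pose_gl = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
pose_cv = opengl_to_opencv(pose_gl)
for row in pose_cv:
    print(row)
```

Only the rotation columns for the y and z axes change sign; the camera position in the last column is untouched.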

List of data sets

| Scene | Scale (m) | Exposure time (µs) | White balance (K) | Brightness | Gain |
| --- | --- | --- | --- | --- | --- |
| coffee room | 6.3 $\times$ 5 $\times$ 3.1 | 41700 | 2830 | 128 | 130 |
| computer room | 9.6 $\times$ 6.1 $\times$ 2.5 | 33330 | 3100 | 128 | 255 |
| classroom | 8.9 $\times$ 7.2 $\times$ 2.8 | 33330 | 3300 | 128 | 88 |
| honka | 6.1 $\times$ 3.9 $\times$ 2.3 | 16670 | 3200 | 128 | 128 |
| koivu | 10 $\times$ 8 $\times$ 2.5 | 16670 | 4200 | 128 | 128 |
| vr room | 5.1 $\times$ 4.4 $\times$ 2.8 | 8300 | 3300 | 128 | 88 |
| kokko | 6.7 $\times$ 6.0 $\times$ 2.5 | 133330 | 3300 | Auto | Auto |
| sauna | 9.9 $\times$ 6.5 $\times$ 2.4 | Auto | 3300 | Auto | Auto |
| activity | 12 $\times$ 9 $\times$ 2.5 | 50000 | 3200 | 128 | 130 |
| olohuone | 19 $\times$ 6.4 $\times$ 3 | Auto | 3600 | Auto | Auto |

Novel view synthesis and mesh reconstruction results

We report updated results here for Nerfacto/Depth-Nerfacto/Neusfacto/Splatfacto, trained only with COLMAP poses. To improve efficiency, we train each model once and evaluate it under both evaluation protocols, instead of training twice as in the paper. The test IDs used for the "test within a single sequence" protocol are stored in "test.txt" in each "long_capture" folder; the remaining IDs in the long sequence are used for training. The same model is then evaluated on the images in the short sequence. The mesh extracted from this model is used to evaluate mesh reconstruction. Please follow the training and comparison method reported here for efficiency.
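
The split described above can be sketched as follows. This is only an illustration: it assumes that test.txt lists one image ID per line and that the IDs match the names used for the images of the long capture, neither of which is specified here. The function name `split_ids` and the example IDs are hypothetical.

```python
def split_ids(all_ids, test_txt):
    """Split the image IDs of a long capture into training and test lists.

    all_ids:  every image ID in the long sequence, in order.
    test_txt: contents of test.txt (assumed format: one ID per line).
    """
    held_out = {line.strip() for line in test_txt.splitlines() if line.strip()}
    train = [i for i in all_ids if i not in held_out]
    test = [i for i in all_ids if i in held_out]
    return train, test


# Hypothetical IDs; real ones would come from images/ and test.txt.
all_ids = ["0000", "0001", "0002", "0003", "0004"]
train_ids, test_ids = split_ids(all_ids, "0001\n0003\n")
print("train:", train_ids)
print("test:", test_ids)
```

The model trained on the training IDs would then be evaluated twice: on the held-out IDs of the long sequence, and on all images of the short sequence.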

To use the MuSHRoom dataset with the nerfstudio framework, please use the dataparser in dn_splatter/data/mushroom_dataparser.py. The instructions are at https://github.com/maturk/dn-splatter#mushroom.

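Columns Acc through F-score report reconstruction quality. The remaining columns report rendering quality, first for the test within a single sequence and then for the test with a different sequence.

| Device | Method | Acc ↓ | Comp ↓ | C-l1 ↓ | NC ↑ | F-score ↑ | PSNR ↑ (single seq.) | SSIM ↑ (single seq.) | LPIPS ↓ (single seq.) | PSNR ↑ (different seq.) | SSIM ↑ (different seq.) | LPIPS ↓ (different seq.) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| iPhone | Nerfacto | 0.0652 | 0.0603 | 0.0628 | 0.7491 | 0.6390 | 20.83 | 0.7653 | 0.2506 | 20.36 | 0.7448 | 0.2781 |
| iPhone | Depth-Nerfacto | 0.0653 | 0.0614 | 0.0634 | 0.7354 | 0.6126 | 21.23 | 0.7623 | 0.2612 | 20.67 | 0.7423 | 0.2873 |
| iPhone | MonoSDF | 0.0792 | 0.0237 | 0.0514 | 0.8200 | 0.7596 | 19.79 | 0.6972 | 0.4122 | 17.92 | 0.6683 | 0.4384 |
| iPhone | Splatfacto | 0.1074 | 0.0708 | 0.0881 | 0.7602 | 0.4405 | 24.22 | 0.8375 | 0.1421 | 21.39 | 0.7738 | 0.1986 |
| Kinect | Nerfacto | 0.0669 | 0.0695 | 0.0682 | 0.7458 | 0.6252 | 23.89 | 0.8375 | 0.2048 | 22.43 | 0.8331 | 0.2010 |
| Kinect | Depth-Nerfacto | 0.0710 | 0.0691 | 0.0701 | 0.7274 | 0.5905 | 24.21 | 0.8370 | 0.2107 | 22.77 | 0.8345 | 0.2036 |
| Kinect | MonoSDF | 0.0439 | 0.0204 | 0.0321 | 0.8616 | 0.8753 | 23.05 | 0.8315 | 0.2434 | 21.60 | 0.8267 | 0.2219 |
| Kinect | Splatfacto | 0.1007 | 0.0704 | 0.0855 | 0.7689 | 0.4697 | 26.07 | 0.8844 | 0.1378 | 23.28 | 0.8604 | 0.1579 |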
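
To make the reconstruction columns concrete, the sketch below computes accuracy, completeness, Chamfer-L1 and F-score between two small point sets using commonly used definitions. In the table, C-l1 matches the mean of Acc and Comp (for example, (0.0652 + 0.0603) / 2 ≈ 0.0628). Everything else here is an assumption: the 0.05 m F-score threshold, the brute-force nearest-neighbor search, and the function names are illustrative, and the evaluation behind the reported numbers may sample, crop, or threshold differently. Normal consistency (NC) is omitted because it requires surface normals.

```python
import math


def nearest_distances(src, dst):
    """Distance from each point in src to its nearest neighbor in dst (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def reconstruction_metrics(pred, ref, threshold=0.05):
    """Compute Acc, Comp, Chamfer-L1 and F-score for two 3D point sets.

    pred: points sampled from the reconstructed mesh.
    ref:  points from the reference geometry.
    threshold: distance (in meters) under which a point counts as correct;
               the value 0.05 is an assumption for illustration.
    """
    d_pred = nearest_distances(pred, ref)  # prediction -> reference
    d_ref = nearest_distances(ref, pred)   # reference -> prediction
    acc = sum(d_pred) / len(d_pred)
    comp = sum(d_ref) / len(d_ref)
    precision = sum(d <= threshold for d in d_pred) / len(d_pred)
    recall = sum(d <= threshold for d in d_ref) / len(d_ref)
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom > 0 else 0.0
    return {"acc": acc, "comp": comp, "c_l1": (acc + comp) / 2, "f_score": f_score}


# Toy example: one predicted point is 1 cm off, the other 20 cm off.
ref_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred_points = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.2)]
print(reconstruction_metrics(pred_points, ref_points))
```

For real meshes with many sampled points, a spatial index such as a k-d tree would replace the brute-force search.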