Xuqian Ren, Wenjia Wang, Dingding Cai, Tuuli Tuominen, Juho Kannala, Esa Rahtu
Metaverse technologies demand accurate, real-time, and immersive modeling on consumer-grade hardware for both non-human perception (e.g., drone/robot/autonomous-car navigation) and immersive technologies like AR/VR, requiring both structural accuracy and photorealism. However, there is a knowledge gap in how to apply geometric reconstruction and photorealistic modeling (novel view synthesis) within a unified framework. To address this gap and promote the development of robust, immersive modeling and rendering with consumer-grade devices, we propose a real-world Multi-Sensor Hybrid Room Dataset (MuSHRoom). Our dataset presents exciting challenges: it requires state-of-the-art methods to be cost-effective, robust to noisy data and devices, and able to jointly learn 3D reconstruction and novel view synthesis rather than treating them as separate tasks, making them ideal for real-world applications. We benchmark several well-known pipelines on our dataset for joint 3D mesh reconstruction and novel view synthesis. Our dataset and benchmark show great potential for promoting improvements in fusing 3D reconstruction and high-quality rendering in a robust and computationally efficient end-to-end fashion. The dataset and code are available at the project website: https://xuqianren.github.io/publications/MuSHRoom/.
If you use this data, please cite the original paper presenting it:
@misc{ren2023mushroom,
title={MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis},
author={Xuqian Ren and Wenjia Wang and Dingding Cai and Tuuli Tuominen and Juho Kannala and Esa Rahtu},
year={2023},
eprint={2311.02778},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
The data files are available for download on Zenodo and can be downloaded on a per-dataset basis.
To use this dataset with the nerfstudio framework, please follow the instructions in DN-Splatter.
To maximize compatibility, all data is published in open and simple file formats. The folder structure for one dataset looks like the following:
<room_name>
├── kinect
│   ├── long_capture
│   │   ├── images/  # extracted RGB images of keyframes
│   │   ├── depth/  # extracted depth images of keyframes
│   │   ├── intrinsic/  # intrinsic parameters
│   │   ├── PointCloud/  # spectacularAI point cloud of keyframes
│   │   ├── pose/  # spectacularAI poses of keyframes, aligned with the metric scale of the depth maps; poses use the OpenCV coordinate convention
│   │   ├── calibration.json; data.jsonl; data.mkv; data2.mkv; vio_config.yaml  # raw videos and parameters from the spectacularAI SDK
│   │   ├── camera_parameters.txt  # camera settings during capture
│   │   ├── test.txt  # image IDs for testing within a single sequence
│   │   ├── transformations_colmap.json  # globally optimized COLMAP poses, used for testing with a different sequence
│   │   └── transformations.json  # spectacularAI poses saved as JSON; poses use the OpenGL coordinate convention
│   └── short_capture
│       ├── images/  # same as long capture
│       ├── depth/  # same as long capture
│       ├── PointCloud/  # same as long capture
│       ├── pose/  # same as long capture
│       ├── intrinsic/  # same as long capture
│       ├── calibration.json; data.jsonl; data.mkv; data2.mkv; vio_config.yaml  # raw videos and parameters from the spectacularAI SDK
│       ├── transformations_colmap.json  # same as long capture
│       └── transformations.json  # same as long capture
├── iphone
│   ├── long_capture
│   │   ├── images/  # same as Kinect
│   │   ├── depth/  # same as Kinect
│   │   ├── polycam_mesh/  # mesh provided by Polycam; not aligned with the poses, for visualization only
│   │   ├── polycam_pointcloud.ply  # point cloud provided by Polycam, for visualization only
│   │   ├── sdf_dataset_all_interp_4  # same as Kinect
│   │   ├── sdf_dataset_train_interp_4  # same as Kinect
│   │   ├── test.txt  # same as Kinect
│   │   ├── transformations_colmap.json  # same as Kinect
│   │   └── transformations.json  # Polycam poses
│   └── short_capture
│       ├── images/  # same as Kinect
│       ├── depth/  # same as Kinect
│       ├── transformations_colmap.json  # same as long capture
│       └── transformations.json  # same as long capture
├── gt_mesh.ply  # reference mesh used for geometry comparison
├── gt_pd.ply  # reference point cloud used for geometry comparison
├── icp_iphone.json  # alignment transformation matrix used for iPhone sequences
└── icp_kinect.json  # alignment transformation matrix used for Kinect sequences
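For orientation, the sketch below shows one way to load a single keyframe (RGB, depth, and pose) from a capture folder and convert the OpenGL-convention pose in transformations.json to the OpenCV convention. It is a minimal sketch under assumptions: the JSON follows a nerfstudio-style layout (a "frames" list with "file_path" and "transform_matrix" keys), depth maps are 16-bit PNGs in millimeters, and depth files share names with the RGB images; verify these against your local copy of the data.

```python
# Minimal sketch for loading one keyframe; paths, JSON keys, and the depth
# scale are assumptions to verify against the actual files.
import json
from pathlib import Path

import numpy as np
from PIL import Image

capture = Path("<room_name>/kinect/long_capture")  # placeholder path

with open(capture / "transformations.json") as f:
    meta = json.load(f)

frame = meta["frames"][0]                                        # assumed nerfstudio-style schema
rgb = np.asarray(Image.open(capture / frame["file_path"]))
c2w_gl = np.array(frame["transform_matrix"], dtype=np.float64)   # OpenGL convention

# OpenGL cameras look down -z with y up; OpenCV cameras look down +z with y down.
# Flipping the y and z camera axes converts the camera-to-world pose.
c2w_cv = c2w_gl @ np.diag([1.0, -1.0, -1.0, 1.0])

# Depth is assumed to be a 16-bit PNG in millimeters, named like the RGB image.
depth_path = capture / "depth" / Path(frame["file_path"]).name
depth_m = np.asarray(Image.open(depth_path), dtype=np.float32) / 1000.0

print(rgb.shape, depth_m.shape, c2w_cv[:3, 3])
```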
Scene | Scale (m) | Exposure time (µs) | White Balance (K) | Brightness | Gain |
---|---|---|---|---|---|
coffee room | 6.3 $\times$ 5 $\times$ 3.1 | 41700 | 2830 | 128 | 130 |
computer room | 9.6 $\times$ 6.1 $\times$ 2.5 | 33330 | 3100 | 128 | 255 |
classroom | 8.9 $\times$ 7.2 $\times$ 2.8 | 33330 | 3300 | 128 | 88 |
honka | 6.1 $\times$ 3.9 $\times$ 2.3 | 16670 | 3200 | 128 | 128 |
koivu | 10 $\times$ 8 $\times$ 2.5 | 16670 | 4200 | 128 | 128 |
vr room | 5.1 $\times$ 4.4 $\times$ 2.8 | 8300 | 3300 | 128 | 88 |
kokko | 6.7 $\times$ 6.0 $\times$ 2.5 | 133330 | 3300 | Auto | Auto |
sauna | 9.9 $\times$ 6.5 $\times$ 2.4 | Auto | 3300 | Auto | Auto |
activity | 12 $\times$ 9 $\times$ 2.5 | 50000 | 3200 | 128 | 130 |
olohuone | 19 $\times$ 6.4 $\times$ 3 | Auto | 3600 | Auto | Auto |
We have updated the Nerfacto/Depth-Nerfacto/Neusfacto/Splatfacto results trained only with COLMAP poses. To improve efficiency, we train each model once and evaluate it under both evaluation protocols, instead of training twice as in the previous version of the paper. The image IDs used for the "test within a single sequence" protocol are stored in "test.txt" in each "long_capture" folder; the remaining IDs in the long sequence are used for training. The same model is used to evaluate the images in the short sequence ("test with a different sequence"). The mesh extracted from this model is used to evaluate mesh reconstruction quality. Please follow the training and comparison protocol reported here for efficiency.
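The split logic described above can be reproduced with a few lines of Python. This is a minimal sketch, assuming test.txt lists one image ID per line and that the IDs match the image file stems (adjust the matching if your filenames use different zero-padding); the room path is a placeholder.

```python
# Minimal sketch of the train/test split described above.
# Assumptions: test.txt lists one image ID per line, and IDs match image file stems.
from pathlib import Path

capture = Path("<room_name>/kinect/long_capture")  # placeholder path

# IDs reserved for the "test within a single sequence" protocol.
test_ids = {line.strip() for line in (capture / "test.txt").read_text().splitlines() if line.strip()}

all_images = sorted((capture / "images").iterdir())
test_images = [p for p in all_images if p.stem in test_ids]
train_images = [p for p in all_images if p.stem not in test_ids]
print(f"{len(train_images)} training frames, {len(test_images)} within-sequence test frames")

# The corresponding short_capture images of the same room form the
# "test with a different sequence" evaluation set.
```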
To use the MuSHRoom dataset with the nerfstudio framework, use the dataparser at dn_splatter/data/mushroom_dataparser.py. Instructions are available at https://github.com/maturk/dn-splatter#mushroom.
Reconstruction quality is reported as Acc, Comp, C-L1, NC, and F-score. Rendering quality (PSNR, SSIM, LPIPS) is reported for two evaluation protocols: testing within a single sequence (single) and testing with a different sequence (cross).

Device | Method | Acc ↓ | Comp ↓ | C-L1 ↓ | NC ↑ | F-score ↑ | PSNR ↑ (single) | SSIM ↑ (single) | LPIPS ↓ (single) | PSNR ↑ (cross) | SSIM ↑ (cross) | LPIPS ↓ (cross) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
iPhone | Nerfacto | 0.0652 | 0.0603 | 0.0628 | 0.7491 | 0.6390 | 20.83 | 0.7653 | 0.2506 | 20.36 | 0.7448 | 0.2781 |
iPhone | Depth-Nerfacto | 0.0653 | 0.0614 | 0.0634 | 0.7354 | 0.6126 | 21.23 | 0.7623 | 0.2612 | 20.67 | 0.7423 | 0.2873 |
iPhone | MonoSDF | 0.0792 | 0.0237 | 0.0514 | 0.8200 | 0.7596 | 19.79 | 0.6972 | 0.4122 | 17.92 | 0.6683 | 0.4384 |
iPhone | Splatfacto | 0.1074 | 0.0708 | 0.0881 | 0.7602 | 0.4405 | 24.22 | 0.8375 | 0.1421 | 21.39 | 0.7738 | 0.1986 |
Kinect | Nerfacto | 0.0669 | 0.0695 | 0.0682 | 0.7458 | 0.6252 | 23.89 | 0.8375 | 0.2048 | 22.43 | 0.8331 | 0.2010 |
Kinect | Depth-Nerfacto | 0.0710 | 0.0691 | 0.0701 | 0.7274 | 0.5905 | 24.21 | 0.8370 | 0.2107 | 22.77 | 0.8345 | 0.2036 |
Kinect | MonoSDF | 0.0439 | 0.0204 | 0.0321 | 0.8616 | 0.8753 | 23.05 | 0.8315 | 0.2434 | 21.60 | 0.8267 | 0.2219 |
Kinect | Splatfacto | 0.1007 | 0.0704 | 0.0855 | 0.7689 | 0.4697 | 26.07 | 0.8844 | 0.1378 | 23.28 | 0.8604 | 0.1579 |
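For reference, the reconstruction-quality numbers above are point-based distance metrics. Below is a hedged sketch of how Acc, Comp, C-L1, and F-score can be computed between a predicted point cloud and the reference gt_pd.ply using Open3D; the 5 cm F-score threshold and the assumption that the prediction has already been aligned (e.g., with the transformation stored in icp_kinect.json) are ours, and NC is omitted because it requires normals. This is not the official evaluation script.

```python
# Hedged sketch of point-based reconstruction metrics (Acc, Comp, C-L1, F-score).
# The 5 cm threshold and pre-aligned inputs are assumptions; this is not the
# official MuSHRoom evaluation script.
import numpy as np
import open3d as o3d

def reconstruction_metrics(pred_ply: str, gt_ply: str, thresh: float = 0.05) -> dict:
    pred = o3d.io.read_point_cloud(pred_ply)
    gt = o3d.io.read_point_cloud(gt_ply)

    # Nearest-neighbor distances in both directions.
    d_pred_to_gt = np.asarray(pred.compute_point_cloud_distance(gt))  # accuracy side
    d_gt_to_pred = np.asarray(gt.compute_point_cloud_distance(pred))  # completeness side

    acc = d_pred_to_gt.mean()
    comp = d_gt_to_pred.mean()
    chamfer_l1 = 0.5 * (acc + comp)

    precision = (d_pred_to_gt < thresh).mean()
    recall = (d_gt_to_pred < thresh).mean()
    fscore = 2 * precision * recall / max(precision + recall, 1e-8)

    return {"Acc": acc, "Comp": comp, "C-L1": chamfer_l1, "F-score": fscore}

# Example usage (paths are placeholders):
# print(reconstruction_metrics("predicted_points.ply", "<room_name>/gt_pd.ply"))
```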