Xuqian Ren, Wenjia Wang, Dingding Cai, Tuuli Tuominen, Juho Kannala, Esa Rahtu
Metaverse technologies demand accurate, real-time, and immersive modeling on consumer-grade hardware for both non-human perception (e.g., drone/robot/autonomous-car navigation) and immersive technologies like AR/VR, requiring both structural accuracy and photorealism. However, there is a knowledge gap in how to apply geometric reconstruction and photorealistic modeling (novel view synthesis) within a unified framework. To address this gap and promote the development of robust, immersive modeling and rendering with consumer-grade devices, we propose a real-world Multi-Sensor Hybrid Room Dataset (MuSHRoom). Our dataset presents exciting challenges: it requires state-of-the-art methods to be cost-effective, robust to noisy data and devices, and able to jointly learn 3D reconstruction and novel view synthesis rather than treating them as separate tasks, making them ideal for real-world applications. We benchmark several well-known pipelines on our dataset for joint 3D mesh reconstruction and novel view synthesis. Our dataset and benchmark show great potential for promoting improvements in fusing 3D reconstruction and high-quality rendering in a robust and computationally efficient end-to-end fashion. The dataset and code are available at the project website: https://xuqianren.github.io/publications/MuSHRoom/.
If you use this data, please cite the original paper presenting it:
@misc{ren2023mushroom,
title={MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis},
author={Xuqian Ren and Wenjia Wang and Dingding Cai and Tuuli Tuominen and Juho Kannala and Esa Rahtu},
year={2023},
eprint={2311.02778},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
The data files are available for download on Zenodo and can be downloaded on a per-dataset basis.
To use this dataset with the nerfstudio framework, please follow the instructions in DN-Splatter.
To maximize compatibility, all data is published in open and simple file formats. The folder structure for one dataset looks like the following:
<room_name>
├── kinect
│   ├── long_capture
│   │   ├── images/  # extracted RGB images of keyframes
│   │   ├── depth/  # extracted depth images of keyframes
│   │   ├── intrinsic/  # intrinsic parameters
│   │   ├── PointCloud/  # spectacularAI point cloud of keyframes
│   │   ├── pose/  # spectacularAI poses of keyframes, aligned with the metric scale of the depth maps; poses use the OpenCV coordinate convention
│   │   ├── calibration.json; data.jsonl; data.mkv; data2.mkv; vio_config.yaml  # raw videos and parameters from the spectacularAI SDK
│   │   ├── camera_parameters.txt  # camera settings during capture
│   │   ├── test.txt  # image IDs for testing within a single sequence
│   │   ├── transformations_colmap.json  # globally optimized COLMAP poses, used for testing with a different sequence
│   │   └── transformations.json  # spectacularAI poses saved as JSON; poses use the OpenGL coordinate convention
│   └── short_capture
│       ├── images/  # same as long capture
│       ├── depth/  # same as long capture
│       ├── PointCloud/  # same as long capture
│       ├── pose/  # same as long capture
│       ├── intrinsic/  # same as long capture
│       ├── calibration.json; data.jsonl; data.mkv; data2.mkv; vio_config.yaml  # raw videos and parameters from the spectacularAI SDK
│       ├── transformations_colmap.json  # same as long capture
│       └── transformations.json  # same as long capture
├── iphone
│   ├── long_capture
│   │   ├── images/  # same as Kinect
│   │   ├── depth/  # same as Kinect
│   │   ├── polycam_mesh/  # mesh provided by Polycam; not aligned with the poses, for visualization only
│   │   ├── polycam_pointcloud.ply  # point cloud provided by Polycam, for visualization only
│   │   ├── sdf_dataset_all_interp_4  # same as Kinect
│   │   ├── sdf_dataset_train_interp_4  # same as Kinect
│   │   ├── test.txt  # same as Kinect
│   │   ├── transformations_colmap.json  # same as Kinect
│   │   └── transformations.json  # Polycam poses
│   └── short_capture
│       ├── images/  # same as Kinect
│       ├── depth/  # same as Kinect
│       ├── transformations_colmap.json  # same as long capture
│       └── transformations.json  # same as long capture
├── gt_mesh.ply  # reference mesh used for geometry comparison
├── gt_pd.ply  # reference point cloud used for geometry comparison
├── icp_iphone.json  # alignment transformation matrix used for iPhone sequences
└── icp_kinect.json  # alignment transformation matrix used for Kinect sequences
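For orientation, the sketch below shows one way to load a single keyframe (RGB, depth, and pose) from a capture folder and convert the OpenGL-convention pose in transformations.json to the OpenCV convention. It is a minimal sketch under assumptions: the JSON follows a nerfstudio-style layout (a "frames" list with "file_path" and "transform_matrix" keys), depth maps are 16-bit PNGs in millimeters, and depth files share names with the RGB images; verify these against your local copy of the data.

```python
# Minimal sketch for loading one keyframe; paths, JSON keys, and the depth
# scale are assumptions to verify against the actual files.
import json
from pathlib import Path

import numpy as np
from PIL import Image

capture = Path("<room_name>/kinect/long_capture")  # placeholder path

with open(capture / "transformations.json") as f:
    meta = json.load(f)

frame = meta["frames"][0]                                        # assumed nerfstudio-style schema
rgb = np.asarray(Image.open(capture / frame["file_path"]))
c2w_gl = np.array(frame["transform_matrix"], dtype=np.float64)   # OpenGL convention

# OpenGL cameras look down -z with y up; OpenCV cameras look down +z with y down.
# Flipping the y and z camera axes converts the camera-to-world pose.
c2w_cv = c2w_gl @ np.diag([1.0, -1.0, -1.0, 1.0])

# Depth is assumed to be a 16-bit PNG in millimeters, named like the RGB image.
depth_path = capture / "depth" / Path(frame["file_path"]).name
depth_m = np.asarray(Image.open(depth_path), dtype=np.float32) / 1000.0

print(rgb.shape, depth_m.shape, c2w_cv[:3, 3])
```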
Scene | Scale (m) | Exposure time (µs) | White Balance (K) | Brightness | Gain |
---|---|---|---|---|---|
coffee room | 6.3 $\times$ 5 $\times$ 3.1 | 41700 | 2830 | 128 | 130 |
computer room | 9.6 $\times$ 6.1 $\times$ 2.5 | 33330 | 3100 | 128 | 255 |
classroom | 8.9 $\times$ 7.2 $\times$ 2.8 | 33330 | 3300 | 128 | 88 |
honka | 6.1 $\times$ 3.9 $\times$ 2.3 | 16670 | 3200 | 128 | 128 |
koivu | 10 $\times$ 8 $\times$ 2.5 | 16670 | 4200 | 128 | 128 |
vr room | 5.1 $\times$ 4.4 $\times$ 2.8 | 8300 | 3300 | 128 | 88 |
kokko | 6.7 $\times$ 6.0 $\times$ 2.5 | 133330 | 3300 | Auto | Auto |
sauna | 9.9 $\times$ 6.5 $\times$ 2.4 | Auto | 3300 | Auto | Auto |
activity | 12 $\times$ 9 $\times$ 2.5 | 50000 | 3200 | 128 | 130 |
olohuone | 19 $\times$ 6.4 $\times$ 3 | Auto | 3600 | Auto | Auto |
We have updated the Nerfacto/Depth-Nerfacto/Neusfacto/Splatfacto results trained only with COLMAP poses. To improve efficiency, we train each model once and evaluate it under both evaluation protocols, instead of training twice as in the previous version of the paper. The image IDs used for the "test within a single sequence" protocol are stored in "test.txt" in each "long_capture" folder; the remaining IDs in the long sequence are used for training. The same model is used to evaluate the images in the short sequence ("test with a different sequence"). The mesh extracted from this model is used to evaluate mesh reconstruction quality. Please follow the training and comparison protocol reported here for efficiency.
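The split logic described above can be reproduced with a few lines of Python. This is a minimal sketch, assuming test.txt lists one image ID per line and that the IDs match the image file stems (adjust the matching if your filenames use different zero-padding); the room path is a placeholder.

```python
# Minimal sketch of the train/test split described above.
# Assumptions: test.txt lists one image ID per line, and IDs match image file stems.
from pathlib import Path

capture = Path("<room_name>/kinect/long_capture")  # placeholder path

# IDs reserved for the "test within a single sequence" protocol.
test_ids = {line.strip() for line in (capture / "test.txt").read_text().splitlines() if line.strip()}

all_images = sorted((capture / "images").iterdir())
test_images = [p for p in all_images if p.stem in test_ids]
train_images = [p for p in all_images if p.stem not in test_ids]
print(f"{len(train_images)} training frames, {len(test_images)} within-sequence test frames")

# The corresponding short_capture images of the same room form the
# "test with a different sequence" evaluation set.
```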
To use the MuSHRoom dataset with the nerfstudio framework, use the dataparser at dn_splatter/data/mushroom_dataparser.py. Instructions are available at https://github.com/maturk/dn-splatter#mushroom.
Reconstruction quality is reported as Acc, Comp, C-L1, NC, and F-score. Rendering quality (PSNR, SSIM, LPIPS) is reported for two evaluation protocols: testing within a single sequence (single) and testing with a different sequence (cross).

Device | Method | Acc ↓ | Comp ↓ | C-L1 ↓ | NC ↑ | F-score ↑ | PSNR ↑ (single) | SSIM ↑ (single) | LPIPS ↓ (single) | PSNR ↑ (cross) | SSIM ↑ (cross) | LPIPS ↓ (cross) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
iPhone | Nerfacto | 0.0652 | 0.0603 | 0.0628 | 0.7491 | 0.6390 | 20.83 | 0.7653 | 0.2506 | 20.36 | 0.7448 | 0.2781 |
iPhone | Depth-Nerfacto | 0.0653 | 0.0614 | 0.0634 | 0.7354 | 0.6126 | 21.23 | 0.7623 | 0.2612 | 20.67 | 0.7423 | 0.2873 |
iPhone | MonoSDF | 0.0792 | 0.0237 | 0.0514 | 0.8200 | 0.7596 | 19.79 | 0.6972 | 0.4122 | 17.92 | 0.6683 | 0.4384 |
iPhone | Splatfacto | 0.1074 | 0.0708 | 0.0881 | 0.7602 | 0.4405 | 24.22 | 0.8375 | 0.1421 | 21.39 | 0.7738 | 0.1986 |
Kinect | Nerfacto | 0.0669 | 0.0695 | 0.0682 | 0.7458 | 0.6252 | 23.89 | 0.8375 | 0.2048 | 22.43 | 0.8331 | 0.2010 |
Kinect | Depth-Nerfacto | 0.0710 | 0.0691 | 0.0701 | 0.7274 | 0.5905 | 24.21 | 0.8370 | 0.2107 | 22.77 | 0.8345 | 0.2036 |
Kinect | MonoSDF | 0.0439 | 0.0204 | 0.0321 | 0.8616 | 0.8753 | 23.05 | 0.8315 | 0.2434 | 21.60 | 0.8267 | 0.2219 |
Kinect | Splatfacto | 0.1007 | 0.0704 | 0.0855 | 0.7689 | 0.4697 | 26.07 | 0.8844 | 0.1378 | 23.28 | 0.8604 | 0.1579 |
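For reference, the reconstruction-quality numbers above are point-based distance metrics. Below is a hedged sketch of how Acc, Comp, C-L1, and F-score can be computed between a predicted point cloud and the reference gt_pd.ply using Open3D; the 5 cm F-score threshold and the assumption that the prediction has already been aligned (e.g., with the transformation stored in icp_kinect.json) are ours, and NC is omitted because it requires normals. This is not the official evaluation script.

```python
# Hedged sketch of point-based reconstruction metrics (Acc, Comp, C-L1, F-score).
# The 5 cm threshold and pre-aligned inputs are assumptions; this is not the
# official MuSHRoom evaluation script.
import numpy as np
import open3d as o3d

def reconstruction_metrics(pred_ply: str, gt_ply: str, thresh: float = 0.05) -> dict:
    pred = o3d.io.read_point_cloud(pred_ply)
    gt = o3d.io.read_point_cloud(gt_ply)

    # Nearest-neighbor distances in both directions.
    d_pred_to_gt = np.asarray(pred.compute_point_cloud_distance(gt))  # accuracy side
    d_gt_to_pred = np.asarray(gt.compute_point_cloud_distance(pred))  # completeness side

    acc = d_pred_to_gt.mean()
    comp = d_gt_to_pred.mean()
    chamfer_l1 = 0.5 * (acc + comp)

    precision = (d_pred_to_gt < thresh).mean()
    recall = (d_gt_to_pred < thresh).mean()
    fscore = 2 * precision * recall / max(precision + recall, 1e-8)

    return {"Acc": acc, "Comp": comp, "C-L1": chamfer_l1, "F-score": fscore}

# Example usage (paths are placeholders):
# print(reconstruction_metrics("predicted_points.ply", "<room_name>/gt_pd.ply"))
```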