Cameras and LiDARs are currently the two types of sensors most commonly used for 3D mapping. Vision-based methods are sensitive to textureless regions and lighting changes, while LiDAR-based methods easily degenerate in scenes without salient structural features. Most existing fusion-based methods require strict synchronization between the camera and the LiDAR and also need auxiliary sensors such as an IMU, all of which increase device cost and complexity. To address this, we propose a low-cost mapping pipeline called PanoVLM that requires only a panoramic camera and a LiDAR, without strict synchronization. First, camera poses are estimated by LiDAR-assisted global Structure-from-Motion (SfM), and LiDAR poses are derived from the initial camera-LiDAR relative pose. Then, line-to-line and point-to-plane associations are established between LiDAR point clouds and used to further refine the LiDAR poses and remove motion distortion. With these initial sensor poses, line-to-line correspondences are established between images and LiDAR point clouds to jointly refine camera and LiDAR poses. The final step, joint panoramic Multi-View Stereo (MVS), estimates a depth map for each panoramic image and fuses them into a complete dense 3D map. Experimental results show that PanoVLM works in a wide range of scenarios and outperforms state-of-the-art (SOTA) vision-based and LiDAR-based methods.
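To make the LiDAR refinement step concrete, below is a minimal sketch of the standard point-to-plane residual that this kind of association minimizes. The function and variable names are ours, not the released code's, and the paper's exact formulation may differ.

```python
import numpy as np

def point_to_plane_residual(R, t, p, q, n):
    """Standard point-to-plane residual: signed distance from the
    transformed source point R @ p + t to the target plane (q, n).

    R : (3, 3) rotation of the LiDAR pose
    t : (3,)   translation of the LiDAR pose
    p : (3,)   source LiDAR point
    q : (3,)   a point on the matched target plane
    n : (3,)   unit normal of the matched target plane
    """
    return float(n @ (R @ p + t - q))
```

Summing the squares of these residuals over all point-to-plane matches (plus the line-to-line terms) gives the objective that the pose refinement minimizes.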
The pipeline of PanoVLM can be divided into three steps, as shown in the following figure.
Most of the SfM code is adapted from openMVG, and most of the MVS code is adapted from openMVS.
For details of the code, please refer to the Tutorial. We also provide a Docker image.
Our data acquisition device contains the following components:
The VLP-16 is connected to the Raspberry Pi 4B via Ethernet, and the camera is connected to it via Wi-Fi. A small screen attached to the Raspberry Pi displays the timestamp of the LiDAR data, which lets us roughly synchronize the LiDAR and the camera.
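For reference, each VLP-16 data packet is a 1206-byte payload whose last six bytes hold a 4-byte timestamp (microseconds past the top of the hour, little-endian) followed by two factory bytes; this is the timestamp shown on the screen. A minimal sketch of extracting it (pcap/UDP framing omitted; layout per the VLP-16 manual):

```python
import struct

VLP16_PACKET_SIZE = 1206  # 12 data blocks x 100 B + 4 B timestamp + 2 factory bytes

def packet_timestamp_us(payload: bytes) -> int:
    """Timestamp of a VLP-16 data packet: microseconds past the top of
    the hour, stored little-endian in bytes 1200-1203 of the payload."""
    assert len(payload) == VLP16_PACKET_SIZE, "not a VLP-16 data packet"
    (ts_us,) = struct.unpack_from("<I", payload, 1200)
    return ts_us
```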
The following figure shows the device.
As our device is very light and compact, it can be easily carried by a person or mounted on a car. The following figure shows the device mounted on a car.
As our device has no hardware synchronization, we use the timestamp on the screen to roughly synchronize the panoramic camera and the LiDAR. Each complete LiDAR scan (0.1 s per scan) is sent as dozens of pcap packets, each containing a timestamp that is displayed on the screen. Since the packets arrive too frequently to display every one, the screen is refreshed every 0.5 s. To roughly synchronize the two data streams, we manually pick a frame near the beginning of the panoramic video in which the screen content has just been refreshed, and regard that frame and the corresponding pcap packet as synchronous. We repeat the same step near the end of the video. Between these two anchors, we sample the data at equal intervals: images are sampled at 10 frames per second, and pcap packets are aggregated every 100 ms into a complete LiDAR frame (a sketch of this step is given after the figures below).
The following figures show the timestamp captured by the camera and a schematic diagram of the rough synchronization.
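The sampling and aggregation step can be summarized in a few lines of Python. The sketch below assumes the two manually chosen anchor pairs have already fixed the common time span; all names are ours, not the released code's.

```python
def sample_image_indices(num_frames: int, video_fps: float, target_fps: float = 10.0):
    """Indices of the video frames to keep when sampling at `target_fps`
    (10 Hz in our setup)."""
    step = video_fps / target_fps
    return [round(i * step) for i in range(int(num_frames / step))]

def aggregate_packets(packets, frame_ms: int = 100):
    """Group (timestamp_us, payload) pcap packets into complete LiDAR
    frames, one frame per `frame_ms` window (100 ms = one full VLP-16 sweep)."""
    frames = {}
    for ts_us, payload in packets:
        frames.setdefault(ts_us // (frame_ms * 1000), []).append(payload)
    return [frames[key] for key in sorted(frames)]
```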
With our data acquisition device, we collected five datasets: two indoor and three outdoor. The following table gives the details.
Dataset | #Frames | Length (m) | GT pose | Environment | Download Link
---|---|---|---|---|---
Room | 454 | 14.22 | No | Indoor | link
Floor | 1593 | 182.39 | No | Indoor | link
Building | 1724 | 482.81 | No | Outdoor | coming soon
Campus-Small | 824 | 216.80 | Yes | Outdoor | coming soon
Campus-Large | 8730 | 3557.63 | Yes | Outdoor | coming soon
Note: the original Room and Floor datasets were sampled at 10 frames per second. This rate is too high for indoor SfM, so we downsampled them to reduce the amount of data.
We are currently sorting out the Building, Campus-Small, and Campus-Large datasets. Their download links will be added soon.
The following figures show the dense mapping results on the Building and Campus-Large datasets.
@article{TU2023149,
  title = {PanoVLM: Low-cost and accurate panoramic vision and LiDAR fused mapping},
  author = {Diantao Tu and Hainan Cui and Shuhan Shen},
  journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
  volume = {206},
  pages = {149-167},
  year = {2023},
  issn = {0924-2716},
  doi = {10.1016/j.isprsjprs.2023.11.012}
}