TOPO-EPFL / CrossLoc-Benchmark-Datasets

[CVPR'22] CrossLoc benchmark datasets setup and helper scripts.
MIT License

About the depth label of real images #4

Closed tdd233 closed 1 year ago

tdd233 commented 1 year ago

Hi Yan Qi,

Thanks for your outstanding work. I wonder how you obtained the depth labels for the real images and how accurate they are. Could you please describe the general process? Looking forward to your reply.

Yours sincerely, Tu.

qiyan98 commented 1 year ago

Hi Tu,

Thanks for your interest in our work. Here are the key steps to obtain depth or other 3D labels for real images.

During data collection, we used a DJI drone equipped with a high-precision real-time kinematic (RTK) kit to capture images, and then extracted the geotags of the captured photos, including longitude, latitude, and height. After this step, we have both camera extrinsic and intrinsic parameters for each real image. We aimed to acquire high-quality data by: 1) using RTK drones known for centimeter-level positioning accuracy; 2) validating the geotags through photogrammetry reconstruction, e.g., bundle adjustment. A hedged sketch of the coordinate conversion involved is shown after this paragraph.
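For illustration only, here is a minimal sketch of converting WGS84 geotags (longitude, latitude, ellipsoidal height) to Earth-centered Earth-fixed (ECEF) coordinates with pyproj. The actual pipeline may target a different reference frame (e.g. a local Swiss projection), and the values below are made up.

```python
from pyproj import Transformer

# WGS84 geodetic (EPSG:4979, lon/lat/height) -> WGS84 geocentric ECEF (EPSG:4978).
# always_xy=True means the input order is (longitude, latitude, height).
geodetic_to_ecef = Transformer.from_crs("EPSG:4979", "EPSG:4978", always_xy=True)

lon, lat, height = 6.5668, 46.5191, 500.0  # example geotag: degrees, degrees, meters
x, y, z = geodetic_to_ecef.transform(lon, lat, height)
print(x, y, z)  # Earth-centered coordinates in meters
```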

Next, we imported the camera locations into the Cesium environment along with the 3D assets. Thanks to the high-quality 3D assets provided by swisstopo, we could extract WGS84 world coordinates per pixel via ray-tracing, establishing a pixel-wise correspondence with each real image. We validated the accuracy of the coordinate extraction by computing the reprojection error from 3D to 2D space; a hedged sketch of this check follows. For details on data quality control, please refer to Appendix A of our paper.
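As a sketch of that validation step (not the exact code in the repo, and assuming world-to-camera extrinsics and pixel coordinates at integer grid positions), the reprojection error can be computed roughly like this:

```python
import numpy as np

def reprojection_error(world_xyz, K, R, t):
    """Mean reprojection error (in pixels) of per-pixel world coordinates.

    world_xyz : (H, W, 3) world coordinates extracted via ray-tracing
    K         : (3, 3) camera intrinsics
    R, t      : world-to-camera rotation (3, 3) and translation (3,)
    """
    h, w = world_xyz.shape[:2]
    pts = world_xyz.reshape(-1, 3)          # (H*W, 3) world points
    cam = pts @ R.T + t                     # world frame -> camera frame
    proj = cam @ K.T                        # apply intrinsics
    uv = proj[:, :2] / proj[:, 2:3]         # perspective divide -> pixel coords
    # Reference grid: each world point should reproject onto its own pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    ref = np.stack([u.ravel(), v.ravel()], axis=1)
    return np.linalg.norm(uv - ref, axis=1).mean()
```

A small mean error (well below a pixel) indicates that the geotags, intrinsics, and ray-traced coordinates are mutually consistent.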

Finally, we can extract depth via the pinhole camera model, given the camera extrinsic and intrinsic parameters and the per-pixel world coordinates of the image. See here for the code.
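In essence (again a minimal sketch under the same world-to-camera convention, not the repository's exact implementation), depth is simply the z-coordinate of each world point expressed in the camera frame:

```python
import numpy as np

def depth_from_world_coords(world_xyz, R, t):
    """Per-pixel depth (z in the camera frame) from world coordinates.

    world_xyz : (H, W, 3) per-pixel world coordinates
    R, t      : world-to-camera rotation (3, 3) and translation (3,)
    """
    cam = world_xyz.reshape(-1, 3) @ R.T + t       # world frame -> camera frame
    return cam[:, 2].reshape(world_xyz.shape[:2])  # depth map of shape (H, W)
```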

Best, Qi